Drive Secure Cloud Data Growth, Diminish the Risk. A Conversation on Cloud Data Security

Liat Hayun:

Hi, everyone. My name is Liat Hayun and I am the CEO and Co-Founder of Eureka Security, a company focused on helping organizations secure data in their clouds. I'm joined here today by Andy Ellis, formerly CSO at Akamai and now an Operating Partner at YL Ventures, where he gets to meet many security teams and a lot of cool security startups. What we're going to talk about today is cloud data security. Cloud data security has become a hot topic recently, right? Many organizations are looking to better secure and better protect their data in the cloud. And what we thought we could do here today is talk a bit about why it has become such an urgent matter for many organizations we work with. Hey, Andy.

Andy Ellis:

Hey, Liat. How are you doing today?

Liat Hayun:

I'm good, thank you.

Andy Ellis:

I think we need to understand that data transformation, in fact, all cloud transformation, isn't something that just happens overnight. It's not just this process where you say, "Ah, I'm in the enterprise and I'm in the cloud." Organizations move gradually as they're headed to the cloud, and one of the things that tends to happen is, as they're moving, the scope gets bigger because things get easier. And all of a sudden you were moving one data store and now it's 20, you were moving 20 and now it's 200. And so I think there are a lot of companies that started by consuming SaaS platforms. And then they moved to compute. And now it's data, and data moving as its own thing rather than just moving a database.

Liat Hayun:

Yep. And we're seeing that backed up by the data we have. SaaS applications like Microsoft 365 are growing in numbers, taking market share from their on-prem counterparts. For example, from what I have here, growing from 57% to 67% between 2020 and 2021 alone. And we're seeing similar trends with Atlassian and their SaaS-based product suite, or products like Slack, GitHub and Salesforce. So we have definitely seen this migration through SaaS applications. And then as for compute, cloud adoption statistics show us that already in 2020, 83% of companies had workloads running in the cloud.

Liat Hayun:

But for data, this is not what we're seeing today. Statistics show that only by 2025 will we get to 50% of data being stored in the cloud. Of course, that stems partly from a lot of the data already being on-prem, so you need to create new data in the cloud, but also from the fact that not all data moves to the cloud in a single day. So we're ahead of that curve, or maybe just at the peak of that transition into the cloud.

Andy Ellis:

Yeah. And I think the thing people don't plan for is that data is so big. And when we think about enterprise processes, you have this one giant database that everybody protects access to because you don't want to overwhelm it with too many queries. And then you move it to the cloud and somebody says, "I want to run some tests on it," and it's almost like push a button, you have a second copy of the database. And then those databases can fragment as people add to one and not to another. And so I think the model says, oh, we'll just move our database to the cloud and it'll be just like it was in the enterprise where we had one authoritative data store and that's where all of our data was. I don't think that's going to last. I think most companies as they move will discover how easy it is to replicate data, for good and for ill.

Liat Hayun:

What you're saying, what I'm hearing you saying, is there are three amplifying factors you talked about. So we have data being stored in the cloud. It's now easier to store it, it's easier to add more data to it.

Andy Ellis:

Yep.

Liat Hayun:

That opens up more use cases, as you talked about, so more needs to replicate that data, whether it's using the same technology or storing it in a different technology. And that feeds back: it adds more teams, more people in the organization or outside of it wanting to use that data, which then leads to more data being stored, more use cases, more teams and so on. So it's like an amplification process that happens across these three factors, leading to more data being stored and more people using it in the cloud, and basically a bigger footprint to protect.

Andy Ellis:

Exactly. And you have to protect it across multiple clouds. I think HashiCorp just released that 76% of companies are already multi-cloud, and I think it's important that we understand what multi-cloud means. I think when we first started talking about multi-cloud 10 years ago, we talked about building in an agnostic fashion so that every architecture you had was on all of your clouds. And what I think's really happened is different parts of your organization move to different clouds.

Andy Ellis:

So maybe your first deployment is in AWS, and then you're building some new system and you're an active directory shop and Microsoft gives you a great deal to build that in Azure. So you're multi-cloud, but it's not the same architecture across each cloud, but your data needs to now become available in those different cloud infrastructures. And so whatever you built in one cloud, you have to write how to replicate it into another cloud because you didn't build it in an agnostic fashion to just move it.

Liat Hayun:

And it's not just about being agnostic in the way you build your architectures. You talked about different teams and their different needs. So you have your data science team that prefers one technology, and you have your marketing team that needs to connect its data into a specific platform, which requires a different technology. So, by opening up the data for different teams and different use cases, you actually need a different tech stack, which then requires you to learn and understand how to protect that data within that tech stack, which just increases the challenge.

Andy Ellis:

Right. And you have to accommodate, like you talked about, data scientists. And I think most people don't really understand that that's where the business value is. And so, the data scientists have often been limited by, oh, we have one database, we have to protect queries against it. So you only get to run production queries. So how did you get to production? And so to protect the data, you really do need to understand, how will your data scientists develop new queries? How are they going to access that? And the real answer in the cloud is they're going to copy the data store and they're going to run test queries against production data, but not against the production data store.

Liat Hayun:

And it's funny, we're talking about data science here. Five years ago, data science was something that only the Googles of the world were able to do. Now, every startup you meet here at YL is actually doing some data science. So it's a very common use case for organizations today.

Andy Ellis:

Yeah. It's one of those fascinating things that I think, if you go back a few years, nobody thought they would have to be data scientists, that would be this weird, special discipline. But now every security team has data scientists.

Liat Hayun:

So what's the problem? We talked about the trends, we talked about data moving to new locations and new technologies, being stored in higher volumes and with higher sensitivity. What's the problem with that? What's the risk issue associated with it?

Andy Ellis:

So the biggest problem is security teams often end up playing catch-up on any technology; this is not unique to data. But you're trying to figure out what the business is doing so you can give them advice about what to do correctly. Data has a lot of security and compliance requirements. If it's about EU citizens, you can't take it out of the EU. If it's credit card numbers, you have to encrypt the credit card numbers, there's a whole host of PCI requirements. And most security teams, they might have a list of things to do, and then as they find a system, they say, "Oh, by the way, here's the list you have to go do." But then they have to keep track of which systems they've talked to and what their progress is.

Andy Ellis:

Well, when you had one enterprise database that was easy, you had one data organization to talk to. Now you go talk to that organization and by the time you come back with your list, maybe they've replicated it. So you have your spreadsheet in which you're keeping track of all of your assets and all of your inventory and what steps they've currently taken and which ones still have to happen. Are they ready for your next audit? Have you collected evidence? And all of that tends to be very manual processes. An individual task might seem very easy and lightweight, 'oh, I just run this command to extract a configuration'. But the actual overall is so much human labor to just keep track of what's going on in your organization.

Liat Hayun:

So you described many very challenging steps throughout that process. Let's start by trying to list those and zoom into each of these. So you've talked about that Excel spreadsheet, or being able to just look at the data stores you have, where do you have data residing? With organizations you've been talking to, do you find that challenging? Do you find that something that is easy to achieve? What is the status today?

Andy Ellis:

So I think the biggest challenge often is the mindset of an executive, not even just the security executive, but who you're responsible to, is to say, "Well, you know where the most important thing is so go secure that first." So you often skip past even collecting an inventory because your inventory is what you assume you know. They say, "Well, I know that we have this customer relations database so of course we're going to secure that." And then you don't stop and ask, "But where did we copy it to? Who else has a copy of that data?" So that's the first challenge, is just stopping and saying, how do I continuously collect an inventory of where the data has moved to since the last time I looked?

Liat Hayun:

So you talked before about organizations moving to the cloud with their 20 on-prem databases. Now, all of a sudden that amplification mechanism or process causes them to have 200 data stores. Not just giving out that number, knowing where those data stores are, that's a challenge on its own, understanding what they need to protect, gaining that visibility. What's next? So assuming they know where all their data is, what do they need to do about this?

Andy Ellis:

Well, now they need to understand what is in each of those data stores. Just knowing where you have databases isn't enough, now what's in them? Do you have ones that are true PII and ones that are synthetic? Hopefully, your testing environments don't use production data, although I suspect that almost every organization is not doing that correctly. So it can't just be simply, 'oh, it has a username field', but that's a good start. So now you have to understand what are the requirements on each data store based on what's in it and how it's being used, because that's then going to let you move to the step after, which is actually securing each one of those data stores.
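
As a rough illustration of the step described above, going from "what is in this data store" to "what requirements apply to it", here is a minimal Python sketch. Everything in it is hypothetical: the category names, the requirement wording (paraphrased from the examples earlier in the conversation), and the `DataStore` type are assumptions for illustration, not a compliance reference or any product's data model.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from data categories to obligations. The wording
# paraphrases examples from the conversation; it is not a compliance reference.
REQUIREMENTS_BY_CATEGORY = {
    "eu_personal_data": {"keep within the EU"},
    "credit_card_numbers": {"encrypt card numbers", "apply PCI requirements"},
}


@dataclass
class DataStore:
    name: str
    categories: set[str] = field(default_factory=set)
    synthetic: bool = False  # True when the contents are generated test data


def requirements_for(store: DataStore) -> set[str]:
    """Return the obligations implied by what a store actually contains."""
    if store.synthetic:
        # Synthetic data only looks like PII, so no obligations follow from it.
        return set()
    required: set[str] = set()
    for category in store.categories:
        required |= REQUIREMENTS_BY_CATEGORY.get(category, set())
    return required


prod = DataStore("crm-prod", {"eu_personal_data", "credit_card_numbers"})
# A test environment that was seeded with a copy of production data.
test_copy = DataStore("crm-test-copy", {"eu_personal_data"})
# A test environment that holds generated data with the same schema.
synthetic = DataStore("crm-synthetic", {"eu_personal_data"}, synthetic=True)
```

In this sketch the test copy inherits the same obligation as production because it holds real data, while the synthetic store does not, which is why a check like "it has a username field" is only a starting point.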

Liat Hayun:

So when we talk about visibility, it sounds like we often hear a very shallow definition of visibility, which is where my data is. And what you're describing here is a much deeper understanding of that need, which is not just where data is, what it contains, whether or not this data is actual production, actual customer data or not. And then maybe even the risk associated with it, how critical it is for us to protect it in what ways. Is that correct? Is that accurate?

Andy Ellis:

Yeah.

Liat Hayun:

And then, so what are the challenges in protecting that data today, understanding the risk associated with that?

Andy Ellis:

So, one challenge is sometimes the protections are incompatible with how the data's being used today. We may say, "Oh, we need to encrypt certain data at rest." What does even 'at rest' mean? That meant one thing in the enterprise world, where we said "Oh, when the server shut down, all the data was encrypted but as soon as it started up, we decrypted the data." Well, in the cloud, there's no such thing as shut down. So does that mean there is no longer a data at rest requirement? Or does it mean you actually have to encrypt the data inside the database?

Andy Ellis:

And if so, that's a substantial change to how the business operates because it can't run queries directly on the data, it has to run it through something that's decrypting the data. So it's an example of a challenge. You can't just say, look, here are my 55 requirements. I'm going to implement them all today. You have to sometimes negotiate with the business. Who needs access to this data? If it's a data set that you're not going to make available in one country or another, how are you making sure that you're doing that and rolling it out in a way that security is partnering with the business to enable them, rather than just trying to stop the business from doing its job?

Liat Hayun:

So it sounds like up until now we talked about two sort of parallel processes. We talked about the need to understand where data is and what it contains, and then the need to understand the organizational policies or how data needs to be managed, accessed, controlled. How do you then combine between those two? You know how data needs to be controlled, you know who should have access to it, but then you need to go and implement that across the different technologies and multiple data stores that we've talked about before.

Andy Ellis:

Right. And that, different technologies, is a huge problem wrapped up in a very tiny phrase. Anybody who's worked with MySQL probably understands deeply how to implement a bunch of these things. If you take that same person and try to go do that in Oracle... And I'm still here on mostly enterprise technologies, I haven't even gotten to cloud. But Oracle uses completely different commands, completely different capabilities, that interact with Oracle's operations in ways that would surprise that MySQL administrator. But now you're going to take this to the cloud and use cloud-native data stores. And so you might say, 'well, what I need to do is restrict access to this column, to this set of identities'. Doing that across multiple technologies often requires you to develop an expertise in each one of those technologies before you can get it right, because the naive approach that you would apply will sometimes leave you with very large holes, because the same command means very different things in different technologies.

Liat Hayun:

And that is exacerbated by everything we talked about before: the different teams, different use cases, different policies that are associated with that, and then different technologies and different needs to translate that. And what we've been seeing for many organizations, as you said, is either the security team then needs to go and learn that new technology and understand how to control it better, or in many organizations it becomes a gating mechanism, not allowing new technologies to be introduced and then slowing down the business, which can't use the best data storage technology for that specific use case. There are so many reasons to use a specific data storage technology: performance, cost and functionality. Security needs to be an enabler, allowing technology to be used within specific guardrails rather than prohibiting a technology because it doesn't comply or work in the ways the security team is used to. What have you seen? How have you seen organizations deal with that challenge, that balance?

Andy Ellis:

Well, the first way organizations often deal with that is if the security team tries to ban and prohibit technologies, they get worked around. You're not going to say, "Oh, we're not going to deploy the best database for this," just because the security team doesn't understand it. And they've said, "Well, you need to give us a head count to get an architect to get up to speed on this one technology," because you recognize then they'll do that for every technology. So you spend time talking to them and then you find an emergency reason to roll out your new data store and now the security team's playing catch up. Or they said, "Oh, we said you weren't allowed to do that," but there's a lot of money being made and the business will ignore you. So it's a very dangerous position for a security team to be in, to try to gate a new technology that is about revenue, because the revenue will always win.

Andy Ellis:

So that's one avenue. The other avenue is the security team says, "Well, I guess, roll it out and it's on you." And they push over to the developers and say, "You have to implement all the security technologies." But the developers, let's be honest, that's not what they're paid to do. They're not paid to understand the details of least privilege. This is not to say they don't care about security, but it's not their specialty. So what we really need is to be able to synthesize between the security requirements and the technology specifics. Most organizations don't have a good way to do that at scale.

Liat Hayun:

Yeah. Especially given we're all very much aware of the talent gap we have in the security industry. So just adding one more manual task that doesn't have... It's not specific to an organization. You don't have your own added value for why you need to do that, as opposed to now leveraging something that can help you translate that, save errors, make that more automated, make it a lot quicker. So introducing these functionalities is something that we've seen across different aspects of cyber security, like automation and introducing other functionalities. And we're now seeing similar trends towards data security, as it now becomes a major area of focus for many organizations.

Andy Ellis:

Right. And security's not the only place we've seen that. You see that across performance technologies as well. You don't try to hire performance engineers for every single technology within your tech stack. You might have some that scale, but you do want to have sort of standard tools that help you do performance measurement and monitoring and optimization.

Liat Hayun:

Yeah. And while this problem is probably applicable to almost every organization using the cloud, we do see, I think, three use cases where this problem is even more amplified, even more critical. The first one is very fast-growing organizations where security is sometimes an afterthought, because, again, you're focused on the business. You're a hundred-person unicorn, and all you care about is revenue. Obviously, you kind of put security aside and you need to play that catch-up we talked about. Organizations, even big ones, that have transitioned to the cloud very abruptly, you could say, or very quickly, have seen these trends as well. And then maybe even M&A processes, which are plentiful these days. These are examples of when this need to very quickly protect different types of technologies becomes much more critical.

Andy Ellis:

Absolutely.

Liat Hayun:

So let's say that I've kicked off my data protection project. I've mapped out all the data inventory. I know this sounds very imaginative, but let's say I was able to somehow map out all the data store locations I have, understand the data in them, using Excel spreadsheets and questionnaires and other manual processes. I've dedicated my entire team to it. I've reviewed the policies and controls, and it's now okay, it's now fine. Everything is up to speed, everything is complying with the policies that I've decided on in my organization. Why not leave it there? Why isn't that enough?

Andy Ellis:

Well, first of all, data isn't static. So you go and you do that inventory. The day you finish the inventory, it's now out of date, because that first system you interviewed and they said, "Oh yes, we have three data stores," well, by the time you finish interviewing everybody else in your organization, they have seven data stores. So you have to keep coming back to understand, where has my data moved to? What data have I added? Hopefully, you're a growth business. You have new engineering teams with new data stores that didn't even exist in concept when you first started this project, because this project you just described, that's actually like a year-long effort by your whole team.
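
The "three data stores became seven" effect described above can be sketched as a comparison between two inventory snapshots. This is a minimal illustration under assumed inputs: the snapshot format (team name mapped to a set of data store names) and all of the names are hypothetical.

```python
def diff_inventory(
    previous: dict[str, set[str]], current: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Per team, the data stores that appeared since the previous snapshot."""
    added = {}
    for team, stores in current.items():
        new_stores = stores - previous.get(team, set())
        if new_stores:
            added[team] = new_stores
    return added


# Snapshot taken when the first team was interviewed (hypothetical names).
first_pass = {"billing": {"orders-db", "invoices-db", "ledger-db"}}

# Snapshot taken after everyone else had been interviewed.
second_pass = {
    "billing": {
        "orders-db", "invoices-db", "ledger-db",
        "orders-copy", "ledger-test", "exports-bucket", "analytics-db",
    },
    "growth": {"events-db"},  # a team that did not exist at the first pass
}

new_since_first_pass = diff_inventory(first_pass, second_pass)
```

Running a comparison like this on a schedule, rather than once per project, is one way to treat the inventory as something continuously collected instead of a one-off spreadsheet.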

Andy Ellis:

Unfortunately, you had work you probably should have been doing as well, so maybe you have to clean up some of those messes. But the requirements are changing. It feels like every day, there's a new regulation that says, "Here's what you have to do with data that matches this exact framework." And so now you have to be able to go back and understand how do the controls that you implemented match to these new requirements? Are they sufficient? What do you have to do differently?

Liat Hayun:

With a name like Eureka, you can assume my fondness for physics. And in physics, there is this concept called entropy, which means that a system, without force being applied to it, without continuous monitoring, basically tends to move into chaos. So things shift, things move, and this is exactly what you're talking about with data. It's not static. If you're not applying some processes to it, some forces to it, then it shifts and moves back into the same chaotic state that existed before you invested all the time in making sure that things are in order. So that is exactly what we had in mind. So, let's take it towards the solution. What would be a good approach? What would be an ideal solution for addressing this problem?

Andy Ellis:

So what I think you'd need is, like you said, that external force, and it doesn't have to be a huge force. And it's a combination of detective and preventative controls. You want to be able to notice that drift happening. We sometimes talk about configuration drift. Do you have systems that are no longer configured the way they should have been, and that you thought they were? That can always be very terrifying, when as a security professional, you come back sometimes four years later and you're like, "Oh, I know how this system works." And the people who work on it, none of whom have been there for more than two years, look at you and they're like, "It doesn't work like that at all." And you have no idea when those controls went away. At some point, somebody said, "Oh, we have an incident. Let's turn off this control. That seems to have worked." They don't turn them back on.

Andy Ellis:

So you want to be able to detect those shifts, but you also want to be able to prevent some changes. You want to be able to notice that somebody has just turned off access control and say, "No, you're not allowed to actually do that without humans being involved." So you need that to do it at scale, to know exactly what you want to have and how well you're doing at achieving it. And it doesn't need to just be a list of controls, right, here's the things to implement, it needs to understand the context for those controls. So that you could say, "Oh, this database looked great, it was fantastic." And then we just added something that contaminates it, that makes it more interesting under some control regime and now we have to alter those rules. And so that's a really important thing to notice is when the underlying data itself changes as well.
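
The two kinds of control described above can be sketched side by side: a detective check that reports configuration drift, and a preventative check that refuses certain changes unless a human has approved them. This is a minimal sketch, not a description of any particular product; the setting names, the `Change` type, and the `PROTECTED_SETTINGS` list are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    store: str
    setting: str
    new_value: object
    approved_by_human: bool = False


# Hypothetical settings that must never change without a person signing off.
PROTECTED_SETTINGS = {"access_control_enabled"}


def detect_drift(expected: dict, actual: dict) -> dict:
    """Detective control: map each drifted setting to (expected, actual)."""
    return {
        key: (expected[key], actual.get(key))
        for key in expected
        if actual.get(key) != expected[key]
    }


def allow_change(change: Change) -> bool:
    """Preventative control: block unapproved changes to protected settings."""
    if change.setting in PROTECTED_SETTINGS:
        return change.approved_by_human
    return True


expected_config = {"access_control_enabled": True, "encrypted": True}
# Someone turned access control off during an incident and never restored it.
actual_config = {"access_control_enabled": False, "encrypted": True}

drift = detect_drift(expected_config, actual_config)
blocked = not allow_change(Change("crm-prod", "access_control_enabled", False))
```

The detective check tells you that a control has gone away; the preventative check is what keeps "let's turn off this control" from happening silently in the first place.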

Liat Hayun:

Yeah. And we're talking a lot about the difference between policy and controls, what you want to have and how you achieve that, how you implement that. And it sounds like from your perspective, the controls are a means to the end, but really what you want to get to is a policy to make sure that everything works as expected.

Andy Ellis:

Right. A policy, think of it as a narrative statement, which says something like, "We will never allow EU citizen data to be viewed outside the EU." That's just a policy. Now there's the set of controls. Do you have access control? Do you know where your EU citizen data is? Maybe you're masking EU citizen data so that it can be used by data scientists outside the EU. So those are each different controls, and you have to ask, do we actually meet the control objectives in the policy statement we made about it?
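
One way to picture the distinction drawn above is a policy as a narrative statement plus a list of controls, each of which can be checked against a data store. The sketch below is illustrative only: the `Policy` and `Control` types, the field names in the store description, and the exact checks are assumptions, loosely based on the three example controls mentioned in the conversation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Control:
    name: str
    # Takes a description of a data store and returns True if satisfied.
    check: Callable[[dict], bool]


@dataclass
class Policy:
    statement: str
    controls: list[Control]

    def unmet_controls(self, store: dict) -> list[str]:
        """Names of the controls this store does not currently satisfy."""
        return [c.name for c in self.controls if not c.check(store)]


eu_policy = Policy(
    statement="We will never allow EU citizen data to be viewed outside the EU.",
    controls=[
        Control("EU citizen data located", lambda s: s.get("eu_data_located", False)),
        Control("access control in place", lambda s: s.get("access_control_enabled", False)),
        Control(
            "masking for readers outside the EU",
            # Only required when someone outside the EU reads from the store.
            lambda s: s.get("masked_outside_eu", False) or not s.get("non_eu_readers", False),
        ),
    ],
)

# Hypothetical store description: data scientists outside the EU read from it,
# but masking has not been set up.
analytics_store = {
    "eu_data_located": True,
    "access_control_enabled": True,
    "non_eu_readers": True,
    "masked_outside_eu": False,
}

gaps = eu_policy.unmet_controls(analytics_store)
```

Keeping the statement and the checks together makes it possible to ask the question from the conversation directly: for this store, do the controls actually meet the objective in the policy statement?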

Liat Hayun:

That's awesome. So maybe a few final words. Where do you expect the space of data protection to be in three years' time? Which would you expect to be the new challenges or the challenges we were able to resolve?

Andy Ellis:

Yeah, I think we just laid out a lot of challenges. And it's interesting, because I have a talk I've been giving about how do you measure your security program? And I put down sort of tentative measures of what might work for vulnerability management, and for data protection I just put question marks. Because what I'm really hoping is that the next three years will actually tell us how do we measure the state of protection across an enterprise? Not just a data store, but how would we be able to say to our board that our data is protected? Because I think right now the biggest challenge we have is we can't as enterprises actually simply assert how well-protected our data is.

Liat Hayun:

That's a good challenge for us to focus on. Thank you so much, Andy, for being here today.

Andy Ellis:

Thanks for having me, Liat.
