As companies expand their reliance on data, planning for the possibility that something could go wrong has become increasingly important. To make matters worse, the chances of something going wrong are higher than ever, with so many breaches and data loss incidents caused by cloud misconfiguration, insider threats, and external bad actors.
In 2017, GitLab suffered a major data loss after a technician mistakenly deleted 300GB of data from a production database and was unable to restore it from the company's backups. GitLab was down for around 36 hours before it was able to recover that database. The difficult recovery stemmed from a series of mistakes, such as using the wrong version of Postgres and storing a backup in a slower database in a different region. As is the case with many companies, no one at GitLab had clear responsibility for taking care of the company’s backups[1].
The importance of properly securing and managing backups is undeniable, but because backups are not considered a primary part of a company's daily operations, they often do not get the attention they need.
Around the same time as GitLab’s data loss incident, a huge email marketing organization called River City Media was breached. 1.37 billion users’ PII records (393 million unique email addresses) and internal company information were leaked. This is considered one of the world’s biggest data breaches of the last 10 years. The breach happened because the company accidentally left its backup database accessible online: it used the rsync protocol to back up its MySQL databases, but the backup servers were not password protected. This obviously inadequate treatment of backups led to an avoidable disaster[2].
When looking at securely backing up data in your clouds, there are a few risks and obstacles that you must take into consideration:
As a common practice, many organizations use data storage services such as AWS S3 buckets or Azure blobs for their backups, while their other operational data sits on fully managed database services such as Amazon RDS, DynamoDB, or Azure SQL. This results in many buckets or blobs containing a mix of backups and sensitive data. Further, while the main data facilities are well maintained and configured, often no one is managing and securing those backups. Questions such as “Who has access to this storage?”, “Are those blobs staying in the right region?” and “Are those buckets encrypted as they should be?” commonly remain unanswered.
In 2019, by keeping its data in a publicly accessible S3 bucket, iPR Software exposed the records of thousands of customers along with sensitive admin records. This bucket contained backups generated from MongoDB, among other files that had been kept there. Most of the sensitive data that leaked (contact information for 477,000 users) came from these unmanaged and abandoned backups[3].
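The questions raised above about access, region, and encryption can be turned into a routine check. The following is a minimal sketch, not a definitive implementation: `BucketSettings`, `audit_bucket`, and the sample bucket are hypothetical names introduced for illustration, and the settings are assumed to have been collected beforehand from the cloud provider's APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class BucketSettings:
    """Hypothetical summary of one backup bucket's configuration."""
    name: str
    region: str
    encrypted: bool              # encryption at rest is configured as expected
    public_access_blocked: bool  # public access is blocked for the bucket
    owner: Optional[str]         # team responsible for the bucket, if known


def audit_bucket(bucket: BucketSettings, allowed_regions: Set[str]) -> List[str]:
    """Return human-readable findings; an empty list means no issue was found."""
    findings = []
    if not bucket.public_access_blocked:
        findings.append("public access is not blocked")
    if not bucket.encrypted:
        findings.append("default encryption is disabled")
    if bucket.region not in allowed_regions:
        findings.append(f"stored in unapproved region {bucket.region}")
    if not bucket.owner:
        findings.append("no owner recorded")
    return findings


if __name__ == "__main__":
    # Sample data is invented for illustration only.
    bucket = BucketSettings("old-mongo-backups", "us-east-1", False, False, None)
    for finding in audit_bucket(bucket, allowed_regions={"eu-west-1"}):
        print(f"{bucket.name}: {finding}")
```

Recording an owner alongside the technical settings is a deliberate choice here: a bucket with no responsible team is the kind of abandoned backup described above.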
While abandoned backups are usually known to the company even though they are poorly maintained, there is also a high probability that some backups exist without anyone's knowledge. In this case, the source datastores have already been deleted, but their snapshots remain because they are not associated with any current snapshot policy. Orphaned snapshots can arise in a few ways. For example, manual snapshots of an RDS or EC2 instance can be taken at any time, and they never expire, even after the instance they back up has been deleted. These snapshots are therefore likely to stay hidden from the company. Another example is automated backups that are replicated to another region and retained automatically without a controlled deletion mechanism[4].
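One way to surface orphaned snapshots is to compare the source instance of every manual snapshot against the list of instances that still exist. The sketch below assumes the snapshot records are dictionaries using the field names of the RDS `DescribeDBSnapshots` response (`DBSnapshotIdentifier`, `DBInstanceIdentifier`, `SnapshotType`). The function name and sample data are hypothetical, and fetching and paginating the records is left out.

```python
from typing import Dict, Iterable, List


def find_orphaned_snapshots(
    snapshots: Iterable[Dict[str, str]],
    live_instance_ids: Iterable[str],
) -> List[str]:
    """Return IDs of manual snapshots whose source instance no longer exists."""
    live = set(live_instance_ids)
    return [
        snap["DBSnapshotIdentifier"]
        for snap in snapshots
        # Manual snapshots never expire, so they are the ones that linger.
        if snap["SnapshotType"] == "manual"
        and snap["DBInstanceIdentifier"] not in live
    ]


if __name__ == "__main__":
    # Invented sample records; real ones would come from the provider's API.
    snapshots = [
        {"DBSnapshotIdentifier": "snap-a", "DBInstanceIdentifier": "orders-db",
         "SnapshotType": "manual"},
        {"DBSnapshotIdentifier": "snap-b", "DBInstanceIdentifier": "legacy-db",
         "SnapshotType": "manual"},
    ]
    print(find_orphaned_snapshots(snapshots, live_instance_ids=["orders-db"]))
```

The result is a list of candidates for review rather than automatic deletion, since an orphaned snapshot may still be the only copy of data the business needs.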
Data sovereignty is the idea that stored digital data is governed by the laws of the country or region in which it is located. In relation to cloud storage, the term refers to the precise location of the cloud data center, and it applies not only to production workloads and data but to backups as well. Data sovereignty laws differ from country to country, and some countries have weaker data protection laws that may expose your data to unknown risks and theft. Therefore, replicas of a sensitive datastore that are hosted in a different region from the original instance could cause compliance issues.
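A residency check for replicas could look roughly like the sketch below. It assumes a small, hand-maintained mapping from cloud region to jurisdiction (`REGION_JURISDICTION`, hypothetical and incomplete) and a list of replica locations gathered elsewhere. Which jurisdictions are acceptable is a legal decision the code only takes as input.

```python
from typing import Dict, List, Set, Tuple

# Assumed mapping, maintained by hand; extend it for the regions you use.
REGION_JURISDICTION: Dict[str, str] = {
    "eu-west-1": "EU",     # Ireland
    "eu-central-1": "EU",  # Frankfurt
    "us-east-1": "US",     # N. Virginia
}


def residency_violations(
    replicas: List[Tuple[str, str]],
    allowed_jurisdictions: Set[str],
) -> List[str]:
    """Return a description of each replica stored outside the allowed jurisdictions.

    `replicas` holds (replica_id, region) pairs. Regions missing from the
    mapping are reported as UNKNOWN rather than silently accepted.
    """
    violations = []
    for replica_id, region in replicas:
        jurisdiction = REGION_JURISDICTION.get(region, "UNKNOWN")
        if jurisdiction not in allowed_jurisdictions:
            violations.append(f"{replica_id} in {region} ({jurisdiction})")
    return violations


if __name__ == "__main__":
    replicas = [("customers-replica-1", "eu-west-1"),
                ("customers-replica-2", "us-east-1")]
    print(residency_violations(replicas, allowed_jurisdictions={"EU"}))
```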
Having an automated backup plan isn’t enough. Because backups involve complicated settings, even businesses that maintain their backups in accordance with cloud adaptations of the 3-2-1 backup rule (3 copies of your data on 2 different media with one copy off-site[5]) may still experience backup failures due to misconfigurations. One of the most common mistakes is configuring the retention period incorrectly, without aligning it to the needs of the business or to legal requirements. In that case, the data that is needed may no longer be restorable, resulting in data loss.
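The retention mistake described above can be caught by comparing each datastore's configured retention period with the period the business or regulation requires. This is a minimal sketch under stated assumptions: both inputs are plain dictionaries of days keyed by datastore name, the function name and all values are hypothetical, and a datastore with no configured retention is treated as keeping backups for 0 days.

```python
from typing import Dict, List


def retention_gaps(
    configured_days: Dict[str, int],
    required_days: Dict[str, int],
) -> List[str]:
    """Return one message per datastore whose backups expire too early."""
    gaps = []
    for datastore, required in sorted(required_days.items()):
        # A missing entry means no backup retention is configured at all.
        configured = configured_days.get(datastore, 0)
        if configured < required:
            gaps.append(f"{datastore}: keeps {configured} days, needs {required}")
    return gaps


if __name__ == "__main__":
    # Invented values for illustration only.
    configured = {"orders-db": 7, "logs-db": 35}
    required = {"orders-db": 30, "logs-db": 14, "users-db": 30}
    for gap in retention_gaps(configured, required):
        print(gap)
```

Iterating over the required periods rather than the configured ones means a datastore that has no backups at all is still reported.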
Eureka’s Cloud Data Security Posture Management platform goes well beyond the common cloud security baselines, taking a proactive approach to informing security and compliance teams about existing and shadow risk, and suggesting mitigation approaches. By offering the capabilities required for full cloud data storage security and compliance without requiring deep expertise or manual overhead, Eureka ensures that risky misconfigurations are discovered, assessed and remediated using a single solution across any (or several) cloud providers.
[1] https://about.gitlab.com/blog/2017/02/10/postmortem-of-database-outage-of-january-31/
[2] https://fortune.com/2017/03/06/spammer-leaks-data/
[3] https://www.securityweek.com/thousands-ipr-software-users-exposed-amazon-s3-bucket
[4] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_DeleteInstance.html
[5] 3: Create one primary backup and two copies of your data. 2: Save your backups to two different types of media. 1: Keep at least one backup file offsite.
[6] https://docs.aws.amazon.com/aws-backup/latest/devguide/deleting-backups.html