Meet Macie … Your Cloud Bucket Protector

If you haven’t looked recently, Amazon AWS provides virtually every service that you can think of for building system architectures.

But one weak area is the use of S3 buckets, which are simple data stores for objects (files).

Unfortunately, Amazon S3 data buckets are responsible for around 7% of all data breaches, especially through sloppy sharing, as users often dump sensitive information into them without encryption or access control. Often the buckets can be used to transfer large amounts of data between trusted partners. Users then forget that they can be easily found using tools such as Shodan. Often, too, the data buckets are mapped to internal drives, and users do not see that their shares are then exposed to the world.

Bucket Stream is one of the newest Python-based tools for scanning. It takes its information from certificate transparency logs (via certstream) and then builds possible names for public S3 buckets from permutations of the certificate’s domain name.
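To make the permutation idea concrete, here is a minimal sketch of how candidate bucket names might be derived from a domain seen in a certificate. It is not the tool's actual code: the `AFFIXES` and `SEPARATORS` lists and the `candidate_bucket_names` function are hypothetical, and real scanners use far longer wordlists.

```python
from itertools import product

# Hypothetical affixes and separators, chosen only for illustration.
AFFIXES = ["backup", "archive", "dev", "prod"]
SEPARATORS = ["-", "."]


def candidate_bucket_names(domain):
    """Build guessable bucket names from a certificate's domain name."""
    domain = domain.lower()
    if domain.startswith("*."):
        # Wildcard certificates list the domain as "*.example.com"
        domain = domain[2:]
    labels = domain.split(".")
    # Naive choice of base label ("example" from "www.example.com");
    # this does not handle suffixes such as ".co.uk" correctly.
    base = labels[-2] if len(labels) >= 2 else labels[0]
    names = [base, domain]
    for affix, sep in product(AFFIXES, SEPARATORS):
        names.append(f"{base}{sep}{affix}")
        names.append(f"{affix}{sep}{base}")
    # Remove duplicates while keeping the original order
    return list(dict.fromkeys(names))


if __name__ == "__main__":
    for name in candidate_bucket_names("*.example.com"):
        print(name)
```

Each candidate would then be probed to see whether a bucket of that name exists and is publicly readable, which is why predictable names such as a company name plus "backup" are so easy to find.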

Meet Macie

The graphic shows that the tool has found three buckets with read/write access. So how do we protect the buckets? On AWS, there is a service named Amazon Macie, which uses machine learning to automatically discover, classify, and protect sensitive data. Macie understands what personally identifiable information (PII) looks like and alerts users to possible areas of leakage. It can also identify key elements of a company’s IP, and will alert users to anomalies in data access. Currently it is focused on S3 buckets (as this is the largest area of weakness in the AWS infrastructure), but it is likely that the service will be scaled across all of Amazon’s data sources.

Macie continually scans S3 buckets looking for sensitive information, including access key headers, AWS access keys, AWS secret keys, API keys, email messages, and confidential markings on documents.
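As a rough illustration of what pattern-based discovery of this kind can look like, the sketch below flags a few of the content types listed above using regular expressions. This is an assumption-laden toy, not how Macie works internally: the `PATTERNS` table and `classify` function are hypothetical, and the only AWS-specific fact relied on is that long-term access key IDs are 20 characters beginning with `AKIA`.

```python
import re

# Hypothetical detectors for illustration; a real classifier uses many more
# rules and, in Macie's case, machine learning on top of them.
PATTERNS = {
    # Long-term AWS access key IDs: "AKIA" followed by 16 upper-case
    # letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Deliberately simple email matcher
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Documents carrying a "confidential" marking
    "confidential_marking": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


def classify(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(
        name for name, pattern in PATTERNS.items() if pattern.search(text)
    )


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation
    print(classify("aws_access_key_id=AKIAIOSFODNN7EXAMPLE"))
    print(classify("CONFIDENTIAL: contact bob@example.com"))
```

Simple rules like these produce false positives and miss anything that does not follow a fixed format, which is one reason a managed classifier is attractive.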

It also identifies anomalous behaviour on the files, such as where there is a console login to the buckets or where the audit trail is deleted (which is common in insider threat cases).

And to allow companies to catch poor security practice, there is a dashboard which alerts administrators to red-flag issues and warnings (such as the buckets having read access enabled).
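A warning of this kind can be approximated by inspecting a bucket's access control list. The sketch below is an illustration under stated assumptions, not Macie's logic: it makes no AWS calls, the `public_grants` helper is hypothetical, and the input dictionary is assumed to follow the shape of the response from boto3's `get_bucket_acl`. The two group URIs are the ones AWS uses for "everyone" and "any authenticated AWS user".

```python
# Predefined S3 group URIs that extend access beyond the bucket owner
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTH_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def public_grants(acl):
    """Return (group, permission) pairs that expose the bucket to outsiders."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in (ALL_USERS, AUTH_USERS):
            # Keep only the trailing group name, e.g. "AllUsers"
            findings.append((uri.rsplit("/", 1)[-1], grant.get("Permission")))
    return findings


if __name__ == "__main__":
    # Sample ACL, assumed to mirror a get_bucket_acl response
    sample_acl = {
        "Grants": [
            {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
             "Permission": "FULL_CONTROL"},
            {"Grantee": {"Type": "Group", "URI": ALL_USERS},
             "Permission": "READ"},
        ]
    }
    for group, permission in public_grants(sample_acl):
        print(f"Warning: {group} has {permission} access")
```

Bucket policies can also make a bucket public, so an ACL check on its own is only part of the picture.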

Virginia Election Data and S3 buckets

On 18 July 2018, Bob Diachenko, head of communications at Kromtech Security, announced that his company had indexed 48,623 S3 buckets of Robocent (a Virginia-based political campaign company). The buckets were named after the company that hosted them, Robocent. The data included pre-recorded political messages for robocalls (audio files) and voter data (CSV and XLS files), with voter names, phone numbers, addresses, age, gender, jurisdiction breakdown and political affiliation.

The Pentagon and S3 buckets

The Pentagon has also leaked US Department of Defense (DoD) databases containing over 1.8 billion social media posts from individuals around the world, covering an eight-year period. The buckets, discovered by UpGuard, were named: “ -backup,” “ -archive,” and “ -archive”. As of 1 October 2017, the databases have been secured.

In June 2017, the Pentagon suffered a breach of 28GB from unsecured S3 buckets holding data from the highly secretive National Geospatial-Intelligence Agency (NGA). In the most recent data breach, the databases were scraped from public sources, such as news sites, comment sites, web forums, and social media (including Facebook). While the data was already publicly available, it highlights the type of surveillance content that the US Government was interested in. The scraped content includes English, Arabic and Farsi (Western Persian) language posts. A serious worry about the leak is that less stable foreign states could use some of the collected data to pinpoint people who have spoken out against their regime.

NICE and S3

In June 2017, the data of over 14 million Verizon customers was exposed through an S3-hosted repository operated by NICE (an Israeli security company). It contained terabytes of Verizon’s customer data, including credit card details and PINs. The data was held in folders of ZIP files named “Jan-2017” to “June-2017”. Included in the files were voice recognition log files from customer calls, which included an embarrassing “ ” field. The data was found by Chris Vickery (a well-known scanner of the S3 infrastructure) when his team found a subdomain named “ — ”.

Hacking an election through S3

Governments around the world, though, do not have a great track record of keeping voting data secure, especially where third parties are involved. On 7 April 2016, it was discovered that the records of 55 million Philippine voters had been leaked from the COMELEC (the Philippines’ Commission on Elections) website. It was suspected that the site was hacked by Anonymous Philippines, and within days the data was posted by LulzSec Pilipinas.

In 2015, using Shodan, Chris Vickery found voter registration records for over 191 million US citizens, in a database of over 300GB that could be accessed over the Internet without a password. Vickery, at the time, also found over 13 million customer records in a MongoDB database related to MacKeeper.

In 2016, Vickery then found that the records of over 93 million Mexican voters were exposed online because of a configuration error on a MongoDB database (where no password was required to access the data). The database included voter names, addresses, ID numbers, dates of birth, parents’ names, and occupations.

One of the largest election data leaks exposed the records of over 198 million United States voters (more than 60% of the US population). It happened when a company named Deep Root Analytics put the 198 million voter records on Amazon S3 storage without any restrictions on access. Deep Root Analytics was employed by the Republican National Committee and was paid nearly a million dollars between January 2015 and November 2016 to provide election analytics.

The data breach was uncovered by Chris Vickery of UpGuard (who previously found over 191 million US voter records in 2015 and 93 million Mexican voter records), who found terabytes of files that did not need a password to access them.

The details included first and last name, date of birth, phone number, home and mailing address, party affiliation, voter registration data, and ethnicity.

Hacking Teddy through S3

Recently a smart teddy led to the leak of over 800,000 user account details, with hackers trying to lock the system until a ransom was paid. We also hear that it would be possible for a hacker to inject code into the bear and make it act as a spy. This leaves over 800,000 customers worrying about their details, and whether the two million recorded conversations could be leaked onto the Internet. Spiral Toys, the creators of CloudPets, exposed their complete database to the Internet without a firewall or any password protection. Researchers have said that their MongoDB database was easily discoverable through Shodan.

Some on the dark web have since viewed the database, which is said to contain 821,396 registered users, 371,970 friend records (profile/email) and 2,182,337 voice messages. The voice messages themselves are not in the database, but Troy Hunt found them on Amazon S3, where no authentication is required to access them.

The passwords in the CloudPets database were protected with a strong hash function (bcrypt), but many of the passwords were so weak that it was fairly easy to crack them. There have also been signs of continual scanning for the exposed database, and it has been completely deleted twice by hackers (and since recovered). Many users chose the password “123456”, and there are even signs that three-character passwords were hashed.
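The point that a strong hash cannot rescue a weak password can be shown with a short sketch. To keep it self-contained, it uses PBKDF2 from Python's standard library as a stand-in for bcrypt (an assumption made only to avoid a third-party dependency), and the wordlist, iteration count and function names are all hypothetical.

```python
import hashlib
import hmac
import os

# Kept low so the example runs quickly; real deployments use far more work.
ITERATIONS = 10_000


def hash_password(password, salt):
    """Slow salted hash (PBKDF2-HMAC-SHA256, standing in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def guess_password(stored_hash, salt, wordlist):
    """Try each common password in turn; return the match or None."""
    for candidate in wordlist:
        if hmac.compare_digest(hash_password(candidate, salt), stored_hash):
            return candidate
    return None


if __name__ == "__main__":
    common_passwords = ["password", "qwerty", "123456"]
    salt = os.urandom(16)
    stored = hash_password("123456", salt)
    # The salt and slow hash only add cost per guess; a password that
    # sits in a short list of common choices is still found quickly.
    print(guess_password(stored, salt, common_passwords))
```

A slow hash multiplies the cost of every guess, but when the password is among the first few thousand candidates an attacker tries, that multiplier barely matters.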


In order to avoid a hack, you should try to randomise the names of your buckets, and set the right access control and auditing on them. You also need to know what counts as sensitive information (including information on third parties), and not just assume that automated tools will find it. Finally, you might want to switch on the Macie service, and continually monitor the usage of your buckets.
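As one way of randomising bucket names, the sketch below appends a cryptographically random suffix to a short prefix and checks the result against the basic S3 naming rules (3 to 63 characters; lowercase letters, digits and hyphens; starting and ending with a letter or digit). The `random_bucket_name` helper and its `prefix` parameter are hypothetical, and the check deliberately ignores the finer rules around dots.

```python
import re
import secrets

# Basic S3 bucket naming rules, simplified (dots are not allowed here)
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def random_bucket_name(prefix="data"):
    """Return a bucket name that cannot be guessed from a company name."""
    # 16 random bytes give 32 lowercase hex characters
    name = f"{prefix}-{secrets.token_hex(16)}"
    if not BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


if __name__ == "__main__":
    print(random_bucket_name("reports"))
```

A random name only stops a bucket being found by guessing; it is no substitute for access control, since names can still leak through links, logs or shared configuration.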
