Orca Security, AWS, and the Killer Whale of a Problem

Last week Orca Security published details of two critical vulnerabilities in AWS. This led to a bit of a hair-on-fire day, since AWS didn’t get around to saying anything formally about it until later that afternoon.

The particularly eye-popping phrase that stood out from one of the announcements was:

“Our research team believes, given the data found on the host (including credentials and data involving internal endpoints), that an attacker could abuse this vulnerability to bypass tenant boundaries, giving them privileged access to any resource in AWS.”

When a few AWS employees rightfully took exception to such a bombastic claim on Twitter, Orca’s CTO walked it back, though as of this writing the statement remains in the blog post.

I’m not thrilled with Orca for overstating the severity of the vulnerabilities, nor am I particularly pleased with AWS for its “we aren’t going to talk about these things unless forced to” mindset. I’m also fully aware that this came out right after I tore Azure a new one for similar security lapses; let it never be said that I’m not willing to hold AWS to at least the same standard!

What Happened

Orca Security posted two blog posts about two distinct vulnerabilities. The first is an ability to gain control plane access to a CloudFormation host and retrieve its AWS credentials, while the second is cross-account access via AWS Glue. Both of these are serious issues and were fully mitigated within days of Orca reporting them to AWS (which happened in September 2021).

What Orca Security Got Right

I want to start out by noting that Orca Security did the responsible thing by looping in AWS’s security team immediately, and not saying anything publicly until well after the vulnerabilities had been patched, in line with the guidelines of coordinated disclosure. (Just as an aside, I gave a talk at re:Invent 2019 on AWS vulnerability disclosure and response.) They’re also not using this to hurl anti-cloud nonsense at the industry; they maintain that you should still absolutely be moving workloads to the cloud from a security standpoint.

What Orca Security Got Wrong

I have two issues with Orca’s approach to talking about these vulnerabilities.

First, it’s unclear why Orca would wait over four months from discovering these vulnerabilities, then disclose them both simultaneously, and make overstated claims about the potential breadth of these exploits.

By disclosing these issues four months after the fact, you’d think that they would have had time to workshop the messaging so that it didn’t confuse readers about which exploit empowered what, as well as coordinate a messaging campaign with AWS so customers weren’t left in full panic mode for half a day. I’d have said that they spent that time waiting to disclose in coordination with AWS except that AWS didn’t bother to say anything formally when Orca made their announcement.

Second, Orca’s claim that they believe these vulnerabilities granted them access to all resources within AWS is undermined in the next paragraph by their note that they could not in fact make it do that. This is just irresponsible messaging. Worse, it undercuts the broader message that they were attempting to convey: that they’re responsible grown-ups with a deep understanding of how cloud security works, and that they’re to be trusted when they say such things.

What AWS Got Right

AWS didn’t (to my understanding) attempt to pressure Orca Security into not talking about the vulnerabilities in public. In fact, there’s a statement in one of the Orca posts from AWS Principal Engineer Anthony Virtuoso that lends credence to these issues being in the hands of Serious Professionals. Further, both of AWS’s security bulletins go out of their way to thank Orca Security by name for reporting the issues.

The fact that AWS put up those announcements is also a credit to them, albeit many hours after they should have been released (see above about the lack of coordinated disclosure!).

Perhaps the most interesting and borderline awe-inspiring line from AWS’s tersely worded bulletins is this one:

“Analysis of logs going back to the launch of the service have been conducted and we have conclusively determined that the only activity associated with this issue was between accounts owned by the researcher.”

In other words, AWS looked through the access logs for AWS Glue going back to the launch of the service five years ago to validate that this had never been exploited against customers.

That is a STUPENDOUS amount of data that they’re retaining. This is a far cry from Azure’s “our investigation surfaced no unauthorized access to customer data.”

What AWS Got Wrong

Speaking to the substance of the disclosures, while there are few details in Orca’s post, the incident is very much on par with Azure’s ChaosDB issue with regard to severity. Orca was able to assume the Glue service role in all customer accounts and would have had whatever privileges were granted to that role within those accounts. In many cases, that’s going to be all data stored in S3. At least ChaosDB had the decency to be limited to data stored within CosmosDB, making this a terrifying lapse on AWS’s part.

The CloudFormation issue also gave some chilling insight. First, despite its semi-recent launch to defend against precisely this type of issue in the wake of Capital One’s breach, the Instance Metadata Service v2 isn’t being used here, in a “do as we say, not as we do” demonstration. Second, Orca was able to dump the contents of /etc/passwd, which in turn showed a lot of AWS employee accounts deployed to the server, which… yikes. This is exactly the kind of thing that Azure was rightfully dragged over; the handwaving and partial excuses on Twitter by AWS employees don’t refute that. This demonstrates that some of what AWS says or implies that it does may not be what’s actually occurring.
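
For anyone who hasn’t had the pleasure, here’s a rough sketch of what separates the two versions of the Instance Metadata Service. The endpoint address, the token path, and the two header names are the ones AWS documents; the function names and the path used in the usage note are mine, chosen for illustration. The code only builds the requests and never sends them, and none of it is a claim about what the CloudFormation host in question actually runs.

```python
import urllib.request

# Link-local address of the EC2 Instance Metadata Service.
IMDS = "http://169.254.169.254"


def imdsv1_request(path):
    """IMDSv1: a plain GET with no headers and no session."""
    return urllib.request.Request(f"{IMDS}/latest/meta-data/{path}", method="GET")


def imdsv2_token_request(ttl_seconds=21600):
    """IMDSv2, step one: a PUT that asks for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def imdsv2_request(path, token):
    """IMDSv2, step two: the same GET as v1, but carrying the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

Calling imdsv1_request("iam/security-credentials/") yields a bare GET, which is the sort of request a server-side request forgery bug can often be coaxed into making on an attacker’s behalf. The v2 flow requires a PUT with a custom header first, then a GET carrying the returned token, which is a much harder thing to trick a server into doing.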

What AWS Got Very Wrong

All of this brings us to what AWS has gotten catastrophically wrong, and as you might expect, I have some thoughts.

Let’s start with the issue announcement itself.

First, I think it’s safe to say that AWS seemed surprised by this disclosure. I’d have expected a blog post on the AWS Security Blog at an absolute minimum that gave a lot more context and color to this.

Instead, I found out from Scott Piper’s tweet that it wasn’t going to be a typical Thursday. I mentioned at the start that a number of AWS employees took exception to Scott’s framing of the issue, to which the kindest and most sincere thing I can say is: sit the hell down. AWS had four months and an 800+ person PR team to come up with messaging around this, but instead all of us customers got to find out about this from a third party?

Orca offered up limited information about the scope and scale of the issue but included actual quotes from one of AWS’s principal engineers! Combined with the fact that Orca is an AWS Partner, that strongly suggests that their post received review by AWS–and yet AWS didn’t say anything?

This response shines a bright light on what is, according to one former AWS employee, a policy of “if there isn’t direct customer impact, we don’t disclose issues.” That makes it much more likely that there have indeed been hypervisor escapes and cross-account access by attackers previously. While the affected customer(s) would have been notified, such an event would have been kept quiet if at all possible. I didn’t expect that and it radically shifts my perspective on the levels of thoroughness and transparency we’re really getting from AWS’s security processes.

As a cloud provider, AWS owes its customers a realistic assessment of what the risks really are. “Fifteen years of no control plane access or cross-account lapses in the AWS security model” has been the strong implication, if never expressly said outright. We now know that that’s very much not true. With everyone now having watched AWS’s response to Orca’s disclosure, the open question is now this: how many times has this happened before, but been swept under the rug?

The Takeaway

Cloud security depends on being able to trust our cloud vendors to do what they say they’re doing and to communicate quickly and clearly whenever a vulnerability or breach has occurred.

I don’t accept having to hear it from third parties and only a short begrudging paragraph or two from the cloud provider hours later as being anywhere close to sufficient. How are customers supposed to trust a cloud provider if they don’t say anything until someone external shows the world what’s been going on? This extends well past the boundary of “security” and into everything that makes the cloud a viable option for any business that takes itself even halfway seriously.

As I said last week, I can’t fathom a scenario in which Google or AWS suffered from vulnerabilities like this and didn’t make loud, sweeping reforms that they talk about constantly just to rebuild the trust that they would have burned through.

AWS has just had a vulnerability of the very kind I was referring to come home to roost. Is my expectation as a customer around the communication and next steps from AWS leadership accurate, or am I a naive dreamer who had an overly rosy view of the company I cover the most?

Either way, we’re about to find out.
