Episode Show Notes & Transcript
Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best name for your staging environment is “theory.” Because invariably, whatever you’ve built works in theory, but not in production. Let’s get to it.
nOps will help you reduce AWS costs 15 to 50 percent, if you do what it tells you. But some people do. For example, watch their webcast on how Uber reduced AWS costs 15 percent in 30 days; that is six figures in 30 days. Rather than a thing you might do, this is something that they actually did. Take a look at it. It's designed for DevOps teams. nOps helps quickly discover the root causes of cost and correlate them with infrastructure changes. Try it free for 30 days: go to nops.io/snark. That's N-O-P-S dot I-O, slash snark.
Corey: Welcome to the AWS Morning Brief: Whiteboard Confessional. Today we're going to tell a story about something that happened only a couple of weeks ago. We don't usually get to tell stories about what we do in the AWS bill-fixing realm, because companies are understandably reticent to talk about this stuff in public. They believe, rightly or wrongly, that it will annoy Amazon which, frankly, is one of my core competencies. They also think it signals a lack of attention to detail to their investors and others.
I don't see it that way, but I found a story that we can actually talk about today in a bit more depth and detail than we normally would. So, we got a phone call about three weeks ago. Someone had a low five-figure bill every month on AWS. That's generally not large enough for us to devote a consulting project to, because frankly the ROI would take far too long; an engagement with us just isn't cost-effective at that size. But we had a few minutes to kill and thought, “Eh, go ahead and pull up your bill. We'll take a quick look at it here.” Historically, half of their bill, and growing, had been data transfer which, okay, that's interesting. And as we know from previous episodes, data transfer is always strange. So, they looked at this and they said, “Okay, well, obviously then instead of serving things directly, we should get a CDN.”
So, then instead of getting a CDN, they chose to set up CloudFront, which is basically a CDN, only worse in every way. And they saw no impact to their bill after a month of this. Okay, let's change that up a bit: instead of CloudFront, we're going to move to an actual CDN. So they did, and there was a small impact to their bill. But costs continued to rise, so what's going on? At this point in the story, they called us, which, if you're seeing something strange on your bill, is generally not a terrible direction to go in. We see a lot of these things, and if we can't help you, we will point you towards someone who can. So, our consensus on this was: great, the bill is too small for a full engagement.
But let's pop into Cost Explorer and see what's going on. We break it down by service, and S3 was the biggest driver of spend. Now, that's interesting. Number two was EC2. But okay, we start with the big numbers and work our way down. This is apparently novel for folks doing in-depth bill analysis, but we're going to go with it anyway. We start looking at the usage types within that S3 category, and lo and behold, data transfer out to the internet is driving almost all of it. The request charges, by comparison, are super low. Because we've seen a lot of these, that tells us there are large objects but relatively few requests for them.
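To make that inference concrete: dividing the data transfer out quantity by the number of GET requests gives a rough average response size. This is a minimal sketch, assuming you have already pulled two usage quantities for the same period out of Cost Explorer (S3 data transfer out in GB, typically under a usage type like `DataTransfer-Out-Bytes`, and GET-class requests, typically under `Requests-Tier2`). The function names and the 10 MB threshold are hypothetical choices for illustration, not an AWS feature.

```python
def average_bytes_per_get(data_out_gb: float, get_requests: int) -> float:
    """Rough average size of a GET response, in bytes, over one billing period."""
    if get_requests <= 0:
        raise ValueError("need at least one GET request")
    # Treat 1 GB as 2**30 bytes; this is an order-of-magnitude estimate,
    # so the GB-vs-GiB distinction barely matters here.
    return data_out_gb * 2**30 / get_requests


def looks_like_large_objects(data_out_gb: float, get_requests: int,
                             threshold_mb: float = 10.0) -> bool:
    """Hypothetical heuristic: an average response above threshold_mb suggests
    big media files rather than ordinary web assets."""
    avg_mb = average_bytes_per_get(data_out_gb, get_requests) / 2**20
    return avg_mb >= threshold_mb
```

With made-up numbers, 20,000 GB served across 100,000 GETs works out to roughly 205 MB per request, which points at video-sized objects rather than web assets.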
So, all right, we're going to slice and dice slightly differently within Cost Explorer, AWS’s free (with an asterisk next to it) tool for exploring various aspects of your bill. That asterisk, incidentally, means that if you're doing this via the API, it is one cent per call. If you're doing this in the console, it's free. Be aware: that can catch you by surprise if you write a lot of very chatty scripts. You have been warned. So, yeah, most of the spend was indeed on GetObject calls. So, okay, we know that the data transfer spend was coming from an S3 bucket and not going through a CDN; otherwise, it would have shown up as a different data transfer charge in a different section of their bill.
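For the API route, here is a minimal sketch of the same “S3 broken down by usage type” view, built as keyword arguments for boto3's Cost Explorer `get_cost_and_usage` call, plus a back-of-the-envelope estimate of what a chatty script costs at one cent per request. It assumes boto3 and the service name string “Amazon Simple Storage Service” as Cost Explorer reports it; the helper names are hypothetical. The request is only built here, not sent, so it runs without AWS credentials.

```python
from datetime import date


def s3_usage_type_query(start: date, end: date) -> dict:
    """Keyword arguments for a Cost Explorer get_cost_and_usage call that
    breaks S3 spend down by usage type. The End date is exclusive."""
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost", "UsageQuantity"],
        "Filter": {
            "Dimensions": {
                "Key": "SERVICE",
                "Values": ["Amazon Simple Storage Service"],
            }
        },
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }


def monthly_api_cost_usd(calls_per_run: int, runs_per_day: int, days: int = 30) -> float:
    """Estimated Cost Explorer API spend: each request costs one cent."""
    total_calls = calls_per_run * runs_per_day * days
    return total_calls / 100
```

To actually run the query, pass it along: `boto3.client("ce").get_cost_and_usage(**s3_usage_type_query(date(2020, 1, 1), date(2020, 2, 1)))`. By the estimate above, a script making 10 calls every five minutes (288 runs a day) comes to about $864 a month in API charges alone.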
Okay, so now we know it's S3. We have to figure out what bucket it lives within. And this is an obnoxious process, and we tell them this. And they’re like, “Oh, yeah, we know what bucket that is.” This, incidentally, is where almost every software tool tends to fall down. We could spend some time tracking this down programmatically. Or we can just ask someone who already has the context of what their business does and how it works loaded into their head. Because otherwise, what we'd have to do is tag all their buckets, and then wait a few days for that tag to percolate into the billing system, and then query it again because the visibility into this sort of thing is terrible. It's a shortcoming of both Cost Explorer and S3 in that weird seam between the two. There's fundamentally no easy way to see at a glance which buckets are costing you money unless you do something fun with tagging.
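If you do go the tagging route, one gotcha is worth knowing: S3's PutBucketTagging call replaces a bucket's entire tag set rather than adding to it. This minimal sketch, assuming a hypothetical tag key of `BucketName`, merges a bucket's existing tags with a tag holding its own name so nothing gets clobbered.

```python
TAG_KEY = "BucketName"  # hypothetical key; any consistent key works


def tag_set_with_bucket_name(bucket: str, existing: list[dict]) -> list[dict]:
    """Return a TagSet that keeps existing tags and sets TAG_KEY to the bucket's name.

    PutBucketTagging replaces the whole tag set, so existing tags are carried over.
    """
    merged = {tag["Key"]: tag["Value"] for tag in existing}
    merged[TAG_KEY] = bucket  # add the tag, or correct a stale value
    # Sort for a stable, diff-friendly order.
    return [{"Key": k, "Value": v} for k, v in sorted(merged.items())]
```

With boto3, the existing tags would come from `s3.get_bucket_tagging(Bucket=name)["TagSet"]` (which raises a `NoSuchTagSet` error when a bucket has no tags yet), and the result would go back via `s3.put_bucket_tagging(Bucket=name, Tagging={"TagSet": ...})`. The key still has to be activated as a cost allocation tag in the Billing console before it shows up in Cost Explorer.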
To that end, I'm a big believer in having every bucket tagged with its own bucket name. Then you enable that tag as a cost allocation tag, and you can start slicing on it. So, great. Now, what is it that's getting requested? Once you know the bucket, you can also dig into this via its server access logs, assuming logging is turned on, to see what's going on. Great. Now, we take a look at this, and sure enough... well, before I go into what it actually was, let's pause here for a moment.
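Once you know the bucket, a quick pass over its server access logs shows which keys account for the bytes. This is a minimal sketch, assuming server access logging was already enabled and the log files have been downloaded locally. It relies on the documented S3 access log layout, where the requester field is `-` for unauthenticated requests, which is exactly the public, unsigned traffic in question. The function name and the field-index constants are this sketch's own choices.

```python
import re
from collections import Counter

# One token is a "quoted string", a [bracketed timestamp], or a run of non-spaces.
_TOKEN = re.compile(r'"[^"]*"|\[[^\]]*\]|\S+')

# Zero-based field positions in the S3 server access log format.
REQUESTER, OPERATION, KEY, BYTES_SENT = 4, 6, 7, 11


def get_bytes_by_key(lines, anonymous_only: bool = False) -> Counter:
    """Sum bytes sent per object key across GET object requests."""
    totals = Counter()
    for line in lines:
        fields = _TOKEN.findall(line)
        if len(fields) <= BYTES_SENT or fields[OPERATION] != "REST.GET.OBJECT":
            continue
        # Unauthenticated requests log "-" as the requester.
        if anonymous_only and fields[REQUESTER] != "-":
            continue
        sent = fields[BYTES_SENT]
        # Bytes sent is logged as "-" when nothing was sent.
        totals[fields[KEY]] += int(sent) if sent.isdigit() else 0
    return totals
```

Calling `get_bytes_by_key(open(path), anonymous_only=True).most_common(10)` would list the ten keys serving the most anonymous bytes.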
This episode is sponsored in part by N2WS. You know what you care about? Many things, but never backups. At least until right after you really, really, really needed to care about backups. That's what N2WS does for your AWS account. It allows you to cycle backups through different storage tiers; you can back things up cost-effectively, and safely. For a limited time, N2WS is offering you $100 in AWS credits for setting up their free trial, and I encourage you to give it a shot. To learn more visit snark.cloud/n2ws. That's snark.cloud/n2ws.
Corey: So, this bucket was set to public access. Aha! It only allowed GetObject, so you couldn't list objects. And you couldn't upload objects into it, so my famous “get people's attention by copying $4 million worth of data into their bucket” approach to flagging the problem wouldn't work here. So, great. The immediate fix was to restrict access to that bucket to just the CDN, so that nobody on the wider internet could access it directly; they could only reach it through the CDN. Because the CDN was using signed requests, this was pretty easy to do.
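A minimal sketch of the lockdown, assuming the CDN signs its origin requests with AWS credentials belonging to a single IAM principal (the ARN in the usage note is hypothetical, as are the helper and constant names). It has two parts: a bucket policy that grants `s3:GetObject` only to that principal and drops the old public statement, and an S3 Block Public Access configuration so a public grant can't quietly creep back in.

```python
import json

# S3 Block Public Access settings: reject new public ACLs and policies,
# and ignore or restrict any that already exist.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def cdn_only_policy(bucket: str, cdn_principal_arn: str) -> str:
    """Bucket policy JSON that lets only the CDN's signing principal read objects.

    There is deliberately no statement with Principal "*", which is what made
    the bucket publicly readable in the first place.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCdnOriginReads",
                "Effect": "Allow",
                "Principal": {"AWS": cdn_principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The two pieces would be applied with boto3's `s3.put_bucket_policy(Bucket=bucket, Policy=cdn_only_policy(bucket, "arn:aws:iam::123456789012:user/cdn-origin"))` and `s3.put_public_access_block(Bucket=bucket, PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK)`. If the signing principal lives in the same account as the bucket, an IAM policy on that principal may be enough on its own; the bucket policy just makes the intent explicit.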
Now, there are a few things to point out here. One, and this is going to take some people aback and cause them to gasp: the company in question worked in the adult entertainment space. So, we now have a rough idea of what tends to be in these buckets: generally, large media files. And when large media files like these are discoverable, something we've learned since the dawn of the internet is that regardless of your bandwidth, your constraints, etcetera, if you make this stuff freely and publicly available, usage will grow to fill whatever your constraint is. Historically, that was a network constraint. But in the world of cloud, it's a budgetary constraint instead.
So, the solution here was to shut down the, quote-unquote, “backdoor” that let people query the bucket directly, so it could only be accessed through the CDN. Now, this all came from a half-hour of exploration on the phone with someone. And the reason we were able to do it that quickly is that we see this stuff, kind of, a lot. Maybe not quite this egregiously, but it's definitely a pattern that we know and recognize. The first time we see something, we have to spend a few days doing a deep dive into it. After that, it's, “Oh, this is like that one other time we saw this.” The second time, you look like a wizard from the future. Mastery is never as far away from these things as we'd like to pretend it is.
Now, the solution we had was pretty decent, because it eliminated almost all of this spend. Now, the only S3 egress charges from that bucket are to their CDN, for legitimate paying customers accessing it via a locked-down, signed-request model. It solved their problem. There is, however, a better answer to this. And unfortunately, it's not one that we are able to implement at all.
And that better solution is for AWS themselves to maybe notice that half of a freakin’ bill is S3 data transfer. And maybe, when that happens in the five-figure-a-month range, it's worth flagging for review and checking in with the customer: “Hey, are you aware that this is the case?” Or analyzing what's going on in the account? Maybe (and I'm just spitballing here) having the account manager reach out to figure out A) what's going on, and B) whether this is normal and expected. And if you want to add a bonus C) on top of that: “By the way, I am your account manager, and if you need anything, please don't hesitate to reach out. What have you got planned? How can we help you as your cloud provider?” It really leads to a better outcome for everyone, rather than having stories like this show up.
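Until something like that exists on AWS's side, the same check is easy enough to run yourself. This is a minimal sketch, assuming you already have a month's cost grouped by usage type (for instance, from a Cost Explorer query grouped on `USAGE_TYPE`). The 50% threshold mirrors the “half of a bill” in this story, and the substring match on `DataTransfer-Out` is a simplifying assumption about how usage types are named; the function name is hypothetical.

```python
def transfer_dominates(costs_by_usage_type: dict[str, float],
                       marker: str = "DataTransfer-Out",
                       threshold: float = 0.5) -> bool:
    """True if usage types containing `marker` make up at least
    `threshold` of the total cost."""
    total = sum(costs_by_usage_type.values())
    if total <= 0:
        return False  # nothing billed, nothing to flag
    matched = sum(cost for usage_type, cost in costs_by_usage_type.items()
                  if marker in usage_type)
    return matched / total >= threshold
```

Run monthly against your own bill, a `True` here is the cue for exactly the kind of “are you aware of this?” conversation described above.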
I'm not saying this to bag on S3, I'm not saying this to bag on data transfer pricing (much), and I'm not saying this to bag on AWS as a whole. But there needs to be a better, more holistic, and far more systematic way of analyzing what's going on in customer accounts and noticing when things fall into certain profiles. Just a quick check-in with that customer can go an awfully long way. We found this, and the company in question is thrilled with us. They're not so thrilled with AWS. If AWS had proactively pointed this out, it would have been a better experience for everyone.
This is the core problem that we see in cloud economics. You can charge customers for an awful lot of things, but make sure that A) they know what they're being charged for, and B) it doesn't surprise them. When they discover, “Aha, I found the misconfiguration that was driving 50% of my bill,” that doesn't feel good for anyone. And it doesn't lead to people continuing to invest in whatever cloud provider they're on. This has been my low-key rant about billing. This is the AWS Morning Brief: Whiteboard Confessional. I am Cloud Economist Corey Quinn, fixing AWS bills here in San Francisco or, more directly, on the internet, because that's where they all live. And if you've enjoyed this podcast, please leave a five-star review on Apple Podcasts. Whereas if you’ve hated it, please leave a five-star review on Apple Podcasts and a copy of your latest AWS bill.
Thank you for joining us on Whiteboard Confessional. If you have terrifying ideas, please reach out to me on Twitter at @QuinnyPig and let me know what I should talk about next time.
Announcer: This has been a HumblePod production. Stay humble.