
Reader Mailbag: Billing

09.23.2020

At the start of These Unprecedented Times, my business partner and I hosted two Q&A sessions, fielding all sorts of questions from the audience about the world of AWS. Here are the questions that are filed under the general billing category, with a bit more nuance in the responses.

1. Why is my bill so big?!

In the world of cloud, you are not billed for what you use. You’re billed for what you forget to shut off.

When we start working with customers, they tend to assume we’re going to show up on day one, reach into our bag of tricks, and make the bill shrink with arcane magic.

In reality, here’s step one: Let’s take a look at the bill. Measure twice, cut once.

Many people love talking about the AWS bill in the same way it’s presented to them: alphabetically. But if someone else asks me about Alexa for Business one more time, I’m going to snap.

Look at the big numbers first and work your way down. There aren’t any secrets here. The bill is the single source of truth.
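Here is a minimal sketch of "big numbers first." It assumes you have already pulled per-service totals out of the bill into a plain dictionary; the service names and dollar amounts are made up for illustration, and `rank_by_spend` is a hypothetical helper, not anything AWS ships.

```python
def rank_by_spend(costs):
    """Return (service, cost, cumulative_share) tuples, largest spend first."""
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0.0
    for service, cost in ranked:
        running += cost
        share = running / total if total else 0.0
        rows.append((service, cost, share))
    return rows


# Made-up numbers, listed alphabetically the way the bill presents them.
example_bill = {
    "Alexa for Business": 3.0,
    "Data Transfer": 1497.0,
    "EC2": 6000.0,
    "RDS": 1500.0,
    "S3": 1000.0,
}

ranked_rows = rank_by_spend(example_bill)
for service, cost, share in ranked_rows:
    print(f"{service:<20} ${cost:>10,.2f}  cumulative {share:6.1%}")
```

The cumulative column is the useful part: once it passes 90% or so, everything below that line is probably not where your time should go.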

Unfortunately, AWS doesn’t have an inventory service—other than the bill itself. Welcome to hell.

2. What tips do you have for lowering the cost of AWS that apply very broadly?

Let’s jump right in:

Figure out where the money is going. You don’t want to spend time optimizing something that isn’t a meaningful portion of spend. Do the easy thing before doing the hard thing.

Focus on easy opportunities for winning. The patterns tend to be consistent across customers. EC2 is almost always the biggest spend category. And it’s followed by data transfer, Elastic Block Store, RDS, and S3, in no particular order. Start there and you’ll be in a good spot.

Check out managed NAT gateway. This is a handy service, but it has a way of being phenomenally expensive by adding a 4.5-cent-per-gigabyte data processing charge on top of any data transfer charges. If you’re storing data in S3 through one, use a gateway VPC endpoint instead. Those are free.

Don’t buy the entire thing at once. If you’re hemming and hawing about making a Reserved Instance or Savings Plan purchase because you’re unsure of the future, buy some portion of it. Cut the proposed purchase by 20%, and then 20% again, and so on until you feel comfortable with it. You can always make another buy afterwards.

Get rid of data you don’t need. The biggest pattern I see about big data projects is they claim to be able to find anything except a business model. Do you really need all of those transaction logs from 2012? If you’re storing 4 petabytes of them, I’d posit maybe not. Worst case, you can always transition old data to Glacier Deep Archive (which is cheap as dirt), and you can retrieve it within a day’s time if you ever need it.
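Two of the tips above boil down to arithmetic, so here is a rough sketch of both. The 4.5-cent rate is the figure quoted above. The traffic volume and the proposed purchase size are invented for illustration, and both function names are hypothetical.

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # the 4.5 cents per gigabyte quoted above


def nat_processing_cost(gb_processed, rate=NAT_PROCESSING_USD_PER_GB):
    """Data processing charge only.

    Ignores the NAT gateway's hourly charge and any regular data
    transfer charges that apply on top of it.
    """
    return gb_processed * rate


def trimmed_purchase(proposed, cuts, cut_fraction=0.20):
    """Cut a proposed commitment by cut_fraction, `cuts` times in a row."""
    return proposed * (1 - cut_fraction) ** cuts


# Assumed workload: 100 TB a month (100,000 GB) flowing through the gateway.
monthly_nat = nat_processing_cost(100_000)
print(f"NAT processing: ${monthly_nat:,.2f} per month")

# Assumed proposal: a $100,000 commitment, trimmed by 20% twice.
comfortable = trimmed_purchase(100_000, cuts=2)
print(f"Purchase after two cuts: ${comfortable:,.2f}")
```

Note that each cut applies to what is left, so two 20% cuts land at 64% of the original proposal, not 60%.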

So there you have it. If you have a problem that I can solve in a tweet, that’s what I’ll do. Sending you a contract would be weird anyway. Truth be told, people gave me favors when I was starting out, and I believe in doing the same.

If you’re a small shop worried about your AWS bill, get in touch. I’ll be waiting!

3. What’s the biggest surprise you’ve had on your AWS bill?

The biggest surprise Mike experienced, way back when we were first getting started working together in an AWS environment, was that the CloudWatch bill was larger than the bill for DataDog, the very thing that was hitting the CloudWatch API. That nonsense gets expensive.

My takeaway was: Holy crap, they charge for that? Today I’m less surprised. Of COURSE they charge for that! They charge for everything!

Take a look at what you’re actually monitoring. EBS volumes are a great example of this. You can get hypervisor metrics from CloudWatch, but not internal metrics. In almost every case, you don’t actually care about those hypervisor metrics for any of your EBS stuff. If you take a look at the various metrics you’re pulling, they don’t answer the real question you have: Is the disk about to fill up?

Pro tip: Use gp2—not io1 or now io2—for almost everything. You’ll save a boatload of money. Provisioned IOPS are expensive.

To get the information you actually care about, there needs to be an agent inside the guest operating system. If you’re running something like DataDog, just query that about the data you care about in your volumes and turn off the things that are charging you per request for data you don’t actually need.

And if for some strange reason you ever need it in the future, you can access it directly within CloudWatch. You just won’t be getting charged for requests until then.

4. How do large enterprises pay their bills?

The way you or I end up paying our AWS bill is “here’s a credit card.” And then when that card’s invariably maxed out, we move on to “here’s another credit card.” If you use the Amazon Prime card, despite what it says to the contrary in the terms, you can get 5% back on your AWS bill. You’re welcome. Thanks for reading!

Once you get into enterprise territory—which is somewhere in the $1 million/year range for cloud spend—both Amazon and companies that are paying Amazon switch over to invoice payments, which are paid via check, wire, or ACH. A customer might be able to negotiate 2% off of their bill while Amazon avoids getting hit with higher credit card processing fees.

It’s the same reason why you usually can’t buy a car on a credit card. The dealer doesn’t want to eat the interchange fees.

5. What are your suggestions for dealing with costs that don’t get attributed to resources or tags in the CUR (e.g., data transfer bytes out from Lambda)?

You’re never going to get full visibility into the spend of everything. At some point, you end up giving up. We’ve gotten to 80 or 90% coverage for tags, and the rest is going to slush. You want to be directionally correct. But, for almost every shop, you don’t want to drive yourself mad by spending thousands of dollars to trace down pennies.
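If you want to know where you stand on that 80 or 90%, a back-of-the-envelope sketch looks something like this. It assumes you have exported line items into a list of dictionaries with a `cost` and a `tags` field. That shape, the `team` tag key, and the numbers are all assumptions for illustration, not the actual CUR schema.

```python
def tag_coverage(line_items, tag_key):
    """Fraction of total spend carried by line items with a non-empty tag_key."""
    total = sum(item["cost"] for item in line_items)
    tagged = sum(
        item["cost"]
        for item in line_items
        if item.get("tags", {}).get(tag_key)  # missing or empty counts as untagged
    )
    return tagged / total if total else 0.0


# Made-up line items; the field names are hypothetical.
example_items = [
    {"service": "EC2", "cost": 700.0, "tags": {"team": "checkout"}},
    {"service": "RDS", "cost": 150.0, "tags": {"team": "checkout"}},
    {"service": "S3", "cost": 100.0, "tags": {"team": ""}},
    {"service": "Lambda data transfer", "cost": 50.0, "tags": {}},
]

coverage = tag_coverage(example_items, "team")
print(f"Tagged spend: {coverage:.0%}")
```

Measuring coverage by dollars rather than by resource count is the design choice here: a thousand untagged resources that cost pennies are slush, and one untagged resource that costs thousands is not.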

So the answer is this: It probably doesn’t matter.

For most architectures, the Lambda cost and auxiliary costs incurred by Lambda are minuscule. Any time we see thousands of dollars in Lambdas we see millions of dollars in EC2s.

Make sure this is a problem worth solving. Unless there’s a strategic reason to go down that rabbit hole, it probably isn’t worth it. Time that your team spends playing slap-and-tickle with the AWS bill is time they’re not spending working on their next feature.

6. What’s the first thing you would do to reduce EC2 networking costs?

I’d figure out what the networking costs look like. Is it between different Availability Zones? Is it out to the internet? Is it something else? What region is it in? What workloads do I have hanging out there? What’s likely to be causing these costs?

You can get fairly granular after a few iterative cycles in Cost Explorer. But the other side of it is going down the rabbit hole of looking into what’s happening in those subnets.
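Once you have usage types and costs exported from Cost Explorer, a first pass at the questions above can be as simple as bucketing by name. This is only a sketch: the usage-type strings below are modeled on names that commonly show up on bills, the substring matching is my assumption rather than an official taxonomy, and the costs are invented. Check the patterns against your own bill before trusting the buckets.

```python
from collections import defaultdict


def classify_usage_type(usage_type):
    """Rough bucket for a usage-type string (assumed naming patterns)."""
    if "NatGateway-Bytes" in usage_type:
        return "NAT gateway processing"
    if "DataTransfer-Regional-Bytes" in usage_type:
        return "between Availability Zones"
    if "AWS-Out-Bytes" in usage_type:
        return "between regions"
    if "DataTransfer-Out-Bytes" in usage_type:
        return "out to the internet"
    return "something else"


def summarize(rows):
    """Total cost per bucket, largest first."""
    totals = defaultdict(float)
    for usage_type, cost in rows:
        totals[classify_usage_type(usage_type)] += cost
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Invented (usage type, cost) pairs for illustration.
example_rows = [
    ("DataTransfer-Regional-Bytes", 400.0),
    ("USW2-DataTransfer-Regional-Bytes", 100.0),
    ("DataTransfer-Out-Bytes", 250.0),
    ("USE1-USW2-AWS-Out-Bytes", 75.0),
    ("NatGateway-Bytes", 900.0),
    ("BoxUsage:m5.large", 10.0),
]

summary = summarize(example_rows)
for bucket, cost in summary.items():
    print(f"{bucket:<28} ${cost:>8,.2f}")
```

This tells you which bucket to go ask people about. It does not tell you which workload is responsible.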

I’ve found that it’s easier to solve this problem by talking directly to people and finding out what’s what. Maybe I’ll get a response like this: Oh, that’s the thing where we just replicate data around in a circle because we don’t understand what a Storage Area Network is!

Figuring out what is driving the cost usually starts the conversations. You can also go deeper with VPC Flow Logs, of course. But they are a bear to wind up doing meaningful analysis on.

7. Any advice on how to break down costs into a per-user metric? There seems to be a huge disconnect between bulk resource costs and a user doing a few API calls.

It’s a spectrum. You start off with the most naive approach that everyone does. You take the AWS bill for the last month, you look at the user metric you’re trying to get to (e.g., monthly active or daily active), and you do simple division.

That starts to break down because—no matter how many users you have or don’t have—you’re going to spend the same amount of money for the JIRA server, for example. And this is where conversations with finance begin to become valuable and why those tags are important.

When you’re trying to answer this question, you’re likely going to break things down into a model where you have a single user and you have to spend x dollars to get everything up and running for the infrastructure tooling around it. And then on top of that, there’s a marginal cost for every additional user.

For most workloads, that’s going to be more aligned to the cost per thousand users. Because unless you have a few very large customers, you’re not going to see a number that’s meaningful to humans. “Every user we have costs 0.00003 cents.” That number doesn’t make any sense to anyone. Focus instead on getting a number that is meaningful for planning and discussion purposes.
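To make the spectrum concrete, here is a small sketch of both ends: the naive division, and the fixed-plus-marginal model expressed per thousand users. The bill, the fixed-cost figure, and the user count are invented, and deciding what counts as "fixed" is exactly the conversation you need to have with finance.

```python
def naive_cost_per_user(monthly_bill, active_users):
    """The simple-division approach: the whole bill over the user count."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return monthly_bill / active_users


def marginal_cost_per_thousand(monthly_bill, fixed_cost, active_users):
    """Spend that scales with users, expressed per thousand users.

    fixed_cost is whatever you spend regardless of user count
    (the JIRA server and friends).
    """
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    variable = monthly_bill - fixed_cost
    return variable / active_users * 1000


# Invented numbers: a $50,000 bill, $20,000 of it fixed, 2 million users.
naive = naive_cost_per_user(50_000, 2_000_000)
per_thousand = marginal_cost_per_thousand(50_000, 20_000, 2_000_000)

print(f"Naive cost per user:          ${naive:.4f}")
print(f"Marginal cost per 1,000 users: ${per_thousand:.2f}")
```

A figure per thousand users is something you can actually say out loud in a planning meeting. The per-user figure is not.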

The challenge is people tend to optimize what they measure. And in some cases, driving costs down on metrics like this is the wrong answer. You’re going to want to optimize for other things, like user happiness. Otherwise, you can drop the cost per-user to zero by turning everything off. But that’s usually untenable—despite the fact that many companies should probably do that. Like Facebook!

If you have any other questions about AWS billing you’d like me to answer, please reach out.