The Key to Unlock the AWS Billing Puzzle is Architecture

06.09.2021

When you talk about AWS billing, you’re talking about AWS architecture. Most folks don’t recognize that they’re the same thing.

All architecture is fundamentally about cost, and all cloud cost is fundamentally about architecture. Four years at The Duckbill Group, dozens of client engagements to fix horrifying AWS bills, and over 300 newsletter issues later, I’ve yet to see anything that disproves this theory.

Can you give examples of how AWS billing and architecture are the same?

I sure can! Imagine, if you will, the three-tieriest of three-tier architectures: a pile of web servers that talk to a pile of application servers that talk to a pile of database servers. I can do all kinds of things to optimize the AWS bill in such an environment by asking the right set of questions and making architectural adjustments:

  • What does the data transfer look like between AZs and tiers?
  • Are the instances themselves the proper size?
  • Is there an RDS story that improves the overall economics of the database?
  • Do the web servers autoscale?
  • Can any of these tiers benefit from Spot instances?
  • If not, do Savings Plans or Reserved Instances make sense?
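
To put a rough shape on that last question, here's a minimal sketch in Python. It isn't AWS pricing: the discount figure and the function names are assumptions made up for illustration. It simplifies a commitment to "pay a discounted rate for every hour of the term, whether or not anything is running" and compares that with paying the full on-demand rate only for the hours actually used.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours a resource must run before a commitment beats on-demand.

    Simplifying assumption: a commitment bills (1 - discount) of the on-demand
    rate for every hour of the term, while on-demand bills the full rate only
    for the hours used.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    return 1.0 - discount


def cheaper_option(discount: float, utilization: float) -> str:
    """Compare relative cost per term hour under the same simplification."""
    committed_cost = 1.0 - discount  # paid for every hour, used or not
    on_demand_cost = utilization     # full rate, but only for hours used
    return "commitment" if committed_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    hypothetical_discount = 0.25  # illustrative only, not a published AWS rate
    print(break_even_utilization(hypothetical_discount))
    print(cheaper_option(hypothetical_discount, utilization=0.90))
    print(cheaper_option(hypothetical_discount, utilization=0.50))
```

Under those assumptions, a 25% discount only pays off for capacity that runs more than 75% of the time, so the answer depends heavily on how steady the workload is.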

All of those questions are fundamental derivatives of the architecture itself. The larger business context of the service at hand shapes the questions as well as the responses. In many cases, asking questions about the application opens or closes doors pretty effectively. If the service supports legacy browsers or clients and will eventually (hopefully!) be sunset, then optimizing the cloud resources already in use makes perfect sense. Refactoring the service wouldn’t be worth the engineering cost in that scenario.

Alternatively, if the application is growing and generating significant value that’s constantly increasing, then doing a rearchitecture absolutely makes sense, but not for cost reasons! Cost is virtually never a driver behind an application rewrite. There may well be a capability story lurking in there. If the web tier can be replaced with API Gateway or the database tier gets swapped out for DynamoDB, suddenly a host of alternative options open up. They may or may not make sense for a given workload, customer, or environment, but that needs to be evaluated.

Keep in mind, if that three-tier application were to be rewritten as something serverless or microservices-driven, the entire economic model would need to be rewritten too. The costs would become more dynamic but also less predictable. At the same time, the application’s efficiency would almost certainly increase.

2 ways automated tools fail to grasp architecture and cost

The fundamental issue with most tools that purport to solve cloud spend problems is that they fall into one of two failure modes.

Keep everything mode. These tools usually think that whatever’s in the environment is correct, should be there, and will be there forever. That doesn’t work when things have been left running that shouldn’t be or when a short-term experiment is treated as if it were permanent.

Kill everything mode. These tools assume that whatever’s in the environment is completely wrong, and that companies will somehow free up an engineering team for six straight months to rebuild something in order to save 10% on their AWS bill.

Both extremes are unsafe assumptions and aren’t how companies think about their architecture in the larger business context.
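
The arithmetic behind the "kill everything" failure mode is worth a quick sketch. The Python below is a back-of-the-envelope payback calculation, not a real tool. The team size, the loaded cost per engineer, and the monthly bill are all hypothetical numbers chosen for illustration; only the six months and the 10% come from the scenario above.

```python
def payback_months(
    engineers: int,
    months_of_work: float,
    monthly_cost_per_engineer: float,
    monthly_aws_bill: float,
    savings_fraction: float,
) -> float:
    """Months of AWS savings needed to repay the engineering time spent."""
    engineering_cost = engineers * months_of_work * monthly_cost_per_engineer
    monthly_savings = monthly_aws_bill * savings_fraction
    if monthly_savings <= 0:
        return float("inf")  # no savings means the rebuild never pays for itself
    return engineering_cost / monthly_savings


if __name__ == "__main__":
    months = payback_months(
        engineers=5,                       # hypothetical team size
        months_of_work=6,                  # "six straight months"
        monthly_cost_per_engineer=15_000,  # hypothetical loaded cost
        monthly_aws_bill=100_000,          # hypothetical bill
        savings_fraction=0.10,             # "save 10%"
    )
    print(f"Payback period: {months:.1f} months")
```

With those made-up inputs, the rebuild takes about 45 months of savings to pay for itself, and that's before counting whatever else the team could have shipped in the meantime.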

How basically every AWS tweak affects architecture … and cost

Early on (when the newsletter first started and I was still bright-eyed with undeserved optimism about cloud billing), I naively thought that tracking AWS feature and service enhancements with financial relevance would be simple: look for price reductions and the occasional feature announcement. In practice, that assumption didn’t survive the first round of customer discussions because, as mentioned above, cost is architecture. It’s the rare AWS announcement that doesn’t have architecture repercussions.

For example, if you didn’t use SQS when building your application because it couldn’t handle your throughput needs or it was too expensive, that changed a couple of weeks ago. SQS now offers effectively unlimited throughput at a cost that’s just 3.6% of what it was at launch. The economics have shifted, but it’s not obvious from Amazon’s somewhat dry enhancement announcement that there’s any financial impact at all.

It turns out that there’s no pattern of search terms that shows just the architecturally significant releases, so I read every AWS release. Once I’d gathered all the information in one place, I figured I’d summarize the interesting bits and send them out to other people. That’s how the Last Week in AWS newsletter was born!

The very manual process of reading every little announcement and understanding its impact on architecture is also why the best Cloud Economist candidates The Duckbill Group finds when we’re hiring aren’t people with deep finance backgrounds who are only aware of Amazon as “that company that sells books and underpants.” The best Cloud Economists are effectively already cloud engineers, solutions architects, or technical account managers themselves. Folks who excel in those roles are already viscerally aware of various design and cost trade-offs, so they know how to successfully rearchitect workloads and businesses towards a lower AWS bill, but only if it makes sense to do it given the larger business context!
