Introducing From the Field: The Unconventional Guide to Cost Management

Episode Summary

Join Pete and Jesse as they launch a new AWS Morning Brief podcast series called Friday From the Field, which examines how organizations are using the cloud and what some of their major pain points are. In this episode, Pete and Jesse discuss how even the best companies only tag 90% of their resources, what the cost management circle of pain is, what it was like for Pete to work somewhere where the gross margin was -175%, the role architecture decisions play in cloud spend, the four levers that influence cloud costs within your organization, why it’s important to understand the cost implications of product decisions, and more.

Episode Show Notes & Transcript

About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Transcript

Corey: When you think about feature flags—and you should—you should also be thinking of LaunchDarkly. LaunchDarkly is a feature management platform that lets all your teams safely deliver and control software through feature flags. By separating code deployments from feature releases at massive scale—and small scale, too—LaunchDarkly enables you to innovate faster, increase developer happiness—which is more important than you’d think—and drive transformation throughout your organization. LaunchDarkly enables teams to modernize faster. Awesome companies have used them, large, small, and everything in between. Take a look at launchdarkly.com, and tell them that I sent you. My thanks again for their sponsorship of this episode.


Pete: Hello, and welcome to the AWS Morning Brief: Friday From the Field. Triple F; that's what we're calling it now. We’re going in a new direction. I'm Pete Cheslock.


Jesse: I'm Jesse DeRose, and I'm so excited for Triple F.


Pete: Triple F. Hashtag Triple F. So, moving away, taking this in a new direction, we have… not stolen, that's a little bit too aggressive. But we have been lovingly gifted this podcast from Corey Quinn. After taking over while he was on paternity leave, we just kept on doing it; we never stopped, we never let him have it back. And he was nice enough just to give us this opportunity to take this Friday podcast in a new direction and talk about things that we're seeing as cloud economists in the field working with our clients.


Jesse: Yeah, it really started as this confessional discussion of weird architecture patterns that we've seen, but then it definitely morphed into more of the other things that we've seen from either our work with Duckbill or work with previous engagements or previous companies. So, it just felt fitting to rebrand just ever so slightly and focus more of our efforts on what are the things that we're seeing day-to-day? What are the major problems that our clients are seeing? What are some of the pain points we've seen? What are the new features from AWS that are really the interesting and important things to talk about?


Pete: Exactly. We have an interesting insight that I think a lot of folks in the industry don't get to see. We, for one, look at countless Amazon bills, seeing how people are spending their money. But we also are often reached out to directly to help engineering teams better answer questions that they're getting from finance. I mean, that's the biggest fear I have—


Jesse: Yeah.


Pete: —CFO comes walking over to my desk, and I haven't submitted an expense report recently like, what do they want?


Jesse: [laugh]. I didn't do it. It wasn't me.


Pete: Even worse is when some of your executives start learning some of these terms. And they say, “Hey, what's our cost per unit on Amazon Cloud?”


Jesse: Yeah, it is something that has morphed from just a conversation about engineering teams thinking about their architecture patterns and what might be best for them to getting the entire company involved—especially finance—to ask all these questions and really think about, what's the bottom line here? How can we better understand this cloud spend?


Pete: I know most people are probably thinking, “Doesn't tagging solve this problem? Can’t I just tag everything, and then I have all my answers, right?” Problem solved.


Jesse: I'm sorry, did you just tell me to go F myself there, Pete?


Pete: [laugh]. Obviously, we both know that even the best of companies, the most mature companies we work with, yeah, they might be about 90% plus fully tagged, but even those companies still have to put in a lot of effort to answer these questions and to understand where their spend is going. Because they say, that which gets measured gets improved. So, are you measuring your spend? Are you measuring your growth? Do you understand how your spend changes as usage changes, your customers change? I mean, there's countless questions. But there's another thing that we see, too, Jesse, right? This circle of pain, the—what is it—the cost management circle of pain.


Jesse: Yeah. Yeah. It's this really fascinating idea focusing on cloud cost optimization, where a company will realize that their cloud spend has gone up for whatever reasons, and they say, “Oh, no. We need to do something about this.” Whether that is because finance has come over and asked the question, or because engineering has caught the issue. 


And so they go through this quick session, maybe a quarter, maybe a couple months or more of figuring out, “How can we cut costs? Can we remove resources? Can we put these practices into place? Can we build some processes? Okay, now, everything's fine, right? We've managed to bring our costs back down. We managed to get rid of all of those EBS snapshots that were collecting dust and never to be used, so now we can go about business as usual again, right?” 


And so then they continue on as if nothing has happened. And without making long-term changes, those costs are going to rise again. And then all of a sudden, we're back in the same spot of, “Oh, no, our cloud costs have gone up, why did they go up? We did all these things to make sure that we didn't run into this issue again. Why are our cloud costs going up again?” And the cycle just repeats. It's a really unfortunate kind of spiral.


Pete: I remember my time at a startup where we were going through a period of really high growth, a lot of customers coming on the platform. And my favorite meeting ever was the CEO talking about our financials. And he mentioned that our gross margin was negative 175%, which, for the non-financial folks, means that for every dollar of revenue we were spending $2.75 to deliver the service. You normally want that number to be positive if you want to have a successful business. And I remember, the line he said is, “We are going to successfully go out of business with a gross margin that is negative one hundred and seventy”—whatever I said.


This is an important number that people need to think about. And what's amazing is that within a year, we had turned that around to be an extremely high gross margin because we started looking, and tracking, and bringing cultural change, and giving ownership to people to own these numbers. So, it's not just an engineering problem anymore. Everyone thinks that the Amazon bill is because your engineers built a certain thing, or turned on a certain type of instance. And sure, part of that is absolutely true, but I always like to say that your Amazon bill is the sum total of all of the decisions the business has made. 


The business chooses what things to do and what order to prioritize revenue over technical debt. And all those decisions will impact the bill. It just so happens that it impacts the bill in a way that's so much more visible than in the data center world when you just bought all this stuff and let it sit there.
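As a worked check on the negative 175% figure Pete mentions: gross margin is conventionally revenue minus cost of goods sold, divided by revenue. The sketch below assumes that definition; the function name and the dollar amounts are illustrative and do not come from the episode.

```python
def gross_margin_pct(revenue: float, cost_of_goods_sold: float) -> float:
    """Gross margin as a percentage of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost_of_goods_sold) / revenue * 100


# Hypothetical figures: $100 of revenue that costs $275 to deliver.
print(gross_margin_pct(100, 275))  # -175.0

# A healthier, equally hypothetical case: $100 of revenue, $25 to deliver.
print(gross_margin_pct(100, 25))   # 75.0
```

Under this definition, a margin of negative 175% means every dollar of revenue costs $2.75 to serve, which is why growth alone makes the problem worse rather than better.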


Jesse: Yeah, I think it's really fascinating because you ultimately end up with visibility into all of these other parts of the business that you may not have known about or thought about as clearly. Because if I'm looking at a massive spike in S3 spend, maybe that's because security has said we need to keep a certain amount of records for a certain amount of time for audit purposes. Well, as an engineer, that's not something that I necessarily focus on day-to-day or care about day-to-day. But from a security perspective, that's a majorly important business decision. But ultimately, it ends up impacting the engineering teams because it's their bottom line that's being spent.


Pete: Exactly. So, we're going to be taking the next many weeks to go through what we're calling The Unconventional Guide to Cost Management, with a variety of different things that we see in the field that the most mature organizations are focusing on, why they're focusing on them, and some actionable ways to go about improving your company's cost management strategy. And you're going to think to yourself, “Wow, that sounds really boring.” It's like, well, at some point, that CFO is going to stroll by your desk and want to know what's going on. Or you're going to be instructed to build a chargeback plan or a showback plan, and if you're not ready for that, it could add a little bit of stress to your day-to-day.


Jesse: Yeah, I mean, just as we highlighted, there's so many different drivers of costs when it comes to cloud costs. But we think that the main driver of costs is your architecture decisions, and the context related to those decisions. So, if you think about a highly regulated industry versus an industry that's maybe not as highly regulated, for example, you see a lot more data that needs to be kept for audit purposes, like I mentioned before. So, there's little examples of architecture decisions made as your business grows, that have a really unique impact on your cloud costs.


Pete: Yeah, I feel like this is where a lot of these automated cloud management tools really fall down: they lack that context. They don't understand that those systems in us-west-2 are actually my DR site that I need to have running at all times because my audit and risk team has told me I need to do that. While the CPU is not being used, and I would love to shut them down, having them off until needed does not actually meet my risk requirements. That context is—


Jesse: Absolutely.


Pete: —what is so important here. So, this Unconventional Guide falls under what we have really identified within Duckbill Group, from working with all these clients, four main capabilities. These are levers that help influence cost within your organization. And these four main capabilities are architect, attribute, invest, and predict. 


So, let's kick it off with architect. What does this mean? How you architect—what Jesse just said: how you architect your applications. Are you using higher-order functions within Amazon? Specifically, are you using Lambda? Lambda increases the ephemerality of your systems; the less they're running when they're not doing anything, the cheaper your bill will be.


If you have T class instances, maybe you need a lot of memory allocated but CPU is very intermittent. How you're using, how you've architected, how your application was designed, either maybe specifically designed for the Cloud or just by accident as things evolve over time. But how you architect your application and the requirements that fit under that architecture is one of the main drivers of cloud cost. But attribute. What about attribute? What does that mean, Jesse?


Jesse: Yeah. I think it's important to call out that when we talk about these capabilities, architect is definitely one of the most broadly used and referenced capabilities that we see in terms of talking about cloud architecture and architecture decisions. But there are these other three capabilities that are important to highlight because they do also impact cloud costs. So, the attribute capability focuses on attributing cloud costs within your organization, whether that is to specific teams, maybe specific product lines, maybe specific business units. It really depends on how your organization structures itself. 


But you want to be able to provide your cloud costs bottom line to each of these teams to say, “Okay, this team is spending this much money per month to run their application or their microservice.” When your organization can see where your AWS costs are going along business lines, you understand the context for that cost. You move away from a vague pain around your bill to making informed decisions about engineering investments. So, ultimately, you can build showback models or chargeback models so that you understand how much each team or each business unit is spending on the Cloud. And then ultimately that gets into your unit economics as well, which we'll talk about in a second. But accurate cost attribution really helps everybody in your business understand the costs of your business decisions.
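A showback report of the kind Jesse describes can be sketched as a sum of cost per tag value, with untagged spend kept in its own bucket so the gap stays visible. The line items, the `team` tag key, and the function names below are all hypothetical; real billing data would come from your own cost export.

```python
from collections import defaultdict

# Hypothetical line items: (service, monthly cost in USD, tags on the resource).
LINE_ITEMS = [
    ("EC2", 4000.0, {"team": "search"}),
    ("S3", 2500.0, {"team": "platform"}),
    ("RDS", 2500.0, {"team": "search"}),
    ("EBS", 1000.0, {}),  # untagged resource
]


def showback(line_items, tag_key="team"):
    """Sum cost per tag value; untagged spend goes into its own bucket."""
    totals = defaultdict(float)
    for _service, cost, tags in line_items:
        totals[tags.get(tag_key, "untagged")] += cost
    return dict(totals)


def tagged_coverage_pct(line_items, tag_key="team"):
    """Share of total spend that carries the tag, as a percentage."""
    total = sum(cost for _service, cost, _tags in line_items)
    tagged = sum(cost for _service, cost, tags in line_items if tag_key in tags)
    return tagged * 100 / total if total else 0.0


print(showback(LINE_ITEMS))
print(tagged_coverage_pct(LINE_ITEMS))
```

In this made-up data set, 90% of spend is tagged, matching the coverage Pete cites for mature organizations, and the remaining 10% still has to be attributed some other way.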


Corey: This episode is sponsored in part by CircleCI. CircleCI is the leading platform for software innovation at scale. With intelligent automation and delivery tools, more than 25,000 engineering organizations worldwide—including most of the ones that you’ve heard of—are using CircleCI to radically reduce the time from idea to execution to—if you were Google—deprecating the entire product. Check out CircleCI and stop trying to build these things yourself from scratch, when people are solving this problem better than you are internally. I promise. To learn more, visit circleci.com.


Pete: I think that's an important one, too. Working at SaaS businesses, usually the SaaS product is going to grow as the customer count grows. But one thing that I find a lot of product teams fall down on is that they don't accurately understand the cost of product decisions. Product decisions are often going to be a big driver of your spend. If a product team has a certain requirement to keep data for long periods of time but they're not going to charge customers more for that, you've got a big breakdown there.


And so, I've always had a lot of success attributing spend in my applications at the product level and bringing that as ammunition to different product meetings to say, “Hey, I'm looking at your product backlog, and these are the four projects we're working on. But those four projects represent 10% of our spend, and your fifth project represents 80% of our spend”—or something crazy like that—“Maybe we want to readjust this.” Maybe not. Maybe the business doesn't need to do that or want to do that. Those are all questions that you can't ask if you don't have that information.


Another of the big levers we mentioned earlier: investing.


Making the biggest commitments that you can to Amazon (to really any cloud vendor, but specifically Amazon) will reduce the cost of running on Amazon. An upfront Savings Plan is a really simple way of making a commitment to Amazon to reduce your spend. And the longer the commitment, the more you save, if you have confidence that you're not going to be leaving Amazon anytime soon unless you're forced to, like they just turn you off, speaking for no company in particular. If you're going to be on Amazon for three years, you're probably going to be on Amazon for five years. If the company can make a five-year commitment rather than a three-year one as part of an Enterprise Discount Program, things like that, you will save more money. So, the contracts that you enter into, the longer that you can extend these, the larger that you can make them without overextending, the better off you're going to be. So, those investments are big ways to move those levers. But they all lead to this final lever, which is predicting your spend.
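The "without overextending" caveat can be illustrated with a deliberately simplified model: the committed portion is paid at a discount whether or not it is used, and anything above it is billed at full price. This is an assumption made for illustration, not a description of how any specific AWS pricing product is billed, and the 25% discount and dollar amounts are invented.

```python
def monthly_cost(usage: float, committed: float, discount: float) -> float:
    """Monthly cost under a simplified commitment model.

    usage:     on-demand-equivalent spend actually needed this month
    committed: on-demand-equivalent spend covered by the commitment
    discount:  fractional discount on the committed portion (0.25 = 25%)
    """
    committed_charge = committed * (1 - discount)  # owed whether used or not
    overflow = max(usage - committed, 0.0)         # billed at full price
    return committed_charge + overflow


# Hypothetical: $10,000 of usage, $8,000 committed at a 25% discount.
print(monthly_cost(10_000, 8_000, 0.25))  # 8000.0, versus 10000 uncommitted

# Overextended: usage falls to $5,000 but the commitment is still owed.
print(monthly_cost(5_000, 8_000, 0.25))   # 6000.0, versus 5000 uncommitted
```

In the second case the commitment costs more than paying full price would have, which is why the size of a commitment depends on how confidently you can predict your spend.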


Jesse: Yeah, and I think it's also important to call out that when you think about investing in AWS, it's not just about putting money down to get a discount; it's about investing in your relationship with your AWS account team. Your account manager, your technical account manager, and whoever else you work with from your account team want you to be happy with AWS; obviously, they want you to continue using AWS. So, building a good rapport with your AWS account team will take you so far. It can lead to a lot of really great conversations around best practices, around architecture discussions, but also it could potentially lead to your AWS account team going above and beyond to help you get that extra percent discount when you are renegotiating your EDP or renegotiating your private pricing addendums.


Pete: Yeah, that's actually a really great point: the better relationship you have, the more potential savings or just overall improvements that could be identified. I mean, your one way to get into the various AWS teams is via that account management. So, the better that relationship looks, the more positive it will be for the business.


Jesse: Yeah. So, last but not least, we have predict. Predict is all about predicting your future spend. It's all about forecasting future spend. And this is the one that makes all of the finance team just drool. 


If you can share prediction models with your finance team predicting your engineering spend, or your cloud spend, out six months to a year in advance, they are going to love you forever and ever. It is absolutely worth your time to work on building these models. But in order to build those models, you need to understand what your spend actually looks like right now. What does your spend look like per business unit or per team, like we talked about before? That is where the conversation about the attribute capability comes into play, because you need to understand which teams or which business units are spending what on the cloud first, before you can accurately predict what those teams are going to spend in the future.


And most importantly, this gets to a conversation of unit economics, which we will dive into in more detail in a later episode, but it allows your business to understand how much does it actually cost per user who is using your application, for example? And how much can you competitively charge a company or a user so that they get a decent price for your service, but you also are able to make a profit?
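The unit-economics questions Jesse raises reduce to a few lines of arithmetic. The sketch below assumes, as a simplification, that spend scales linearly with user count; every figure and function name is hypothetical.

```python
def cost_per_user(monthly_spend: float, active_users: int) -> float:
    """Average cloud cost to serve one user for a month."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return monthly_spend / active_users


def forecast_spend(unit_cost: float, projected_users: list[int]) -> list[float]:
    """Projected monthly spend, assuming cost scales linearly with users."""
    return [unit_cost * users for users in projected_users]


def price_floor(unit_cost: float, target_margin: float) -> float:
    """Lowest per-user price that still yields the target gross margin."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return unit_cost / (1 - target_margin)


# Hypothetical: $50,000 of monthly spend across 10,000 active users.
unit_cost = cost_per_user(50_000, 10_000)
print(unit_cost)                                    # 5.0
print(forecast_spend(unit_cost, [12_000, 15_000]))  # [60000.0, 75000.0]
print(price_floor(unit_cost, 0.5))                  # 10.0
```

A real forecast would also account for fixed costs and volume discounts, but even this rough form gives finance a number to plan against and gives sales a floor below which a discount loses money.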


Pete: Yeah, if you want to be the hero of your sales team, and probably your CEO as well, have this ability to forecast ready when your sales team is out there trying to make a really aggressive offer to bring in a new customer. If they have confidence in their ability to discount, they can make this really competitive offer to try to bring in new business. And that confidence comes from understanding your cost per unit, your cost to deliver the service to that specific customer, so they can go in and offer the best possible discount without risk to the business, which then makes, again, all the executives and the board happy. So, that is the superpower [laugh] of cost management.


So, those four main capabilities that we talked about: architect, attribute, invest, and predict, we're going to tie into those over the next many weeks as we talk about this Unconventional Guide. All these different tips that can help you impact your spend, hopefully in a positive way, within the business. So, we're going to be diving into these. We want to answer your questions along the way and we want to take the time to break out and answer some of those questions diving deep for you as you learn these concepts and bring in your own complexity within your businesses. 


Again, you can always go to lastweekinaws.com/QA to ask us a question. Feel free to add your name if you like, but you can be totally anonymous, too. If there's something that we're talking about that you want us to clarify further, we're going to break up these sections by doing some listener Q&A and diving into some of those, explaining some common practices that we've seen that hopefully will help guide you as you work on this process.


So, again, if you have any questions, hit us up, lastweekinaws.com/QA. We will collect those, we'll dive into them, and we've already been getting some really great questions that we're really looking forward to dedicating some future episodes to. So, this is AMB Friday From the Field. I really appreciate you taking the time.


If you have enjoyed this podcast, please go to lastweekinaws.com/review, give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review, give it a five-star rating on your personal podcast platform of choice, and then go to lastweekinaws.com/QA and tell us what you hated about it, or just give us a question. We'd love to read it. Thanks so much.


Announcer: This has been a HumblePod production. Stay humble.


