Balancing Cost Optimizations and Feature Work

Episode Summary

Join Jesse and Amy as they talk about how it’s hard to make the argument to take an engineer off feature work, DocOps and why it’s a thing, how engineers will always over-optimize and over-engineer when given the chance, why you should never depend on free cycles to focus on cost optimization, why teams need to stay accountable to the resources they’re running, the kinds of roles that should be focused on cost optimization work, why open and clear communication across teams is so important for effective cloud cost management, and more.

Episode Show Notes & Transcript

Transcript


Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.


Jesse: Hello, and welcome to the AWS Morning Brief: Fridays From the Field. I’m Jesse DeRose.


Amy: I’m Amy Negrette.


Jesse: This is the podcast within the podcast where we like to talk about all the ways we’ve seen AWS used and abused in the wild, with a healthy dose of complaining about AWS for good measure. Today, we’re going to be talking about balancing cost optimization work against feature work.


Amy: Buckle up everyone. I’ve got a lot of thoughts about this. Just kidding. It’s just the one: don’t.


Jesse: You heard it here first, folks. Don’t. Amy Negrette just says, “Don’t.”


Amy: Don’t. [laugh].


Jesse: So Amy, does that mean, don’t balance the work?


Amy: More like don’t choose. It’s always hard to make the argument to take an engineer off of feature work. This goes for all sorts of support tasks like updates and documentation, and as a group, we figured out that trying to put those off until an engineer has time to do it is not going to be a thing that becomes prioritized, it eventually gets deprioritized, and no one looks at it. And that’s why DocOps is the thing. It’s a process that now gets handled as part of and in parallel with software development.


Jesse: Yeah, I’ve had so many conversations in previous companies that I’ve worked for, where they basically said, “Well, we don’t have time to write documentation.” Or they will say, “The code is the documentation.” And, to their credit, there are a lot of places where the code is very cleanly documented, but if somebody is coming into this information for the first time and they don’t have technical knowledge or they don’t have deep expertise in what you’re looking at, they need documentation that is clear, understandable, and approachable. And it is so difficult to find that balance to actually make sure that that work is part of everything that you do.


Amy: And I think what the industry has decided is that if you make it a requirement for pull requests that if you’re going to make a change, you have to document that change somewhere, and that change if it has any kind of user impact, it will be displayed alongside it. That’s the only way to make it a priority with software. And cost optimization has to be treated in a similar respect.


Jesse: Yeah, so let’s talk about cost optimization as a process. To start, let’s talk about when to do it. Is this something that we do a little bit all the time, or do we do it after everything’s already done?


Amy: I know I just cited DocOps as a good model for this, even though that’s literally what we cannot do. We can’t treat cost optimization as something we do a little bit along the way because, again, speaking as an engineer, if I’m allowed to over-optimize or over-engineer something, I’m going to take that opportunity to do that.


Jesse: Absolutely.


Amy: And if we’re going to do project-wide cost optimization, we need to know what usage patterns are, we need to have a full user and business context on how any system is used. So, if we do a little at each step, you get stuck in that micro-optimization cycle and you’re never actually going to understand what the impact of those optimizations were. Or if you spent too much time on one part over-optimizing another part.


Jesse: It’s also really hard if this is a brand new workload that you’ve never run in the cloud before. You don’t necessarily know what the usage is going to be for this workload. Maybe you have an idea of usage patterns based on some modeling that you’ve done or based on other workloads that you’re running, but as a whole, if this is a brand new workload, you may be surprised when you deploy it and find out that it is using twice the amount of resources that you expected, or half the amount of resources that you expected, or that it is using resources and cycles that you didn’t expect.


Amy: Yeah. We’ve all been in the situation, or at least if you work with—especially with consumer software—that you’re going to run into a situation where a bunch of users are going to do things that you don’t expect to happen within your application, causing the traffic patterns that you predicted to move against the model. To put it kindly. [laugh].


Jesse: Yeah. So, generally speaking, what we’ve seen work the best is making time for cost optimization work maybe a cycle every quarter, to do some analysis work: to look at your dashboards, look at whatever tooling you’re using, whatever metrics you’re collecting, to see what kind of cost optimization opportunities are available to you and to your teams.


Amy: So, that comes down to who’s actually doing this work. Are we going to assign a dedicated engineer to it in order to ensure it gets done? Anyone with the free cycles to do it?


Jesse: See, this is the one that I always love and hate because it’s that idea of if it’s everyone’s responsibility, it’s no one’s responsibility. And I really want everybody to be part of the conversation when it comes to cost optimization and cloud cost management work, but in truth, that’s not the reality; that’s not the way to get this work started. Never depend on free cycles because if you’re just waiting for somebody to have a free cycle, they’re never going to do any work. They’re never going to prioritize cost optimization work until it becomes a big problem because that work is just going to be deprioritized constantly. There’s a number of companies that I worked for in the past who did hackathons, maybe once a quarter or once every year, and those hackathons were super, super fun for a lot of teams, but there was a couple individuals who always picked up feature work as part of the hackathon, thinking, “Oh, well, I didn’t get a chance to work on this because my cycles were focused on something else, so now I’ll get a chance to do this.” No, that’s not what a hackathon is about.


Amy: You don’t hack on your own task list. That’s not how anything works.


Jesse: Exactly. So instead, rather than just relying on somebody to have a free cycle, kind of putting it out there and waiting for somebody to pick up this work, there should be a senior engineer or architect with knowledge of how the system works, to periodically dedicate a sprint to do this analysis work. And when we say knowing how the system works, we’re really talking about that business context that we’ve talked about many, many times before. A lot of the cloud cost management tooling out there will make a ton of recommendations for you based on things like right-sizing opportunities, reservation investments, but those tools don’t have the business context that you and your teams do. So, those tools don’t know those resources that are sitting idle in us-west-2 are actually your disaster recovery site, and you actually kind of need those—even though they’re not taking any work right now, you need those to keep your SLAs in check in case something goes down with your primary site.


Or maybe security expects resources to be set up in a certain way that requires higher latency times based on end-to-end encryption. There’s lots of different business context opportunities that a lot of cloud cost management tools don’t have, and that’s something that anybody who is looking at cloud cost optimization work should have and needs to have those conversations with other teams. Whoever does this cloud cost optimization work, or whoever makes the cloud cost optimization recommendations to other teams needs to know the business context of those teams’ workloads so that the recommendations they make are actually actionable.


Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more visit lumigo.io.


Amy: And they should also have the authority to do this work. It’s easy to deliver a team a list of suggestions saying, “Oh, I’ve noticed our utilization is really low on this one instance. We should possibly move it,” or what have you. And because they’re not the ones making the full architectural decisions, or leading that team, or in charge of that inventory, they actually don’t have the authority to tell anyone to do anything. So, whoever gets tasked with this really needs to be an architect on that team—if you’re going to go with this embedded resource type of person—where they have that authority to make that decision and to act on it and move things.


Jesse: Yeah. It’s really important that teams stay accountable to the resources that they’re running. And some teams don’t know any of the resources that they’re running; they, kind of, deploy into the cloud as a black box. And that is a perfectly fine business model for some organizations, but then they also need to understand that if the senior engineer or architect who is focused on cloud cost optimization work for this group says, “Hey, we need to tweak some of these workloads or configurations to better optimize these workloads,” the teams need to be willing to have that conversation and be a part of that conversation. So, we’ve talked about a couple different ideas of who this person might be that does this work. This could be a DevOps team that attaches a dedicated resource to doing this analysis work, to making these recommendations, and then delegates the cost optimization work to the engineering teams, or it could be a dedicated cloud economist or cloud economist team who does this work.


Amy: We did touch on having someone in DevOps do this, just because they have a very broad view and the authority to issue tasks to engineering teams because if they see an application or an architecture, where resources are being—or are hitting their utilization cap, or if they realize there are applications that need more or less resources, they’re able to do those types of investigations. Maybe someone on that team can take up this work and have a more infrastructure-minded view on the entire account, see what’s going on on the account and make those suggestions that way.


Jesse: Absolutely. It’s so important. Or if there is a dedicated cloud economist or maybe a cloud economist team that is able to make these recommendations, that has the authority to make these recommendations, maybe that’s the direction your group should go.


Amy: If only we spent an entire podcast talking about this.


Jesse: [laugh]. Huh, if only we spent an entire podcast talking about how to build a cloud cost team and talk about how to get started as a cloud economist. Hmm…


Amy: Please check out the cloud economist starter kit that we all have already published.


Jesse: Yes, several weeks ago. We’ll post the episode link in the [show notes 00:12:38] again. So, Amy, we’ve talked about when to do this work, who should do this work. What I want to know is how do these teams come together to have these conversations together? I’m thinking about best practices here. I’m thinking about how do teams start building best practices around this work so that each team isn’t working in a silo doing their own cost optimization work?


Amy: If you’re lucky, someone in your company has already done this work. [laugh]. And you can just steal their work.


Jesse: Absolutely.


Amy: Or borrow. Or collaborate. Whatever word you want to use.


Jesse: [laugh].


Amy: See if you can see how the project went, how they structured it. Maybe they ran into a process issue like they weren’t able to get the kind of access they needed without jumping through a whole bunch of red tape and hoops. That’s a good thing to know going into one of these projects, just being able to see the resources that you’re going to be looking at, and making sure you have access to them.


Jesse: Absolutely. This is part of why we also harp so much on open and clear communication across teams about the cloud cost management work that you’re doing. If you are trying to solve a problem, it’s likely that another team in the organization is also trying to solve that same problem, or ideally has already solved that problem, and then they can help you solve the problem. They can explain to you how they solved the problem so that you can solve it faster so you don’t have to waste engineering cycles, trying to reinvent the wheel essentially. It’s a really, really great opportunity to build these best practices, to have these conversations together, maybe to build communities of practice within the organization, depending on how large your organization is, around the best ways to use these different tools and resources within the organization.


Jesse: Well, that will do it for us this week. If you’ve got questions that you would like us to answer on an upcoming episode, go to lastweekinaws.com/QA. If you’ve enjoyed this podcast, please go to lastweekinaws.com/review and give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review, give it a five-star rating on your podcast platform of choice and tell us how you integrate cost as a component of your engineering work.


Announcer: This has been a HumblePod production. Stay humble.
