- Unconventional Guide to AWS Cost Management: https://www.duckbillgroup.com/resources/unconventional-guide-to-aws-cost-management/
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Corey: Ever notice how security tends to be one of those things that isn’t particularly welcoming to folks who don’t already have the word ‘security’ somewhere in their job title? Introducing our fix to that, Meanwhile in Security. To sign up for the newsletter or to find the podcast, visit meanwhileinsecurity.com. Coming soon from The Duckbill Group.
Pete: Hello, and welcome to Fridays From the Field. I'm Pete Cheslock.
Jesse: I'm Jesse DeRose.
Pete: And we're back, again. We're continuing our series, the Unconventional Guide to AWS Cost Management. And as always, if you have questions as we go through this series or want to learn more, go to lastweekinaws.com/QA. Thank you to all of those who have already submitted questions.
Pete: Really great ones coming in.
Jesse: Thank you.
Pete: We're going to take a couple of episodes in the future to answer those questions and really dive into them. So, keep them coming. We really love them so far. So Jesse, what are we talking about today?
Jesse: Today, we're going to be talking about one of my favorite topics, which is that humans are the most expensive part of the cloud.
Pete: Yeah, we hear this quite a bit. I mean, not just in salary, right? This is the line that usually is mentioned when we talk to folks about their Amazon spend. They say, “Well, outside of salary, Amazon is our most expensive bill.”
Pete: That line has been repeated more times than I can count.
Jesse: But what's so fascinating to me is that this really gets at the idea of total cost of ownership, and that's what I want to focus on for just a second. Total cost of ownership means thinking about all of the spend related to running in the cloud. When you think about cloud costs, you'll generally think about just your usage within AWS, maybe with some discounts from an EDP or PPAs. But are you thinking about how much time it takes your engineers to manage all of that usage, that infrastructure, and the deployment pipelines living in the cloud? Are you thinking about the cost of all of those components alongside your usage?
Pete: Yeah, exactly. I think engineers are bad at this.
Pete: Myself included. But we want to build things; that's why we're in this industry, and it's fun to build things. It's maybe not so much fun to manage those things on an ongoing basis. Looking at you, Cassandra and Elasticsearch clusters.
Jesse: [laugh]. Yeah, there are definitely opportunities for engineers to spin things up and manage them on their own. If you want to learn how to build and manage a Kubernetes cluster, that's great. We don't want to stop you from building and learning at all. But when you're building infrastructure for your organization, your teams, and your products, is it more cost-effective to build the solution yourself, or to leverage existing managed services in the cloud?
Pete: I like to call it operational FOMO, you know, the fear of missing out. And I think a lot of engineers suffer from that when it comes to the new hotness. Kubernetes is a great example. I mean, I feel like a lot of those same people were also equally like, “OpenStack is going to be the best thing ever.” And then it wasn't.
But I like to think back to my time at a previous company where we deployed into the cloud, specifically Amazon, and there was, as we've mentioned before, an irrational fear of vendor lock-in. That fear forced us to use only core primitives: S3, EC2, and EBS, really. We didn't use much more than that, apart from the networking pieces, obviously. And the idea was, oh, well, we have this portability.
And we at the Duckbill Group, Corey included, have all talked and written about this. It's a fallacy. You're locked in for a lot of other reasons that I'm not going to go into right now. But because of that fear, we became very good at running our own databases and, specifically, at handling a large amount of time-series data. It was a security event application.
And so one of the interesting flip sides of this outcome is that we ran our own monitoring infrastructure. I didn't pay for Datadog. They called me every single day and I was like, “My metrics infrastructure costs me $1,000 a month. You're going to charge me $50,000 a month. Even if you discounted that by half, I'm still going to pay a lot more.”
And the reality was that we became so good at managing these systems that we didn't need those services. But I always think back and ask: at what cost? How much more time could we have invested in the application, the product, how we deployed it, availability, all of that, if we hadn't had to invest so much time into running our own Elasticsearch, our own Mongo, our own Redis, our own Cassandra? We spent a lot of time doing those things.
Jesse: Yeah, there are a lot of opportunities to leverage managed solutions for those things. Part of it is, again, that your engineers don't have to spend time managing this infrastructure; they can spend time on other things. But also think about the other cost components of this architecture that you may be able to reduce by using a native or managed AWS service. For example, if you look at Amazon Elasticsearch—is it ‘Amazon Elasticsearch?’ Is it—
Pete: I always forget if it's ‘Amazon Elasticsearch’ or ‘AWS Elasticsearch.’ And oftentimes, there doesn't seem to be any rhyme or reason to how they name things.
Jesse: Well, let me put it this way. If you look at the managed Elasticsearch service on AWS, you don't end up paying for some of the things you might pay for if you managed that infrastructure yourself, like data transfer, or the infrastructure management we talked about. So, there are other reasons why you might want to leverage native services. And again, it gets back to total cost of ownership: how much is it actually costing you to run these things on AWS primitives? How much are you spending, not just on compute or storage, but on data transfer and on the engineers who spend time managing this infrastructure? What else could they be working on during that time?
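As a rough illustration of the total-cost-of-ownership comparison Jesse describes, here is a minimal Python sketch. Every number in it is hypothetical and chosen only for illustration, and the `MonthlyCost` class and its field names are invented for this example rather than taken from any AWS tool. It assumes the managed option's node-to-node data transfer is not billed separately, as Jesse notes for managed Elasticsearch, and it prices engineer time at an assumed fully loaded hourly rate.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """Hypothetical monthly cost components for one deployment option, in USD."""

    compute: float
    storage: float
    data_transfer: float
    engineer_hours: float      # hours per month engineers spend operating it
    loaded_hourly_rate: float  # assumed fully loaded cost of one engineer-hour

    def people_cost(self) -> float:
        # The part of TCO that never shows up on the AWS bill.
        return self.engineer_hours * self.loaded_hourly_rate

    def total(self) -> float:
        # TCO here = cloud usage (compute + storage + data transfer) + human time.
        usage = self.compute + self.storage + self.data_transfer
        return usage + self.people_cost()


# Illustrative inputs only; not taken from any real bill.
self_managed = MonthlyCost(compute=3000.0, storage=800.0, data_transfer=1200.0,
                           engineer_hours=60.0, loaded_hourly_rate=100.0)
# Assumption: replication traffic between nodes is included in the managed price.
managed = MonthlyCost(compute=5500.0, storage=900.0, data_transfer=0.0,
                      engineer_hours=5.0, loaded_hourly_rate=100.0)

for name, option in [("self-managed", self_managed), ("managed", managed)]:
    print(f"{name}: usage-only {option.total() - option.people_cost():,.0f}, "
          f"TCO {option.total():,.0f}")
```

With these made-up inputs, the self-managed option looks cheaper on usage alone (5,000 vs. 6,400) but costs more once engineer time is counted (11,000 vs. 6,900). Real figures will vary widely; the point is only that the usage bill is one term in the sum.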
Corey: This episode is sponsored in part by CircleCI. CircleCI is the leading platform for software innovation at scale. With intelligent automation and delivery tools, more than 25,000 engineering organizations worldwide—including most of the ones that you’ve heard of—are using CircleCI to radically reduce the time from idea to execution to—if you were Google—deprecating the entire product. Check out CircleCI and stop trying to build these things yourself from scratch, when people are solving this problem better than you are internally. I promise. To learn more, visit circleci.com.
Pete: Yeah, that's a really great point about the cost of some of the managed services, specifically that Elasticsearch replication data transfer is included in the managed offering. That's true of other services as well; RDS is another good example. And data transfer is a big component of a lot of folks’ Amazon bills. I mean, we see a lot of Amazon bills. And I know I've said this before, but I can tell if you're running Elasticsearch or Cassandra without you telling me.
I can just see it in your network data transfer. Conversely, I was actually shocked recently. I looked at one of our clients' bills and usage and saw a disturbingly low amount of data transfer, to the point that I was a little worried. Do they have any, like, availability requirements? Why are we not seeing a large amount of cross-AZ data transfer?
And it turns out they were leaning heavily on Amazon's managed services, where some might say that data transfer is free and some might say it's baked into the cost, but you have to account for that. You might look at Amazon's managed Elasticsearch offering and say, “Wow, this is really expensive. It's a lot more; I can just run it myself.” But you have to add in all of those costs. And to Jesse's point, if I don't have to handle setup and all of the intricacies of a distributed database, and I can just outsource that, then I can go improve some other part of my infrastructure that is waking me up in the middle of the night.
Jesse: Yeah, I think another thing to consider here is not just how expensive it is for engineers to manage some of this infrastructure, but what business risks you take on by asking your engineers to manage it rather than letting AWS manage it natively. Specifically, we worked with a client that ran a bare-bones Kubernetes cluster on EC2 instances, and they had an amazingly mature model for cost management and cost attribution on that cluster. But all of that knowledge ran through one person, which created a business risk. It wasn't just that it was expensive for this person to do all of the work of managing this infrastructure; it was also risky for the business to rely on a single individual holding all of this knowledge.
If this person left the company, for example, nobody would have any idea how to manage this infrastructure, or how to attribute costs in this infrastructure or gather the financial data they needed month-over-month to attribute costs back to different teams and to review other metrics.
Pete: Yeah, I think a lot of folks, too, maybe feel like they're giving up a sense of control? Maybe it's a real fear, maybe it's not. I don't know. But given the services that exist now on Amazon, I'm always a little shocked to see folks who are starting in the cloud right now begin on EC2, outside of the lift-and-shift model. If you're lifting and shifting, yeah, you're moving to EC2. That's obvious. But if you're a brand-new company just moving to the cloud, EC2 should probably be the last service you set up.
Pete: You've got Fargate, EKS, ECS; there are so many ways to run containers, and it's just easy to do. It's a great way to get started. I also look at the managed databases, which let you get started really easily and quickly with, say, a T-class RDS instance.
You can change the instance size later as your scale grows, and you can increase the disk later too. That's a really interesting way to get started at a really low cost, and you can always add more later. Compare that with the classic data center world, where you'd buy a bunch of really big servers hoping your infrastructure would grow. It's like the old world of online gaming: video game companies would buy all these servers for launch day, and they still wouldn't have enough. Then over time, usage of the game would go down and down, and they were left with all these servers. So, being able to start small and grow is a great way to see how people actually use your application.
Pete: Yeah, at the end of the day, I think what we're really getting at here is that folks should fear managed services a lot less [laugh]. Whether it's a managed service on Amazon or you're using Datadog, treating vendor lock-in as a reason not to use the easiest service is a really sad state of affairs, and so many people still say it. In many cases, take MongoDB and their Atlas service: they're the creators of MongoDB, so theoretically they're the best place to get that service from. So, are you locked into them? Well, yes. But your business is already locked into Mongo because some engineer provisioned it in 2014 as a side project and you're still running it. You're locked in by all of the decisions you make, so you might as well use the service that is easiest to use.
All right, well, if you've enjoyed this podcast, please go to lastweekinaws.com/review and give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review and give it a five-star rating and tell Jesse why you loved it so much. I mean—
Pete: —hated it so much. Also, do not forget, we are still taking questions. We do want to hear your feedback. Send us a question, you can add your name or not, to lastweekinaws.com/QA and we'll answer those in a future episode. Thanks again.
Announcer: This has been a HumblePod production. Stay humble.