- Unconventional Guide to AWS Cost Management: https://www.duckbillgroup.com/resources/unconventional-guide-to-aws-cost-management/
- Building Successful Communities of Practice: https://www.amazon.com/Building-Successful-Communities-Practice-Webber/dp/095749193X
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Corey: Ever notice how security tends to be one of those things that isn’t particularly welcoming to folks who don’t already have the word ‘security’ somewhere in their job title? Introducing our fix to that, Meanwhile in Security. To sign up for the newsletter or to find the podcast, visit meanwhileinsecurity.com. Coming soon, from The Duckbill Group.
Pete: Hello, and welcome to the AWS Morning Brief. This is Fridays From The Field, hashtag-triple-F. I am Pete Cheslock.
Jesse: I’m Jesse DeRose, and I have a question: is it hashtag-triple-F, or is it hashtag-F-F-F? Are we spelling out triple F in this hashtag, or is it just literally three Fs?
Pete: The three Fs is a little triggering for me, with my high school grades, so let’s just stick to—
Pete: Triple-F, I think, just has a better flow to it. But that’s a good—it’s a good point in our continued effort to make triple-F—hashtag-triple-f a thing.
Jesse: All of our audience members were really concerned about that one because they’ve been trying to get us trending on Twitter, but they weren’t really sure: was it triple-F, or was it F-F-F, or was it something in between?
Pete: Exactly. It’s just bad. But we’re going to keep trying at it, and we’ll see what happens. Well, anyway, we are back again to continue our Unconventional Guide to Cost Optimization on AWS with another listener question. And unlike the last time we did listener questions, this question actually came in during our Unofficial Guide, which means we actually have one listener this series. Because we can’t count the last one that was from way before. So, to this one listener, thank you, thank you for listening.
Jesse: Just that one listener. Just you. Thank you.
Pete: Yeah, just you. Everyone else, no, we’re not going to, we’re not going to thank you at all. But if you want to be our second listener, go to lastweekinaws.com/QA and give us a question. What do you want to know more about?
What can we dive in a lot deeper on any of these topics we’re talking about? It’s complex stuff, and we’re all learning this, we’re all trying to figure out what works best. And not every company is the same. And that’s what I actually love about this question because this question actually came in from someone who didn’t put their name—but that’s okay—they work in the public sector, which is why they didn’t put their name in there. And they had a pretty interesting question. So, Jesse, maybe you can read this off for us and let us know what we’re going to be answering today.
Jesse: Yeah. This question is, “We’re an Azure shop, partly cloud on the way, however, we’re also becoming an Oracle OCI shop”—I’m so sorry—“And an AWS shop, and well, it’s public sector, so one-of-everything cloud provider. How do we convince management that cloud is a different thing than on-prem and needs some kind of cloud team? I dislike the phrase DevOps as a job title, but we need something to change the current model where nearly all of this work is outsourced to a quote-unquote, ‘managed service provider?’” Oof. I have so many feelings.
Pete: I would imagine. I mean, I was immediately—I felt called out, you know? Just @ me next time, public sector coward with the DevOps-as-a-job-title phrase.
Pete: They often say that only a DevOps tool, I guess—wait, what’s the term? It’s like, “A DevOps tool would give themselves a DevOps as a job title.” Of course, that’s often said about me because I gave myself a title called ‘DevOps Director’ or ‘Director of DevOps.’ Either way, you phrase it, it’s all pretty bad.
Jesse: Yeah. So, there’s a couple of different questions in this, and we’re going to dive into each of them individually. But really, really quick, I want to talk about multi-cloud because that’s kind of the underlying discussion here; something that is not necessarily the focus, but let’s talk about multi-cloud. Why is multi-cloud a thing? Why is it an important thing that you should be thinking about?
Pete: Multi-cloud is an interesting topic that could go a lot of different ways. And I call multi-cloud a lot different than hybrid cloud. I think most people are probably doing hybrid cloud, meaning you’ve got some data centers—because it takes you years and years and years to move off of those—and you’ve also got cloud workloads, or maybe you’ve got some data centers and you’re bursting up to cloud workloads; that’s pretty cool, too. I think of multi-cloud as individual applications being deployed to the cloud vendor and cloud provider, based on maybe price or features or things like that. And honestly there, a lot of the cloud providers are getting closer in feature sets.
But for example, I might want to use Lambda, but I may not want to suffer high cost of data transfer. So, can I build an application that leverages Lambda, but maybe leverages the extremely low cost of Oracle’s OCI data transfer? That made the news when Zoom signed that big contract with Oracle, it was largely driven by network data transfer. So, there are some reasons why multi-cloud might be a thing.
Jesse: And we’ve definitely seen multi-cloud in practice with some of our clients. But I also want to call out the caveat that the clients that were doing this were very mature in their cloud cost practices. So, kudos to those clients because they’re doing amazing, amazing work. But it takes time to really build up a mature, scalable, optimized, multi-cloud strategy.
Pete: Yeah, exactly. And I think the biggest challenge that we see is, on the one hand, if you say to yourself, “I’m going multi-cloud, therefore, I will only consume core primitives like compute, block store, object store, networking,” even though all the providers will provide you those services, obviously, the APIs to interact with them will be wildly different. But most importantly, the authentication models are going to be wildly different; how you authenticate to each one of these is going to be all over the place. And that’s going to pose a pretty big challenge.
Jesse: Yeah. So, I think that ultimately gets into the first question that we want to focus on here, which is, how does developing and operating workloads in “The cloud”—quote-unquote—differ from an on-prem data center? And bonus points, how does it differ from each other, which we’ll talk about a little bit here, too. Now, when you’re thinking about on-prem versus the cloud, the first thing to think about is that your finance team is going to want to better understand your spend because they’re used to a spend model where all of your resources are purchased upfront and then depreciated over time. But now, your spend model has completely shifted to a more granular model focusing on actual usage individually over time.
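As a rough illustration of the two spend models Jesse describes, here is a minimal Python sketch. Every figure in it (the hardware price, the depreciation schedule, the hourly rate, and the usage hours) is a made-up assumption, and the function names are hypothetical; it only shows why the two models look so different on a finance team's monthly view.

```python
def straight_line_depreciation(purchase_price, useful_life_months):
    """Monthly expense for an asset bought upfront and depreciated evenly."""
    return [purchase_price / useful_life_months] * useful_life_months


def usage_based_spend(hourly_rate, hours_used_per_month):
    """Monthly expense when you pay only for the hours actually consumed."""
    return [hourly_rate * hours for hours in hours_used_per_month]


# All figures below are made up for illustration.
server_price = 36_000          # hypothetical upfront hardware purchase
life_months = 36               # hypothetical three-year depreciation schedule
hourly_rate = 1.50             # hypothetical on-demand hourly rate
hours = [200, 720, 450, 90]    # hypothetical usage for four months

on_prem = straight_line_depreciation(server_price, life_months)
cloud = usage_based_spend(hourly_rate, hours)

print("on-prem, first 4 months:", on_prem[:4])
print("cloud, first 4 months:  ", cloud)
```

The upfront purchase produces the same number every month, while the usage-based figure moves with what was actually consumed, which is the part finance teams tend to find unfamiliar.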
Pete: Yeah, this is an interesting one. And this is the classic OpEx versus CapEx. That’s operations expenditures versus capital expenditures. A capital expenditure is something—I usually call it something you can touch; it’s a thing. A server, that’s capital expenditure.
These are largely accounting terms and should not be considered the scary things for businesses because you’re spending the money either way, it just differs about how the money is spent on a cash basis. And we could go off forever on this one—
Pete: —and I really don’t want to. But there is a difference here that defines it. I think another thing I like to think about when it comes to engineering from the data center world to the cloud world is, the way in which you operate will be charged differently, just by nature. And again, I know we harp on data transfer so much here, but it’s because it’s the last thing people think about. In a data center world, your data transfer, you may not even think about it; it’s just there, right? Some very advanced networking engineers set up this network for you; you just use it.
You’re probably not even charged for it or metered on it. That model breaks down very quickly. So, if you had an application and you were pushing uncompressed JSON all over the place because who cares? I want to spend CPU cycles on compression? I don’t need to do that, I have unlimited networking. That model is going to show very bad things in the cloud. And you have to think about that before you go down that path.
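To make the uncompressed-JSON point concrete, here is a minimal Python sketch that compares the size of a JSON payload before and after gzip compression and turns the difference into an estimated monthly transfer bill. The per-GB price, the request volume, and the sample records are all assumptions chosen for illustration, not real provider rates; the helper names are hypothetical.

```python
import gzip
import json

# Hypothetical per-GB data transfer price, used only for illustration;
# real rates vary by provider, region, and traffic path.
ASSUMED_PRICE_PER_GB = 0.09


def payload_sizes(records):
    """Return (raw_bytes, gzipped_bytes) for a list of JSON-serializable records."""
    raw = json.dumps(records).encode("utf-8")
    compressed = gzip.compress(raw)
    return len(raw), len(compressed)


def monthly_transfer_cost(bytes_per_request, requests_per_month,
                          price_per_gb=ASSUMED_PRICE_PER_GB):
    """Estimate monthly transfer cost in dollars for a given payload size."""
    gigabytes = bytes_per_request * requests_per_month / 1e9
    return gigabytes * price_per_gb


# Made-up sample data: repetitive JSON, similar in shape to API or log payloads.
sample = [
    {"host": f"web-{i % 10}", "status": "ok", "latency_ms": 20 + i % 5}
    for i in range(1000)
]

raw_size, gz_size = payload_sizes(sample)
requests = 100_000_000  # assumed monthly request volume

print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes")
print(f"raw cost:     ${monthly_transfer_cost(raw_size, requests):,.2f}")
print(f"gzipped cost: ${monthly_transfer_cost(gz_size, requests):,.2f}")
```

The trade-off is the one Pete describes: compression spends CPU cycles to save metered bytes, which rarely mattered on an unmetered data center network but can matter a great deal once every gigabyte is billed.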
Jesse: Yeah, this really gets at the idea of total cost of ownership for these resources. Don’t just think about, “I’m buying these servers to run my application.” You need to think about also the data transfer associated with those servers, for example. You need to think about the engineering time required to manage those services. Maybe your company has decided you’re going to move to a Kubernetes cluster; you’re going to put Kubernetes clusters in each of the cloud environments that you spin up with the different cloud providers so that your developers can focus on just building containerized application workloads and just deploy them wherever. It doesn’t really matter because there’s just Kubernetes everywhere.
Pete: Exactly. It’s—I think that concept of, “Oh, it doesn’t matter. We just have Kubernetes everywhere,” leads into the next thought, which is how are you deploying to your data center assets? But then also, how are you deploying to cloud A versus cloud B? If you adopt some of the cloud-native solutions, those don’t translate really well between providers, even ones you would kind of expect to, right?
Like EKS on Amazon, their Kubernetes service doesn’t have a direct translation to the Google Container—the GKE—or Microsoft’s—is it Microsoft DevOps, or whatever they call their Kubernetes release. [laugh]. Whatever stupid name they gave it. But that’s a big point: even though you’re using Kubernetes on all these cloud vendors, the way that you interact with them is going to be wildly different outside of just what we normally say is the authentication side of it, just in the features and the APIs that you interact with.
Jesse: Yeah, the last topic that I want to touch on here before we move on to the second main question in this discussion is public sector in general. Pete, I know, you’ve got some thoughts and feelings on that one.
Pete: Yeah. So, there are going to be a lot of constraints the public sector is going to have to deal with that, you know, my startup that just got funded yesterday for an always-on chat system is going to have a lot of different requirements on. Risk and compliance will be usually associated to these public sector clients a lot more stringently than most other companies out there. Whereas they might even have a dedicated risk and compliance team. And those compliance needs will drive a lot of architectural decisions, and thus, will actually drive a lot of cost associated with it.
So, in many cases, you’re not going to be able to—and we’ve actually seen this with clients of ours: we’ve worked with a lot of clients that are very stringent in their risk and compliance, and we’ll find places that they could potentially save or optimize, but due to their compliance needs, they’ve told us, “No, we can’t do that because risk needs this thing. Compliance needs that thing.” So, some of those times, you just can’t change what you’re doing and how you’re doing it in the public sector, just because you’re not allowed to. And so those are places where honestly, don’t fight that battle; just accept it, but make sure all of the stakeholders understand those requirements.
Jesse: Okay, so now let’s address the second part of this question, which is really talking about the pros and cons of an internal cloud or DevOps team versus an outsourced team. And off the bat, I’m already upset with myself because I’m not a fan of the DevOps job title. DevOps is not a job title, DevOps is a philosophy.
Pete: Yeah. I’ve gone back and forth on this one. And I feel like we could fill a whole podcast on this topic. And I’m sure people haven’t beaten this one to death. But I had DevOps as a job title because I was trying to create this concept that we’re talking about—center of excellence, or it has a lot of other terms—and I said to myself, “Well, if we want to implement DevOps within this business, then we want someone to be in charge of that who can help level up all these teams.”
And the downside is when you are the Director of DevOps, you suddenly own DevOps, and—for whatever that means; you own it and people are going to expect that everything falls on you. And that feels like a silo, right? Which is not what DevOps is all about. So, on the flip side, though, we all know that people with DevOps in their title—or SRE—you’re going to get paid, like, 20 or 40 percent more than someone with a sysadmin title, so go get paid, people. Put that in your title if it gets you paid. But from a higher up, a director or VP, don’t be a VP of DevOps; that is a dangerous job to have.
Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.
Jesse: Yeah, so I really want to quickly highlight, define, this idea of an outsourced team. Outsourced teams usually become a center of excellence. And a center of excellence is defined as a team, shared facility, or an entity that provides leadership, best practices, research, support, and training for a focus area. In this case, this outsourced managed provider would be the focus for all cloud cost management. But there’s pretty recent research that shows centers of excellence aren’t usually the best way to get work done in an engineering space.
The 2019 state of DevOps survey report asked its respondents how their teams and organizations spread DevOps and spread agile methods within their organization, and they noticed two really interesting things: low-performing organizations focused on strategies that created more silos and isolated expertise, which in some ways makes sense, but that means that they were siloing all of that expertise and information. They created this disconnect between the people who are creating the best practices in the center of excellence and the people who were following the best practices or implementing the best practices within the individual teams, whether it’s a product team, or a specific engineering team, or something else.
Pete: Yeah, there’s a lot of research out there. That state of DevOps report that is wonderfully researched talks about this a lot. And just from my own personal experience, one method that I’ve taken when trying to implement cultural change and leveling up the technology chops of an organization has maybe had a little bit of that concept of a cloud center of excellence, center of excellence type of thing, but it’s been treated more like—ah, I don’t even know the term—like a strike team; like a task force, right? We essentially would parachute into whatever team or organization needed our help the most. So, at a company that I was at, we were trying to basically understand who was having the most pain and trying to quantify that pain in some way.
And we found, actually, that one organization was having a lot of pain in server provisioning. Like, to get a VM provisioned to meet the needs of the business, it would take days to do this. Well, sounds like a great job for automation. So, we’re like, “Yeah, let’s do that. Let’s start building out some automation.”
This was years ago, so we’re using Chef—that was of the style of the time—and [laugh] then we noticed that when we wanted to deploy all the Chef stuff that they didn’t have any sort of continuous integration, continuous delivery system, so we shifted into building those functions out. And essentially, we were kind of moving into these teams, we were teaching them best practices, how to build things, how to do it right, leveling up their expertise, very bottom-up approach, grassroots efforts, getting people excited about these new technologies, giving them the time to learn it, and then moving on to the next team and the next challenge. And treating it as we’re not going to go in and build this thing and then run it for you forever; we’re really going to show you the capabilities and what’s available, and kind of get you out of your funk. When you’re in an enterprise and you’re down in your silo and you’re just focused on your one thing, you might not even know all the great stuff that’s out there.
Jesse: Yeah, I think that’s a really important point to share, socialize all of the information, all of the different processes that everybody’s doing because if you’re trying to solve a problem, chances are somebody else in the organization is also trying to solve that problem, and there’s no reason you shouldn’t be working together. And this really gets at the second point of the state of DevOps report about this question, which is, high performing organizations created those community structures; they created communities of practice and grassroots initiatives in order to bring folks together to solve these problems. Emily Webber wrote a fantastic book on building successful communities of practice—we’ll link it in the show notes—but she basically talks about communities of practice as a group of people who are gathering to discuss a shared passion. You can think of it as maybe a Meetup group; you can think of it as people who care about best practices together. One example that we saw was a client who had a massive Cassandra cluster internally, and there was a dedicated team who managed the Cassandra cluster, and then a bunch of other teams that were, effectively, using the Cassandra cluster for a number of things within their workloads.
And both sets of teams—both the team managing the cluster and the teams who were using the cluster—didn’t really have strong best practices, but they wanted somebody to step up and set some best practices. And they weren’t sure if they were going to be stepping on the other team’s toes if they did it, so they came together and started having this conversation to say, “Hey, these are things that we think are important as the team that’s managing the cluster,” and the people who were leveraging the cluster said, “Hey, these are the things that we think are important as the people who are using the cluster.” And they found common ground; they compromised, they created best practices together.
Pete: Yeah. I often try to think about, why is Amazon so successful? And it’s because of their ability to do what we’re talking about to their client base, their customers out there. They are building tools that their customers can consume, they are building best practices—the well-architected framework—how do you use this correctly? They give so much effort into helping the users of the service use it as best they can.
Do they do a perfect job at it? Of course not. I mean, they’re a huge place. But they do a lot more than you would expect them needing to. But that model is something that you can take and follow internally in how they create best practices, how they show you how to do it.
Amazon doesn’t go run the software you’ve deployed for you, but they will show you how to use all the tools correctly so that you can do it yourself, which is great. So, kind of given all that, what are some things that we would recommend, Jesse, instead of saying, “Oh, go build a cloud center of excellence?” What would we actually recommend instead?
Jesse: There’s this fantastic quote that I both love and hate at the same time, which is, “If it’s everyone’s responsibility, it’s no one’s responsibility.” I always struggle with this one because I believe that cloud costs should be everybody’s responsibility, but it’s true; if everybody is responsible for it, then I can say, “Well, you’re responsible for it, too, and if you’re not doing it, then I’m not doing it,” and then nothing gets done. So, leadership needs to be accountable for cloud cost management, but they also likely need an individual or a team to champion cloud cost management, to float around—similar to Pete’s description of his experience—and create and foster that buy-in, ultimately creating that community of practice or that grassroots initiative so that everybody is on the same page together, everybody socializes together, everybody knows that they are not alone in solving these problems, and can really solve those problems together.
Dr. Nicole Forsgren was on Screaming in the Cloud recently, and she has some great things to say about this. She mentioned that those types of solutions that focus on things like building up communities of practice, building up grassroots efforts, and building up proof of concepts, those things will be resilient to reorgs and product changes. Those will last over time and help your organization create that lasting change that you ultimately want.
Pete: Yeah, that is a huge point: you want to build these grassroots efforts that will survive the inevitable reorganizations and product changes that are going to happen in your business. I mean, I’ve been at companies that would reorg every six months. There’s no way a center of excellence would have survived those reorgs. Those engineers, those people would have been retasked out elsewhere. But if instead you’re doing it grassroots, and you’re leveling up, and the whole rising tide lifts all boats, that can survive reorgs, and if anything, it can thrive because you might end up getting these teams breaking up and reorganizing around and spreading more of this goodwill and knowledge across the company, and that has a real chance at being a force multiplier in that business.
Wow. Well, as you can imagine, this was a great question that Jesse and I both had a lot of feels on, and to the public sector coward, thank you so much for going to lastweekinaws.com/QA and sending us this question. This is an important one and we were really happy to talk about it. So, just a reminder, you can head to that same website, send us your questions; we’re going to keep answering them as we discuss the Unconventional Guide to AWS Cost Savings.
In the meantime, though, if you have enjoyed this podcast, please go to lastweekinaws.com/review, give us a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review and give it a five-star rating on your podcast platform of choice, and then let me know how stupid I was to give myself the title of Director of DevOps. Thank you.