That Datadog Will Hunt with Dann Berg

Episode Summary

Dann Berg, Senior CloudOps Analyst at Datadog, was an early guest on "Screaming" and is back again! Now with the title “senior” attached to the front end of his job. Dann and Datadog are also steeped in the mires of AWS billing, so naturally he and Corey have a lot to discuss in regard to cloud costs. From the arrival of FinOps to building out an architecture across a team of very specifically selected people, there is a lot going on at Datadog that deserves attention.

Dann and Corey go into the weeds of cost optimization, and each of them brings his respective expertise forward. Dann also talks about how Datadog is developing, and its exciting future. Dann offers his take on multi-cloud and how Datadog is tackling its customers' needs there. But the talent doesn’t end there: Dann is also an emerging thinker and influencer in the space, and to boot, an accomplished writer and playwright. Two of his plays have been produced in NYC and China. Check out their conversation!

Episode Show Notes & Transcript

About Dann

Dann Berg is a Senior CloudOps Analyst at Datadog, and has nearly a decade of experience working in the cloud and optimizing multi-million dollar budgets. He is also an active member of the larger technical community, hosting the monthly New York City FinOps Meetup, and has been published multiple times in places such as MSNBC, Fox News, NPR, and others. When he’s not saving companies millions of dollars, he’s writing plays, and has had two full-length plays produced in New York City and China.

Links:

Datadog: https://www.datadoghq.com
Personal Website: https://dannb.org
LinkedIn: https://www.linkedin.com/in/dannberg/
Twitter: https://twitter.com/dannberg
Monthly newsletter: https://dannb.org/newsletter/
Previous SITC episode with Dann Berg, Episode 51: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/episode-51-size-of-cloud-bill-not-about-number-of-customers-but-number-of-engineers-you-ve-hired/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it’s better than AWS pricing—and when they say that they mean it is less money. Sure, I don’t dispute that but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you’ll receive $100 in credit. That’s v-u-l-t-r.com slash screaming.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security.

And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build.

With Always Free you can do things like run small scale applications, or do proof of concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. If there’s one thing that I love, it is certainly not AWS billing, but for better or worse, that’s where my career has led me. Way back in Episode 51, I had Dann Berg, the CloudOps analyst at Datadog. And now he’s back for more. Things have changed. He’s now a senior CloudOps analyst, and I’m hoping my jokes have gotten better. Dann, thanks for being bold enough to come out and find out.

Dann: Yeah. I’m excited to see if these jokes have gotten better. That’s the main reason for coming back.

Corey: Exactly. Because it turns out that death, taxes, and AWS bills are the things that are inevitable and never seem to change.

Dann: Yeah. They just keep coming. They never stop, and they’re always slightly different than you expect. I guess, just like death and taxes.

Corey: So, when we spoke back in, I want to say 2019 is when it aired, so probably that—ish—is when we had the conversation, if not a little bit before that, you were effectively a team of one, and as mentioned, had the CloudOps analyst title. Now, you’re a senior CloudOps analyst, which I assume just means you’re older. Is the team larger as well? What does that process look like? How has it evolved in the last couple years?

Dann: Yeah, it’s been interesting, especially being a single organization and that organization being Datadog, that to be able to grow the team a little bit. So, as you said, it was just me. Now, it’s a total of four people, including myself, so three others. And, yeah, it’s been interesting just in terms of my own professional development, being able to identify what needs to be done, how much capacity I have, and being able to grow it over time, especially in this fairly new space of being specifically focused on cloud cost billing. So, kind of that bridge between engineering and finance, which itself is kind of a fairly new space, still.

Corey: It is. And my favorite part of having these conversations with folks who have no idea what this space is, is learning—when I was starting out—how to talk about this in a way that didn’t lead down weird paths. It’s, “Oh, you save money on Amazon bills? Can you help me save money on socks?” It’s like, “No. Well, yes. Get the Prime card, it gives you 5% off. But no.” And yeah, I talk about camelcamelcamel and other ways of working around the retail side, but that’s not really what I do.

It’s similar to back when I was doing SRE-style work. I made it a point never to talk about being someone involved in working in tech, or suddenly you’re the neighborhood printer repair person. Similarly, you have, I guess, gone in a strange direction because you weren’t, to my recollection, someone who had a strong SRE background. That’s not where you came from in the traditional sense, is it?

Dann: No, not an SRE background at all. Yeah, I mean, it’s really interesting. So, talking about this space, I mean, people are calling it a lot of different things, cloud economics, the term FinOps—financial operations—is being used a lot, now—

Corey: Cloud financial management is another popular one. Oh, swing a dead cat, you’ll hit 15 different words, and I give—my advice on that, even though I hate some of the terms is, cool. If people are going to pay you to have a title, even if you think it’s ridiculous, you can take the money or you can die on a petty naming hill and here we are.

Dann: Yeah. And it’s interesting because the role that I was hired for at Datadog was very much this niche, very specific role that I didn’t realize was a niche, very specific role at the time. So previously, I was at a company and I was building out their data centers, so I was working with vendors, buying servers, sometimes going on-site, installing, racking those, dealing with RMAs. And I was getting more involved as their cloud usage was growing and bringing some of those hardware capitalization cost procedures to the cloud. And so I found myself in this kind of niche role in my previous company.

And at Datadog, they basically had the exact same role that was dealing with all of the billing stuff around the cloud—kind of from an engineering perspective because it was on the engineering team—but working closely with finance, and I was like, “Oh, these are the skills that I have.” And it kind of fit perfectly. And it wasn’t until after I got to Datadog and was doing more research about this specific space that I discovered just how wide open it was. And I mean, meeting you was one of the earliest things that I did in the industry. Discovering the FinOps Foundation and a few other things has kind of like opened my eyes to this as an actual career path.

Corey: It’s an expensive problem that isn’t going away anytime soon, and it is foundational and core to the entire rest of how companies are building things these days. My argument has been for a while that when it comes to cloud, cost and architecture are the exact same thing. You don’t have the deep SRE architect background, but you’re also now a member of a four-person team. Does everyone in the team have the same skill set as you, or do you wind up effectively tagging in subject matter experts from different areas? How is the team composed? People love to ask me this question, and I strongly believe there’s no one way to do it. But what’s your answer?

Dann: Yeah, I mean, the team works very much in terms of everybody kind of taking on tasks that they need to do, but we did hire for specific skill sets when we tried to find people. So, the first person that we hired, we wanted them to have more of a developer engineer type background, writing code, stuff like that. The third hire, we were looking for somebody that was more of a generalist. I’ve seen myself more as a generalist in the space; anything that’s going on, I can pick it up and make some progress on it and build something out. And then the fourth person, we were lacking some of the deeper FP&A or FinOps experience, and so we found somebody with more of that kind of background and less of the engineering experience, but they were eager to, kind of, move from finance into more of an engineering role. And I feel like this is the perfect role for that because I feel like there are a lot of non-engineers that want to break into engineering and don’t really know how to do it. And if you are in finance, in FP&A, finding one of these more cloud-cost-optimization-specific roles is the great way to bridge that gap, I feel.

Corey: The last time we spoke, I was independent, doing this all myself, and it turns out that taking all of the things that make me and trying to find those in other people is a relatively heavy lift, even if you discount the things like ‘obnoxious on Twitter.’ So, how do you start decomposing that? Well, now we’re a dozen people and we’ve found ways to do it. But by and large in our experience, for the way that we interact—and I want to get to that in a second—is that it’s easier for us to teach engineers how finance works than it is the opposite direction. And there are exceptions to that, and as we scale, I can easily see a day in the near future where that is no longer the case.

However, we also have two very specific styles of engagement. We do our cost optimization projects, where we go into an environment and, “Oh, fix this. Turn that thing off. Do you really need eight copies of those four petabytes of data? Oh, you didn’t realize they were there. Great, maybe delete it.” And we look like wizards from the future and things are great.

The other project that we do is contract negotiation with AWS, especially at large scale. It’s never as simple as people would have you believe because, “Oh, you’re doing co-marketing efforts, and you have a very specific use case, and there are business partnerships on 15 different levels, and that all factors into how this works.” It’s nuanced and challenging, and of course, because it’s a series of anecdata, I can’t really tell too many stories in public about that. But those are the two things that we wind up focusing on. You are focusing on a very different problem.

You’re not moving from company to company, basically reimplementing the same global problem, solving it locally for them. You are embedded in an account for the duration, almost four years now by my count. And, “Okay, I guess I could just do a whole bunch of cost optimization projects on a quarterly basis in an environment like that,” doesn’t seem like it solves the problem in any meaningful way. What does your team do?

Dann: Yeah. Well, I mean, that’s such an interesting question. Just in terms of—yeah, if you’re doing consulting, you’re starting from square one every time you get a new contract, a new engagement, and being at the same company for, like you said, about four years, going on four years now, you really have a chance to dive in and think about, “Okay, what does it mean to work cloud cost optimization into just the regular business cycle of how it works?” Because I mean, you have the triangle that everybody’s familiar with: things can either be cheaper, faster, efficient and at different stages in the product lifecycle, you want to be focusing on these areas, more or less. And so, on our team, the different things that I’m thinking about is, first is visibility, is you want to provide engineers visibility into their cost. And not just numbers, right? Actionable visibility where if something needs to change, they need to do something, they know what that is.

And a lot of the times, that means not just costs, but also efficiency. So, these are the metrics that this particular application should be scaling against. As this application grows, as usage grows, are we remaining as cost-efficient? Then there’s also the piece—as you’re saying—like discovering things within the infrastructure that, “Hey, if we make this change, or if you turn this off, if we do things this way, we’ll save a bunch of money. Let’s do those.”

There’s things like reservations, committed use discounts for GCP, all of those kinds of things we manage. And then dealing closely with verifying our bill, working with finance—FP&A—on cost modeling forecasting, both short-term—like, within a month; like, what are we going to be at the end of this month and it’s the 10th right now?—and also, what does our next quarter look like? What are our next two years look like? And that bleeds into the contract negotiations, those kind of things as well.

So, I mean, it’s setting up the cycles of how do you prioritize this work? What is the company focusing on at the time? And what can you do when the company is not focusing explicitly on deciding to save money?
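Editor's note: the short-term question Dann describes ("what are we going to be at the end of this month and it's the 10th right now?") can be sketched as a simple run-rate projection. This is an illustrative sketch only, not Datadog's actual model; the function name and the dollar figures are hypothetical, and a real forecast would also account for commitments, seasonality, and planned launches.

```python
import calendar
from datetime import date


def project_month_end(spend_to_date: float, today: date) -> float:
    """Linear run-rate projection.

    Assumes every remaining day of the month costs the same as the
    average of the days billed so far (a deliberately naive assumption).
    """
    if spend_to_date < 0:
        raise ValueError("spend_to_date must be non-negative")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_elapsed = today.day  # treats today as a fully billed day
    daily_rate = spend_to_date / days_elapsed
    return daily_rate * days_in_month


if __name__ == "__main__":
    # Hypothetical figures: $50,000 billed by the 10th of a 30-day month.
    projected = project_month_end(50_000.0, date(2021, 9, 10))
    print(f"Projected month-end spend: ${projected:,.2f}")
```

The same function can be re-run daily so that the projection tightens as the month fills in.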

Corey: One of the more interesting aspects of my work that I didn’t expect is, whenever I wind up starting an engagement, or even in the prospect stage, I love asking the dumbest possible questions I can think of because it turns out they’re not. And the most common one that I always love to start with is, “Oh, okay. Your AWS bill is too high. Why do you care?” And that often takes people aback, but once you dig down underneath the surface just a little bit, it becomes pretty clear that the actual goal is not that it’s too much money—because spoiler, payroll always cost more than infrastructure—instead, it’s, “How do I think about this? How do I rationalize what the additional costs are going to be per thousand monthly active users or whatever metric it is you’re choosing to use?”

And how do you wind up forecasting that because the old days of data centers where you—“Well, we’re going to spend a boatload of money, and then we’ll have capacity for the next, ehh, two years, maybe down to eighteen months, depending on growth,” that’s easier for companies to rationalize around, rather than this idea of incremental cost on a per-unit basis, but not exactly because it also turns out that architecture changes, problems of scale, AWS pricing changes from time to time, all tend to impact that. What I think is not well understood in this space is that yeah, if you have a 20% overage this month, people are going to have some serious questions, but they’re also going to have those same questions if you’re 20% low.
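Editor's note: two ideas from Corey's answer lend themselves to a small sketch: cost per thousand monthly active users as a unit metric, and the point that a 20% miss raises questions whether it is over or under forecast. The function names, the 20% default threshold, and all figures below are assumptions chosen for illustration.

```python
def cost_per_thousand_users(monthly_cost: float, monthly_active_users: int) -> float:
    """Unit cost: dollars spent per 1,000 monthly active users."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return monthly_cost / (monthly_active_users / 1000)


def forecast_variance(actual: float, forecast: float) -> float:
    """Signed variance as a fraction of forecast (positive means over forecast)."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return (actual - forecast) / forecast


def needs_explanation(actual: float, forecast: float, threshold: float = 0.20) -> bool:
    """Flag a miss in either direction: too high and too low both count."""
    return abs(forecast_variance(actual, forecast)) >= threshold


if __name__ == "__main__":
    # Hypothetical month: $120,000 of spend serving 400,000 active users.
    print(cost_per_thousand_users(120_000.0, 400_000))
    # Both a 25% overage and a 25% underage are flagged.
    print(needs_explanation(125_000.0, 100_000.0))
    print(needs_explanation(75_000.0, 100_000.0))
```

Using the absolute value of the variance is the design choice that captures the symmetry: an unexpectedly low bill is treated as a forecasting miss, just like an unexpectedly high one.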

Dann: Yeah. I mean, understanding why people care about the cost is definitely the first step because with a single company, so it’s just constantly looking at the numbers rather than understanding exactly what motivations a company has to contact somebody like you, like a consultant, right? Because usually, I imagine that it’s going to be a bill, maybe two bills, three bills come in, and they keep going up and up and up, and they need to go down. And they’re going to have an explicit reason why it needs to go down; finance is going to say, “Margins are x, y, and z,” or, “Revenue has done this; our costs can’t do this.” There’s going to be explicit reasons because if there aren’t reasons, then they shouldn’t necessarily be focusing on costs at that moment in time.

What you want to do is have—I mean, this is way more complicated than just saying it out loud, but have a culture of cloud cost mindfulness, where people aren’t just spinning up resources willy nilly. But also, my goal is for people not to have to really think about cost that much other than just in a way that helps them do their work. Because I mean, I want engineers to be able to build stuff and build stuff fast—that’s what the cloud is all about—but I also want to be able to do it in a way that isn’t inappropriately high in cost.

Corey: I have my thoughts on this, and I’ve shared them before and I’ll dive into them again, but how do you approach that? If Datadog makes a grievous error and hires me to write code somewhere as an engineer, what is the, I guess, cost approach training for me as I wind up going through my onboarding as part of an SRE team or an application team?

Dann: I mean, this feels so basic as to not even be the right answer, but honestly, visibility is the easiest and best thing that you can give people, and so we’ve built out some visibility reports that engineers get on a regular basis. We also meet with our top—what is it—ten or fifteen spending internal engineering teams on a monthly basis to go over those costs so that they understand what they’re looking at so that we understand the context behind it, so that we can understand what’s on the roadmap going forward so that when things in the cost happen, we’re aware. And then we’re just staying on top of things. And if we have questions, we have an open dialogue with engineers and things like that.

In an ideal space, it would be great to have cost, I guess, more fit into the product development lifecycle in a more deeply ingrained way, but at the same time, I really don’t want to serve as a gatekeeper. Our goal is not to stop any sort of engineering process. And we haven’t needed to do anything like that although I guess every company is going to be different in terms of what their needs are. But yeah, I’m totally happy being a little bit more reactionary in terms of looking at the numbers and responding, and then proactive just in terms of the regular communication with people.

Corey: I tend to take the perspective that engineers need to know enough about cost to maybe fill an index card at most because you don’t want them, I guess, over-fixating on it. Left to my own devices in my personal account, I’ll see a $7 a month bill and, “Oh, I’m going to spend two weeks knocking that down to $4.” And of course, I can do it, but is that the best use of my time? Absolutely not.

Very often what is a lot of money to an engineer is absolutely not to the business. And vice versa when you bring in a data science team; it’s, “Oh, yeah, we need at least four more exabytes of data because we never learned to do a join properly.” Yeah, maybe don’t do that. Understanding the difference between those two approaches is key. But I’ve always been of the mindset that I would rather bias for letting developers build and experiment and have things that catch outsized things quickly, than trying to wind up putting a culture of fear around cost because I’d much rather see whether the thing they’re trying to build is possible to build, then go back and optimize it later, once that’s proven out. But again, this is a nuanced thing.

Everyone seems to think I have this back pocket answer that will apply to all companies. And you’ve been doing this at Datadog for almost four years with a team of people. I am an outsider; I see the global trend, I see what works in different ways in different companies, but the idea that I can sit down and say, “Oh. Well, clearly the thing you’re doing is completely wrong because that’s not how I think about it,” is the hallmark of a terrible consultant. There are reasons that things are the way that they are and it’s generally not that people are expecting to do a terrible job today. You know, unless they work in the Facebook ethics department, which is neither here nor there.

Dann: Yeah, I mean, like I said, the product lifecycle, when you’re building something new, you want to go as fast as possible. When you’re launching it, you want it to be as reliable as possible. Once you’re launched, once you’re reliable, then you can start focusing on costs is, kind of like, not the universal rule, but kind of the flow that I tend to see. So, as you’re at a company that is regularly innovating, creating new products, going through that cycle, you’re going to have these kind of periods.

As well as you have the products that have been around. There’s a lot of legacy code, there’s a lot of stuff going on, that maybe isn’t the best, or some efficiency work that has been deprioritized for whatever reason, that maybe it’s time to start considering doing this. So, keeping track of all of that. And like I said, if for whatever reason the business wants to focus on cloud cost efficiency, or a team has decided that in a particular quarter or for a particular reason they want to focus on that, being able to assist as much as you can, being able to save all that work so that there’s kind of like a queue that you can go to when it is time to focus on cost efficiency stuff.

Corey: So, here’s a fun one for you. As of the time of this recording, it’s a couple weeks old, but if you’re anything like what we do here for some of our more sophisticated clients, we do occasionally build out prediction models, models of economics that wind up defining how some architectural patterns should be addressed, et cetera, et cetera. What’s always fun is the large clients who have this significant level of spend on an outlier service. Every once in a while—it was great that we got to do a deep dive into the Washington Post’s use of Lambda because normally, Lambda is a rounding error on the bill; they had a specific challenge and they did a whole blog post on this for the AWS blog. I believe the Monitoring Tools blog, but don’t take that at face value; I never remember which AWS blog is which because AWS doesn’t speak with a single voice on anything.

But yeah, most of the time is block, tackle, baseline stuff that is the big driver of spend, but a few weeks ago, they changed the pricing dimensions for S3 intelligent tiering, where there’s no longer a monitoring charge for objects that are smaller than 128 kilobytes, and there’s no 30-day minimum. So, the fact that those two things went away removed almost every caveat that I can picture for using S3 intelligent tiering, which means that for most use cases, that should now be the default. I imagine you caught that change as well, since that’s one of those wake up and take notice, no matter what time of the world [laugh] it is where you are when that gets dropped. How did that change your modeling? Or did that not significantly shift how you view any of this?

Dann: No, I mean, I think part of our role within the organization is to pay attention to stuff like that, and then you just have those conversations with the teams that I know were either exploring intelligent tiering. We do some pricing modeling for different products, S3 storage for different types, so updating those and being like, “Hey, this might be something we want to actually use and explore now.” Similar and I guess, more of something that I actively worked on that I consider in the same category is when Amazon announced savings plans as replacing convertible reservations. Because at first they announced, and being like, “Okay, well, it’s going to automatically rebalance between… different instance families across regions, too”—which convertible RIs could never do it—“And it’s going to be the exact same price for a compute savings plan as a convertible RI.” And we were kind of like, what’s the catch? And we spent a few weeks doing a deep dive working with our data science team, kind of like being, “Where is the catch here?”

Corey: Yeah, the real catch is that you can’t sell it on the secondary market if it—

Dann: Yeah.

Corey: —turns out you bought the wrong thing, which if that’s your Plan A, then good luck.

Dann: Yeah. We definitely don’t use that secondary market. I don’t have as much experience there, although I’m sure some people can use it to their advantage.

Corey: Almost no one does. In fact, the reason that it exists—my pet theory—is that once upon a time, companies would try and classify some of the reserved instance purchases as capital expenditures, which there has since been guidance from regulatory authorities not to do that. But at the time, the fact that you could sell it to a third-party on the secondary market would help shore up that argument. If you’re listening to this, and you’re classifying some of your RIs as CapEx, please don’t do that. Feel free to reach out to me, I can dig out the actual regulation and send it to you. There are two of them. It’s a nuanced topic. If you’re listening to this and have no idea what I’m talking about, God, do I envy you.

Dann: [laugh]. Yeah, definitely don’t do that. [laugh].

Corey: There was a lot that was interesting about savings plans. When I was read in the month or so in advance of them being announced, it was, “Great. I want to see this and this and these other things, too.” And some of those things came to pass. It was extended to work with Lambda.

Now, I don’t believe that is financially useful in almost every case, but it doesn’t need to be because so much of cloud economics from where I sit is psychological in nature, where, “Oh, we have this workload that lives on EC2 instances and we want to move it to Lambda, but we already bought the reserved instances so we’re not going to do it because of sunk cost fallacy.” Which is not much of a fallacy when it’s that kind of money, in some cases. Okay, great. Now, if it can migrate to Lambda and still wind up getting the discounts you’ve paid for it, you have removed an architectural barrier. And that’s significant.

Now, I want to see that same thing apply to oh if you move from EC2 to RDS, or DynamoDB or anything else, that should be helpful, too. But whatever you do, don’t do what SageMaker did and launch their own separate savings plan that is not compatible with the compute savings plans, so effectively, it’s great; you’re locked-in architecturally to one or the other because machine learning is, once again, a marvelously executed scam to sell pickaxes into a digital gold rush.

Dann: I mean, I like savings plans a lot and we’ve been slowly, as convertible RIs have expired, replacing them with savings plans. And I think that it is pushing the other cloud providers forward—because we’re definitely multi-cloud—and so that’s really useful and I hope more people will take on the compute savings plan type model, just because it makes our lives so much easier. Or it makes my life so much easier in terms of planning it, selling the commitment internally, just everything about it has made my life easier. So, I mean, how many years later are we? I definitely haven’t found any big gotchas, I guess, from the secondary market. But that doesn’t really impact me.

Corey: Yeah, I spent a lot of time looking forward, too, doing deep analyses of okay, for which instance classes in which regions is there a price discrepancy? And I finally got someone to go semi on record and say, “Yeah. There should not be any; please ping us if you find one.” “Oh, okay, great. That is enough for me to work with.”

Dann: Exactly, we got that, too. I didn’t believe it so we were downloading price sheets and doing comparisons, doing all that stuff.
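Editor's note: the "download the price sheets and compare" check that Corey and Dann describe could look roughly like the sketch below. It assumes the two price sheets have already been parsed into dictionaries keyed by a hypothetical "region/instance-type" string; the keys and hourly rates shown are invented for illustration and are not real AWS prices.

```python
from math import isclose


def find_price_discrepancies(
    savings_plan_rates: dict[str, float],
    convertible_ri_rates: dict[str, float],
    rel_tol: float = 1e-9,
) -> dict[str, tuple[float, float]]:
    """Return entries present in both sheets whose hourly rates differ.

    Keys found in only one sheet are ignored here; a fuller check
    would report those separately.
    """
    mismatches: dict[str, tuple[float, float]] = {}
    shared_keys = savings_plan_rates.keys() & convertible_ri_rates.keys()
    for key in sorted(shared_keys):
        sp_rate = savings_plan_rates[key]
        ri_rate = convertible_ri_rates[key]
        if not isclose(sp_rate, ri_rate, rel_tol=rel_tol):
            mismatches[key] = (sp_rate, ri_rate)
    return mismatches


if __name__ == "__main__":
    # Invented rates, in dollars per hour, purely for demonstration.
    savings_plan = {"us-east-1/m5.large": 0.060, "eu-west-1/m5.large": 0.067}
    convertible_ri = {"us-east-1/m5.large": 0.060, "eu-west-1/m5.large": 0.069}
    for key, (sp, ri) in find_price_discrepancies(savings_plan, convertible_ri).items():
        print(f"{key}: savings plan {sp} vs convertible RI {ri}")
```

An empty result is the outcome that would match what they were told to expect, that there should be no discrepancies.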

Corey: Oh, trust but verify. And when we’re talking this kind of money, I don’t trust very far. They make mistakes on billing issues from time to time. And I get it; it’s hard, but there are challenges here and there. I am glad you mentioned a minute ago that you are multi-cloud because my position on that has often been misconstrued.

I think that designing something from day one to work on multiple cloud providers is generally foolish. I think that unless you have a compelling reason not to go all-in on one cloud provider, that’s what you should do. Pick a cloud—I don’t care which—and go all-in. Conversely, you have a product like Datadog where your customers are in multiple clouds, and first, no one wants to pay egress to send all the telemetry from where they are into AWS, and secondly, they’re not going to put up, in many cases, with their data going to a cloud provider they have explicitly chosen not to work with, so you have to meet your customers where they are. In your case, it is absolutely the right thing to do. And Twitter often gets upset and calls me hypocrite on stuff like this because Twitter believes that two things that take opposite visions cannot possibly both be true, but the world is messy.

Dann: Yeah. And I mean, the nice thing about us being in multiple clouds is we are our own biggest user. And that’s actually one of the reasons why I love working at Datadog is because I get to use Datadog all the time. And not only that, Datadog is on everything and we have all of our products. I’m very spoiled [laugh] with all of this. But I mean, we are running in these different cloud providers; we are using Datadog in those different cloud providers, and that is just helping everything overall, too. In addition to supporting customers that are in each cloud because that is a huge reason as well.

Corey: This episode is sponsored in part by something new. Cloud Academy is a training platform built on two primary goals. Having the highest quality content in tech and cloud skills, and building a good community that is rich and full of IT and engineering professionals. You wouldn’t think those things go together, but sometimes they do. It’s both useful for individuals and large enterprises, but here's what makes it new. I don’t use that term lightly. Cloud Academy invites you to showcase just how good your AWS skills are. For the next four weeks you’ll have a chance to prove yourself. Compete in four unique lab challenges, where they’ll be awarding more than $2000 in cash and prizes. I’m not kidding, first place is a thousand bucks. Pre-register for the first challenge now, one that I picked out myself on Amazon SNS image resizing, by visiting cloudacademy.com/corey. C-O-R-E-Y. That’s cloudacademy.com/corey. We’re gonna have some fun with this one!

Corey: One of the problems that I keep running into across the board is that with things like Datadog—and again, not to single you out; every monitoring vendor to some extent has aspects of this problem—it’s that when I’m a customer and I’m hooking my accounts up to Datadog, I want you to tell me about things that are going on, but the CloudWatch charges can be so egregious on the customer side, where it is bizarre and, frankly, abhorrent to me when I wind up paying more for the CloudWatch charges than I am for Datadog. And let’s be clear here; I am, in fact, a Datadog customer. I pay you folks money. Not a lot of money, but I pay you money because I have certain things that I need to know are working for a variety of excellent reasons.

And the problem that I keep smacking into on this is—it’s not your fault; there’s not anything you can do. In fact, you are one of the better providers as far as not only not being egregious with the way that you slam the CloudWatch endpoints, but also in giving guidance to customers on how to tune it further. And I really wish that more folks in your space would do things like that. It always bugs me when I wind up using a tool that tries to save money that in turn winds up costing me more than it saves.

Dann: Yeah. Yeah, it’s tricky there. I have less experience myself setting up Datadog and running it in my own infrastructure as I’m more digging deep into the cost stuff and us using the cloud, so I can’t speak to that specifically. But yeah, you’re not the first person that I’ve heard have that experience. [laugh].

Corey: And again, it’s not your fault at all. I’ve been beating up the CloudWatch team for years on this, and I will continue to do so until I’m safely dead, which—depending on Amazon’s level of patience—might be in mere minutes.

Dann: In the larger-picture-wise, we have to remember that we’re super early in the cloud adoption, even looking at the cloud economics FinOps cloud cost optimization world. I feel like most businesses at this stage in their journey are still in data centers and they’re dealing with the problem of how do we move to the cloud and do it cost-efficiently? How do we set everything up? And that’s where the world is right now.

And I think that dealing with, “Okay, we are one hundred percent running in the cloud. What are the processes that we have in place? How do we think of finance and the finance organization not through the lens of ‘we once had data centers and now we don’t,’ but how do we look through that in the lens of ‘okay, we are cloud-native from day one? What does the finance department look like?’” And dealing with those problems is really interesting because Datadog has never been in a data center. We are cloud-native from the very beginning, and so it was interesting for me to join the company and build up a lot of these processes because it is different than what a lot of other people were dealing with and doing. And it presents some really interesting problems and questions that I think are going to be the foundation for the next decade of building companies and operating in the cloud.

Corey: I always love having conversations with folks who are building out teams to handle these things because usually the folks I keep talking to, or who want to have conversations like this are building tools themselves to solve this problem through the miracle of SaaS, where they will bend over backwards to avoid ever talking to a customer. And we’re all dealing with the same AWS APIs; there’s not that much of a new spin you can put on most of these things. But understanding what customers are actually trying to do instead of falling down the rabbit hole trap of, “Hey, turn off those idle instances that are all labeled ‘drsite’ because you probably don’t need them,” is foolish. And after a few foolish recommendations, tooling doesn’t get there. I am a big believer that tools can assist the process and narrow down what to look at.

I believe they shouldn’t have to exist; I think that the billing dashboard should be a hell of a lot better natively than having to pay a third party to make sense of it for me. But by and large, I do believe this is a problem that is best solved from a consultative approach. When I started this place, I was planning to build out some software, tried doing it—called DuckTools—and wound up mothballing the whole thing because what we were building was not what the industry claimed to want and, frankly, educating people into a position where then they see the value and only then will they buy has never been a game that I wanted to play.

Dann: Yeah, I really liked that article that you guys published about exploring that product and the reason why you decided not to pursue it. But it’s super interesting in terms of where the industry is going and building out those tools because I found that there isn’t really any new thing that you can do with the tools. All the tools that exist for looking at your costs are largely the same. The main differences that I’ve seen are that the UI is slightly different and they have different sales teams. And if the sales teams are better, they’re going to get more of the market share. And if the sales teams are not as good, it’s going to be a smaller market share. And it’s weird to be in this industry for as long as we have been, and seeing okay, well, Andreessen Horowitz just funded this new company, and this other company got invited into Y Combinator, or all of these things that are happening, and I’m kind of like, okay, but what is this tool really doing differently? And there are a few of them that are; that are doing something innovative and different, but there’s also a few that are just like, this is a space where people are in, there’s money here, we’re doing the same thing, but we got our sales team, and we’ll carve out our little corner, and then we’ll get acquired, and that’ll be that. Although I guess we’re just at that stage of innovation in this space.

Corey: Yeah, I have no earthly idea what the story is around how these companies plan to differentiate because it seems to me that they’re directly attempting to compete with Cost Explorer, which—

Dann: Yeah.

Corey: —it’s taken some time for that thing to improve to the point where it is now and it’ll take further time for it to improve beyond it, but long-term, I don’t think you’re going to outrun AWS on a straight line like that.

Dann: Yeah, I mean, when you work for one of these third-party cost tooling things, and you’re working with one of your customers, and they’re like, “How do I view this?” And it’s kind of like, that is the easiest thing to find in Cost Explorer as well, it’s—I can’t imagine being like, “Well, you should pay me thousands, tens of thousands, hundreds of thousands of dollars a month to view it here,” when Cost Explorer is free. And I think Cost Explorer, it doesn’t do everything, but it’s gotten a lot better at what it does, and it could probably solve 90% of people’s problems without using a third-party tool.

Corey: You are at significant scale in multiple clouds, so the answer that these companies always give is, “Ah, but we provide a single dashboard so that you can look at costs across multiple providers in one place.” Is that even slightly useful to you?

Dann: Man, if you need dashboards, get a dashboard tool. Don’t get this crazy cost analysis tool. I mean, there are some great dashboard solutions that you can get where you can connect your detailed billing, cost and usage report—whatever cloud provider is calling it, but, like, that really detailed gigabytes per hour report—and then visualize it, build reports, do all that kind of stuff because that’s not something that the tooling does well right now, in terms of building out cost dashboards and stuff. But that’s also right now. It could in the future.

Corey: Yeah. If you’re a BI tool, wind up passing out templates that normalize these things? I am so tired of building it all from scratch in Tableau myself. If you’re Tableau, sell me a whole bunch of things that I can use to view this stuff through, so I don’t have to wind up continually reinventing that particular wheel.

Dann: Yeah.

Corey: Oh, I like your approach. I didn’t know the answer when I was asking the question. I was about to learn something if you’d gone the other direction, but nope, but it’s good to know that my impressions remain intact.

Dann: Yeah, I mean, I’ve used different tools in the past. Again, I hesitate to name any of them, but there’s a few in this space that I feel like everybody—if they’re in this space, they know which tools I’m talking about—

Corey: Yes, we do.

Dann: —and… yeah, I’ve used them. They’re okay—a few of them are okay, a few of them are better than others, but I mean, I was trying to evaluate the value-add over me manually setting some things up and having some sort of visualization, and just the value-add in terms of what they were charging, even if it was like a significantly smaller percent of the bill because that alone, like, percent of bill is such a difficult cost model—

Corey: Oh—

Dann: —to do.

Corey: I hate that. Pricing is hard. Let’s start there.

Dann: Yeah. Yeah. Yeah, yeah.

Corey: I hate the percent of bill because then it’s, “Let me get this straight. I’m paying you a percentage of things like data transfer charges that I know are fixed, that I can’t optimize? I’m paying you a percentage of my AWS enterprise support subscription? I’m paying you a percentage of the marketplace?” And so on and so forth. And it doesn’t work. At some point of scale as well it’s, I could hire a team of 20 people and save money versus what you’re charging me. The other side of it though, “Ah, we’ll charge you percentage of savings.” Well, then you wind up with people doing a whole bunch of things like before they bring you in, they’ll make a bunch of ill-advised reserved instance purchases or savings plan purchases you have to then unwind after the fact. When I was setting this place up, I looked long and hard at different billing models and the only thing I found that worked is fixed fee. The end. Because at that point, suddenly everyone’s on board with, “Hey, let’s solve the problem and then get out as soon as possible.” We’re not trying to build ourselves a forever job nestled in the heart of your company. And it’s the only model I found that removes a whole swath of conflicts of interest. And that’s the hard part. We have no partners with anyone in this space—including AWS themselves—just because as soon as we do, it becomes extremely disingenuous when we suggest doing something for your sake that happens to benefit them, such as, “Maybe back that S3 bucket up somewhere.” Well, okay, if we’re partnered with them, does that mean we’re trying to influence spend in the other direction? And it just becomes a morass that I never found it worth the time to deal with.

Dann: Yeah, I—

Corey: But that doesn’t work for SaaS.

Dann: Yeah, that makes a lot of sense. And I haven’t actually thought about pricing model for consulting in this space that closely, but I mean, when you’re charging a percent of bill or percent of savings, you have the opportunity to screw the customer, right, through all the things that you were saying. If you charge a fixed fee, you have the possibility of undervaluing yourself, which the only one that’s screwed in that case is you, potentially, and if you’re okay with that risk and you’re okay with those dollars, that’s great. Because yeah, if you’re able to be like, “Okay, here’s the services that I do, here’s the fixed costs.” “Done.” “Done.” That just sets everybody’s expectations for the relationship in a much better way that you’re not constantly worried about, like, upsells and other things that might happen along the way that screws the customer.

Corey: And that’s the hardest part, I think, is that people lose sight of the entire customer obsession piece of it. That’s one of the things Amazon gets super right. I wish more companies embrace that. Dann, I want to thank you for taking so much time out of your day to suffer my slings, and arrows, and half-formed opinions. If people want to learn more about who you are and what you’re up to, where can they find you?

Dann: Yeah, I have a website you guys can go to that links everywhere else. It is dannb.org. And I spell my name with two ns, so D-A-N-N-B dot org. And I have LinkedIn, I have Twitter, I have a monthly newsletter that is not really about FinOps or anything, but I really enjoy it; I’ve been doing it for a year, now, that you should sign up for.

Corey: And links to that will, of course, be in the show notes. Dann, thanks again for your time. I really appreciate it.

Dann: Yeah. Thanks so much for having me again. It’s been a blast.

Corey: It really has. Dann Berg, senior CloudOps analyst at Datadog. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment featuring a picture of several corkboards full of post-it notes and string, and a deranged comment telling me that you have in fact finally found the catch in savings plans.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
