Data Center War Stories with Mike Julian

Episode Summary

Mike Julian is the CEO of The Duckbill Group, a company you might be familiar with. Prior to co-founding Duckbill with yours truly, Mike was editor in chief at Monitoring Weekly, principal at Aster Labs, a senior DevOps consultant at Taos, a senior systems engineer at Peak Hosting, and an operations engineer at Oak Ridge National Laboratory, among other positions. He’s also the author of Practical Monitoring: Effective Strategies for the Real World. Join Corey and Mike as they assess the current state of data centers and talk about how data centers are on their way out, even if they’ll still be around for the foreseeable future, what it was like working at Oak Ridge, how Mike describes the two different kinds of data centers he’s encountered, the client that set up their infrastructure in the basement of a boat (below the waterline), why you never want to forget your jacket en route to the data center, why you should cut cables when you throw them away, why data centers need 180 days of lead time, and more.

Episode Show Notes & Transcript

About Mike

Besides his duties as The Duckbill Group’s CEO, Mike is the author of O’Reilly’s Practical Monitoring, and previously wrote the Monitoring Weekly newsletter and hosted the Real World DevOps podcast. He was previously a DevOps Engineer for companies such as Taos Consulting, Peak Hosting, Oak Ridge National Laboratory, and many more. Mike is originally from Knoxville, TN (Go Vols!) and currently resides in Portland, OR.





Links:

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.


Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.


Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I spent the past week guest hosting the Software Engineering Daily podcast, taking listeners over there on a tour of the clouds. Each day, I picked a different cloud and had a guest talk to me about their experiences with that cloud.


Now, there was one that we didn’t talk about, and we’re finishing up that tour here today on Screaming in the Cloud. That cloud is the obvious one, and that is your own crappy data center. And my guest is Duckbill Group’s CEO and my business partner, Mike Julian. Mike, thanks for joining me.


Mike: Hi, Corey. Thanks for having me back.


Corey: So, I frequently say that I started my career as a grumpy Unix sysadmin. Because it isn’t like there’s a second kind of Unix sysadmin you’re going to see. And you were in that same boat. You and I both have extensive experience working in data centers. And it’s easy sitting here on the tech coast of the United States—we’re each in tech hub cities—and we look around and yeah, the customers we talk to have massive cloud presences; everything we do is in cloud. It’s easy to fall into the trap of believing that data centers are a thing of yesteryear. Are they?


Mike: [laugh]. Absolutely not. I mean, our own customers have tons of stuff in data centers. There are still companies out there like Equinix, and CoreSite, and DRC—is that them? I forget the name of them.


Corey: DRT. Digital Realty [unintelligible 00:01:54].


Mike: Digital Realty. Yeah. These are companies still making money hand over fist. People are still putting new workloads into data centers, so yeah, we’re kind of stuck with them for a while.


Corey: What’s fun is when I talked to my friends over in the data center sales part of the world, I have to admit, I went into those conversations early on with more than my own fair share of arrogance. And it was, “[laugh]. So, who are you selling to these days?” And the answer was, “Everyone, fool.” Because they are.


People at large companies with existing data center footprints are not generally doing fire sales of their data centers, and one thing that we learned about cloud bills here at The Duckbill Group is that they only ever tend to go up with time. That’s going to be the case when we start talking about data centers as well. The difference there is that it’s not just an API call away to lease more space, put in some racks, buy some servers, get them racked. So, my question for you is, if we sit here and do the Hacker News—also known as the worst website on the internet—and take their first principles approach to everything, does that mean the people who are building out data centers are somehow doing it wrong? Did they miss a transformation somewhere?


Mike: No, I don’t think they’re doing it wrong. I think there’s still a lot of value in having data centers and having that sort of skill set. I do think the future is in cloud infrastructure, though. And whether that’s a public cloud, or private cloud, or something like that, I think we’re getting increasingly away from building on top of bare metal, just because it’s so inefficient to do. So yeah, I think at some point—and I feel like we’ve been saying this for years that, “Oh, no, everyone’s missed the boat,” and here we are saying it yet again, like, “Oh, no. Everyone’s missing the boat.” You know, at some point, the boat’s going to frickin’ leave.


Corey: From my perspective, there are advantages to data centers. And we can go through those to some degree, but let’s start at the beginning. Origin stories are always useful. What’s your experience working in data centers?


Mike: [laugh]. Oh, boy. Most of my career has been in data centers. And in fact, one interesting tidbit is that, despite running a company that is built on AWS consulting, I didn’t start using AWS myself until 2015. So, as of this recording, it’s 2021 now, so that means six years ago is when I first started using AWS.


And before that, it was all in data centers. So, some of my most interesting stuff in the data center world was from Oak Ridge National Lab where we had hundreds of thousands of square feet of data center floor space across, like, three floors. And it was insane, just the amount of data center stuff going on there. A whole bunch of HPC, a whole bunch of just random racks of bullshit. So, it’s pretty interesting stuff.


I think probably the most really interesting bit I’ve worked on was when I was at a now-defunct company, Peak Hosting, where we had to figure out how to spin up a data center without having anyone at the data center, as in, there was no one there to do the spin up. And that led into interesting problems, like you have multiple racks of equipment, like, thousands of servers just showed up on the loading dock. Someone’s got to rack them, but from that point, it all has to be automatic. So, how do you bootstrap entire racks of systems from nothing with no one physically there to start a bootstrap process? And that led us to build some just truly horrific stuff. And thank God that’s someone else’s problem, now. [laugh].
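[Editor’s note: for readers curious what that kind of zero-touch bootstrap can look like, here is a minimal, hypothetical sketch, not Peak Hosting’s actual tooling. It assumes a PXE/DHCP/TFTP server is already serving an unattended installer, that each server’s management card (BMC) answers standard IPMI commands via ipmitool, and that the inventory file, credentials, and hostnames below are placeholders invented purely for illustration.]

```python
#!/usr/bin/env python3
"""Hypothetical zero-touch rack bootstrap sketch (illustration only).

Assumes: a PXE/DHCP/TFTP server already serves an automated installer,
every server's BMC is reachable on the network, and ipmitool is installed.
The inventory file and credentials are placeholders, not real values.
"""
import subprocess
import sys

IPMI_USER = "admin"      # placeholder credential
IPMI_PASS = "changeme"   # placeholder credential


def ipmi(bmc_host: str, *args: str) -> None:
    """Run a single ipmitool command against one BMC."""
    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", IPMI_USER, "-P", IPMI_PASS,
        *args,
    ]
    subprocess.run(cmd, check=True)


def bootstrap(bmc_host: str) -> None:
    # Tell the machine to PXE boot on its next start,
    # then cycle power so it picks up the unattended installer.
    ipmi(bmc_host, "chassis", "bootdev", "pxe")
    ipmi(bmc_host, "chassis", "power", "cycle")


def main(inventory_path: str) -> None:
    # Inventory: one BMC hostname/IP per line, e.g. harvested from DHCP leases.
    with open(inventory_path) as f:
        bmcs = [line.strip() for line in f if line.strip()]
    for bmc in bmcs:
        print(f"bootstrapping {bmc} ...")
        try:
            bootstrap(bmc)
        except subprocess.CalledProcessError as err:
            # A dead or flaky management card is exactly the kind of thing
            # that still needs a human; record it and move on.
            print(f"  failed: {err}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "new_bmcs.txt")
```

[From there, an unattended install (kickstart, preseed, or similar) plus configuration management would take over; the hard part, as Mike describes, is everything that goes wrong when nobody is on-site to press a button.]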


Corey: It makes you wonder if under the hood at all these cloud providers if they have something that’s a lot cleaner, and more efficient, and perfect, or if it’s a whole bunch of Perl tied together with bash and hope, like we always built.


Mike: You know what? I have to imagine that even at AWS—and I know this is true at Facebook, where they have a massive data center footprint as well—there is a lot of work that goes into the bootstrap process, and a lot of these companies are building their own hardware to facilitate making that bootstrap process easier. When you’re trying to bootstrap, say, like, Dell or HP servers, the management cards only take you so far. And a lot of the stuff that we had to do was working around bugs in the HP management cards, or the Dell DRACs.


Corey: Or you can wind up going with some budget whitebox service. I mean, Supermicro is popular, not that they’re ultra-low budget. But yeah, you can effectively build your own. And that leads down interesting paths, too. I feel like there’s a sweet spot where working on a data center and doing a build-out makes sense for certain companies.


If you’re trying to build out some proof of concept, yeah, do it in the cloud; you don’t have to wait eight weeks and spend thousands of dollars; you can prove it out right now and spend a total of something like 17 cents to figure out if it’s going to work or not. And if it does, then proceed from there, if not shut it down, and here’s a quarter; keep the change. With data centers, a lot more planning winds up being involved. And is there a cutover at which point it makes sense to evacuate from a public cloud into a physical data center?


Mike: You know, I don’t really think so. This came up on a recent Twitter Spaces that you and I did around, at what point does it really make sense to be hybrid, or to be all-in on data center? I made the argument that large-scale HPC does not fit cloud workloads, and someone made a comment that, like, “What is large-scale?” And to me, large-scale was always, like—so Oak Ridge was—or is—famous for having supercomputing, and they have largely been in the top five supercomputers in the world for quite some time. A supercomputer of that size is tens of thousands of cores. And they’re running pretty much constantly because of how expensive that stuff is to get time on. And that sort of thing would be just astronomically expensive in a cloud. But how many of those are there really?


Corey: Yeah, if you’re an AWS account manager listening to this and reaching out with, “No, that’s not true. After committed spend, we’ll wind up giving you significant discounts, and a whole bunch of credits, and jump through all these hoops.” And, yeah, I know, you’ll give me a bunch of short-term contractual stuff that’s bounded for a number of years, but there’s no guarantee that stuff gets renewed at that rate. And let’s face it. If you’re running those kinds of workloads today, and already have the staff and tooling and processes that embrace that, maybe ripping all that out in a cloud migration where there’s no clear business value derived isn’t the best plan.


Mike: Right. So, while there is a lot of large-scale HPC infrastructure that I don’t think particularly fits well on the cloud, there’s not a lot of that. There’s just not that many massive HPC deployments out there. Which means that pretty much everything below that threshold could be a candidate for cloud workloads, and probably would be much better. One of the things that I noticed at Oak Ridge was that we had a whole bunch of SGI HPC systems laying around, and 90% of the time they were idle.


And those things were not cheap when they were bought, and at the time, they’re basically worth nothing. But they were idle most of the time, but when they were needed, they’re there, and they do a great job of it. With AWS and GCP and Azure HPC offerings, that’s a pretty good fit. Just migrate that whole thing over because it’ll cost you less than buying a new one. But if I’m going to migrate Titan or Gaia from Oak Ridge over to there, yeah, some AWS rep is about to have a very nice field day. That’d just be too much money.


Corey: Well, I’d be remiss as a cloud economist if I didn’t point out that you can do this stuff super efficiently in someone else’s AWS account.


Mike: [laugh]. Yes.


Corey: There’s also the staffing question where if you’re a large blue-chip company, you’ve been around for enough decades that you tend to have some revenue to risk, where you have existing processes and everything is existing in an on-prem environment, as much as we love to tell stories about the cloud being awesome, and the capability increase and the rest, yadda, yadda, yadda, there has to be a business case behind moving to the cloud, and it will knock some nebulous percentage off of your TCO—because lies, damned lies, and TCO analyses are sort of the way of the world—great. That’s not exciting to most strategic-level execs. At least as I see the world. Given you are one of those strategic level execs, do you agree? Am I lacking nuance here?


Mike: No, I pretty much agree. Doing a data center migration, you got to have a reason to do it. We have a lot of clients that are still running in data centers as well, and they don’t move because the math doesn’t make sense. And even when you start factoring in all the gains from productivity that they might get—and I stress the word might here—even when you factor those in, even when you factor in all the support and credits that Amazon might give them, it still doesn’t make enough sense. So, they’re still in data centers because that’s where they should be for the time because that’s what the finances say. And I’m kind of hard-pressed to disagree with them.


Corey: While we’re here playing ‘ask an exec,’ I’m going to go for another one here. It’s my belief that any cloud provider that charges a penny for professional services, or managed services, or any form of migration tooling or offering at all to their customers is missing the plot. Clearly, since they all tend to do this, I’m wrong somewhere. But I don’t see how. Am I wrong, or are they?


Mike: Yeah, I don’t know. I’d have to think about that one some more.


Corey: It’s an interesting point because it’s—


Mike: It is.


Corey: —it’s easy to think of this as, “Oh, yeah. You should absolutely pay people to migrate in because the whole point of cloud is that it’s kind of sticky.” The biggest indicator of a big cloud bill this month is a slightly smaller one last month. And once people wind up migrating into a cloud, they tend not to leave despite all of their protestations to the contrary about multi-cloud, hybrid, et cetera, et cetera. And that becomes an interesting problem.


It becomes an area—there’s a whole bunch of vendors that are very deeply niched into that. It’s clear that the industry as a whole thinks that migrating from data centers to cloud is going to be a boom industry for the next three decades. I don’t think they’re wrong.


Mike: Yeah, I don’t think they’re wrong either. I think there’s a very long tail of companies with massive footprints staying in data centers that, at some point, are going to get out of the data center.


Corey: For those listeners who are fortunate enough not to have come up the way that we did, can you describe what a data center is like inside?


Mike: Oh, God.


Corey: What is a data center? People have these mythic ideas from television and movies, and I don’t know, maybe some Backstreet Boys music video; I don’t know where it all comes from. What is a data center like? What does it do?


Mike: I’ve been in many of these over my life, and I think they really fall into two groups. One is the one managed by a professional data center manager. And those tend to be sterile environments. Like, that’s the best way to describe it. They are white, filled with black racks. Everything is absolutely immaculate. There is no trash or other debris on the floor. Everything is just perfect. And it is freezingly cold.


Corey: Oh, yeah. So, you’re in a data center for any length of time, bring a jacket. And the soulless part of it, too, is that it’s well-lit with fluorescent lights everywhere—


Mike: Oh yeah.


Corey: —and it’s never blinking, never changing. There are no windows. Time loses all meaning. And it’s strange to think about this because you don’t walk in and think, “What is that racket?” But there’s 10,000, 100,000 however many fans spinning all the time. It is super loud. It can clear 120 decibels in there, but it’s a white noise so you don’t necessarily hear it. Hearing protection is important there.


Mike: When I was at Oak Ridge, we had—all of our data centers, we had a professional data center manager, so everything was absolutely pristine. And to get into any of the data centers, you had to go through a training; it was very simple training, but just, like, “These are things you do and don’t do in the data center.” And when you walked in, you had to put in earplugs immediately before you walked in the door. And it’s so loud just because of that, and you don’t really notice it because you can walk in without earplugs and, like, “Oh, it’s loud, but it’s fine.” And then you leave a couple hours later and your ears are ringing. So, it’s a weird experience.


Corey: It’s awful. I started wearing earplugs every time I went in, just because it’s not just the pain; hearing loss doesn’t always manifest that way. It’s that I would get tired much more quickly.


Mike: Oh, yeah.


Corey: I would not be as sharp. It was, “What is this? Why am I so fatigued?” It’s noise.


Mike: Yeah. And having to remember to grab your jacket when you head down to the data center, even though it’s 95 degrees outside.


Corey: At some point, if you’re there enough—which you probably shouldn’t be—you start looking at ways to wind up storing one locally. I feel like there could be some company that makes an absolute killing by renting out parkas at data centers.


Mike: Yeah, totally. The other group of data center stuff that I generally run into is the exact opposite of that. And it’s basically someone has shoved a couple racks in somewhere and they just kind of hope for the best.


Corey: The basement. The closet. The hold of a boat, with one particular client we work with.


Mike: Yeah. That was an interesting one. So, we had a—Corey and I had a client where they had all their infrastructure in the basement of a boat. And we’re [laugh] not even kidding. It’s literally in the basement of a boat.


Corey: Below the waterline.


Mike: Yeah below the waterline. So, there was a lot of planning around, like, what if the hold gets breached? And like, who has to plan for that sort of thing? [laugh]. It was a weird experience.


Corey: It turns out that—what was hilarious about that was, while they were doing their cloud migration into AWS, their account manager wasn’t the most senior account manager because, at that point, it was a small account, but they still stuck to their standard talking points about TCO, and better durability, and the rest, and it didn’t really occur to them to come back with a, “what if the boat sinks?” Which is the obvious reason to move out of that quote-unquote, “data center?”


Mike: Yeah. It was a wild experience. So, that latter group of just everything’s an absolute wreck, like, everything—it’s just so much of a pain to work with, and you find yourself wanting to clean it up. Like, install new racks, do new cabling, put in a totally new floor so you’re not standing on concrete. You want to do all this work to it, and then you realize that you’re just putting lipstick on a pig; it’s still going to be a dirty old data center at the end of the day, no matter how much work you do to it. And you’re still running on the same crappy hardware you had, you’re still running on the same frustrating deployment process you’ve been working on, and everything still sucks, despite it looking good.


Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.



Corey: The worst part is playing the ‘what is different here?’ game. You rack twelve servers: eleven come up fine and the twelfth doesn’t.


Mike: [laugh].


Corey: It sounds like, okay, how hard could it be? Days. It can take days. In a cloud environment, you have one weird instance. Cool, you terminate it and start a new one and life goes on, whereas in a data center, you generally can’t send back a $5,000 piece of hardware willy-nilly, and you certainly can’t do it same-day, so let’s figure out what the problem is.


Is that some sub-component in the system? Is it a dodgy cable? Is it, potentially, a dodgy switch port? Is there something going on with that node? Was there something weird about the way the install was done if you reimage the thing? Et cetera, et cetera. And it leads down rabbit holes super quickly.


Mike: People that grew up in the era of computing that Corey and I did, you start learning tips and tricks, and they sound kind of silly these days, but things like, you never create your own cables. Even though both of us still remember how to wire a Cat 5 cable, we don’t.


Corey: My fingers started throbbing when you said that because some memories never fade.


Mike: Right. You don’t. Like, if you’re working in a data center, you’re buying premade cables because they’ve been tested professionally by high-end machines.


Corey: And you still don’t trust it. You have a relatively inexpensive cable tester in the data center, and I learned this when I was racking stuff the second time: it adds a bit of time, but every cable that we took out of the packaging, before we plugged it in, we tested on the cable tester just to remove that problem. And it still doesn’t catch everything because, welcome to the world of intermittent cables that are marginal, that, when you bend a certain way, stop working, and then when you look at them, start working again properly. Yes, it’s as maddening as it sounds.


Mike: Yeah. And then things like rack nuts. My fingers hurt just thinking about it.


Corey: Think of them as nuts that bolts wind up screwing into, but they’re square and they have clips on them so they clip into the standard rack cabinets, so you can screw equipment into them. There are different sizes of them, and of course, they’re not compatible with one another. And they always pinch your finger and make you bleed because they’re incredibly annoying to put in and out. Some vendors have quick rails, which are way nicer, but networking equipment is still stuck in the ‘90s in that context, and there’s always something that winds up causing problems.


Mike: If you were particularly lucky, the rack nuts that you had were pliable enough that you could pinch them and pull them out with your fingers, and hopefully didn’t do too much damage. If you were particularly unlucky, you had to reach for a screwdriver to try to pry it out, and inevitably stab yourself.


Corey: Or sometimes pulling it out with your fingers, it’ll—like, those edges are sharp. It’s not the most high-quality steel in some cases, and it’s just you wind up having these problems. Oh, one other thing you learn super quickly: first, always have a set of tools there, because the one you need is the one you don’t have; and the most valuable tool you’ll have is a pair of wire cutters. And what you do when you find a bad cable is you cut it before throwing it away.



Mike: Yep.


Corey: Because otherwise someone who is very well-meaning, but whom you will think of as the freaking devil, will go, “Oh, there’s a perfectly good cable sitting here in the trash. I’ll put it back with the spares.” So, you think you have a failed cable, you grab another one from the pile of spares—remember, this is two in the morning, invariably, and you’re not thinking on all cylinders—and the problem is still there. Cut the cable when you throw it away.


Mike: So, there are entire books that were written about these sorts of tips and tricks that everyone working [with 00:19:34] data center just remembers. They learned it all. And most of the stuff is completely moot now. Like, no one really thinks about it anymore. Some people are brought up in computing in such a way that they never even learned these things, which I think is fantastic.


Corey: Oh, I don’t wish this on anyone. This used to be a prerequisite skill for anyone who called themselves a systems administrator, but I am astonished when I talk to my AWS friends, the remarkably senior engineers who have never been inside of an AWS data center.


Mike: Yeah, absolutely.


Corey: That’s really cool. It also means you’re completely divorced from the thing you’re doing with code and the rest, and the thing that winds up keeping the hardware going. It also leads to a bit of a dichotomy where the people racking the hardware, in many cases, don’t understand the workloads that are on there because if you have the programming insight, and ability, and can make those applications work effectively, you’re probably going to go find a role that compensates far better than working in the data center.


Mike: I [laugh] want to talk about supply chains. So, when you build a data center, you start planning about—let’s say, I’m not Amazon. I’m just, like, any random company—and I want to put my stuff into a data center. If I’m going to lease someone else’s data center—which you absolutely should—we’re looking at about a 180-day lead time. And it’s like, why? Like, that’s a long time. What’s—


Corey: It takes that long to sign a real estate lease?


Mike: Yeah.


Corey: No. It takes that long to sign a real estate lease, talk to your upstream provider, get them to go ahead and run the thing—effectively—get the hardware ordered and shipped in the right time window, do the actual build-out once everything is in place, and I’m sure a few other things I’m missing.


Mike: Yeah, absolutely. So yeah, you have all these things that have to happen, and all of them take for-freaking-ever. Getting Windstream on the phone to begin with, to even take your call, can often take weeks at a time. And then to get them to actually put in an order for you, and then do the turnup. The turnup alone, where I’m just, “Hey, I’ve bought bandwidth from you, and I just need you to come out and connect the [BLEEP] cables,” might be 90 days for them to do it.


And that’s ridiculous. But then you also have the hardware vendors. If you’re ordering hardware from Dell, and you’re like, “Hey, I need a couple servers.” Like, “Great. They’ll be there next week.” Instead, if you’re saying, “Hey, I need 500 servers,” they’re like, “Ooh, uh, next year, maybe.” And this is even pre-pandemic sort of thing because they don’t have all these sitting around.


So, for you to get a large number of servers quickly, it’s just not a thing that’s possible. So, a lot of companies would have to buy well ahead of what they thought their needs would be, so they’d have massive amounts of unused capacity. Just racks upon racks of systems sitting there turned off, waiting for when they’re needed, just because of the ordering lead time.


Corey: That’s what auto-scaling looks like in those environments because you need to have that stuff ready to go. If you have a sudden inrush of demand, you have to be able to scale up with things that are already racked, provisioned, and good to go. Sometimes you can have them halfway provisioned because you don’t know what kind of system they’re going to need to be in many cases, but that’s some up-the-stack level thinking. And again, finding failed hard drives and swapping those out, make sure you pull the right one or you just destroyed an array. And all these things that I just make Amazon’s problem.


It’s kind of fun to look back at this and realize that we would get annoyed then with support tickets that took three weeks to get resolved in hardware, whereas now three hours in you and I are complaining about the slow responsiveness of the cloud vendor.


Mike: Yeah, the amount of quick turnaround that we can have these days on cloud infrastructure that was just unthinkable, running in data centers. We don’t run out of bandwidth now. Like, that’s just not a concern that anyone has. But when you’re running in a data center, and, “Oh, yeah. I’ve got an OC-3 line connected here. That’s only going to get me”—


Corey: Which is something like—what is an OC-3? That’s something like, what, 20 gigabit, or—


Mike: Yeah, something like that. It’s—


Corey: Don’t quote me on that.


Mike: Yeah. So, we’re going to have to look that up. So, it’s equivalent to a T-3, so I think that’s a 45 megabit?


Corey: Yeah, that sounds about reasonable, yeah.


Mike: So, you’ve got a T-3 line sitting here in your data center. Like that’s not terrible. And if you start maxing that out, well, you’re maxed out. You need more? Again, we’re back to the 90 to 180 day lead time to get new bandwidth.


So, sucks to be you, which means you’d have to start planning your bandwidth ahead of time. And this is why we had issues like companies getting Slashdotted back in the day because when you capped the bandwidth out, well, you’re capped out. That’s it. That’s the game.
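[Editor’s note: since both hosts flag that they’re guessing at the exact numbers here, a quick reference sketch of the old telco line rates they’re trying to recall: a T-3/DS-3 is about 45 Mbps, and an OC-3 is three SONET STS-1 channels, about 155 Mbps, so roughly three T-3s rather than anything in the gigabit range. The snippet below just does that arithmetic.]

```python
# Back-of-the-envelope SONET/T-carrier line rates (reference only).
STS1_MBPS = 51.84    # one SONET STS-1 channel; an OC-n carries n of these
DS3_MBPS = 44.736    # a T-3 / DS-3 circuit


def oc_rate_mbps(n: int) -> float:
    """Approximate line rate of an OC-n circuit in Mbps."""
    return n * STS1_MBPS


oc3 = oc_rate_mbps(3)
print(f"OC-3 ~= {oc3:.2f} Mbps")                    # ~155.52 Mbps
print(f"T-3  ~= {DS3_MBPS:.3f} Mbps")               # ~44.736 Mbps
print(f"An OC-3 is ~{oc3 / DS3_MBPS:.1f}x a T-3")   # roughly 3x
```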


Corey: Now, you’ve made the front page of Slashdot, a bunch of people visited your site, and the site fell over. That was sort of the way of the world. CDNs weren’t really a thing. Cloud wasn’t a thing. And that was just, okay, you’d bookmark the thing and try and remember to check it later.


We talked about bandwidth constraints. One thing that I think the cloud providers do—at least the tier ones—that is just basically magic is full line rate between any two instances, almost always. Well, remember, you have a bunch of different racks, and at the top of every rack, there’s usually a switch called—because we’re bad at naming things—a top-of-rack switch. And even though everything that you have plugged in can get one gigabit to that switch—or 10 gigabit or whatever it happens to be—there is a constraint in that top-of-rack switch. So yeah, one server can talk to another one in a different rack at one gigabit, but then you have 20 different servers in each rack all trying to do something like that, and you start hitting constraints.


You do not see that in the public cloud environments; it is subsumed away, you don’t have to think about that level of nonsense. You just complain about what feels like the egregious data transfer charge.


Mike: Right. Yeah. It was always frustrating when you had to order nice high-end switching gear from Cisco, or Arista, or take your pick of provider, and you got 48 ports in the top-of-rack, you got 48 servers all wired up to them—or 24 because we want redundancy on that—and that should be a gigabit for each connection, except when you start maxing it out, no, it’s nowhere even near that because the switch can’t handle it. And it’s absolutely magical, that the cloud provider’s like, “Oh, yeah. Of course, we handle that.”
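[Editor’s note: the top-of-rack oversubscription Corey and Mike are describing is easy to put rough numbers on. The sketch below uses the 48-port, one-gigabit figures from the conversation plus an assumed pair of 10 Gbps uplinks; the uplink numbers are invented for illustration, not taken from any specific switch they mention.]

```python
# Rough top-of-rack oversubscription math (illustrative numbers only).
server_ports = 48        # servers wired to the top-of-rack switch (figure from the conversation)
port_gbps = 1            # each server's link speed
uplink_count = 2         # assumed number of uplinks leaving the rack
uplink_gbps = 10         # assumed uplink speed

edge_capacity = server_ports * port_gbps      # what the servers could push at once: 48 Gbps
uplink_capacity = uplink_count * uplink_gbps  # what can actually leave the rack: 20 Gbps
ratio = edge_capacity / uplink_capacity

print(f"edge: {edge_capacity} Gbps, uplinks: {uplink_capacity} Gbps")
print(f"oversubscription: {ratio:.1f}:1")     # 2.4:1 here; not every server gets line rate at once
```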


Corey: And you don’t have to think about it at all. One other use case that I did want to hit, because I know we’ll get letters if we don’t, where it does make sense to build out a data center, even today, is if you have regulatory requirements around data residency. And there’s no cloud vendor in an area that suits. This generally does not apply to the United States, but there are a lot of countries with data residency laws that do not yet have a region from their cloud provider of choice located in-country.


Mike: Yeah, I’ll agree with that, but I think that’s a short-lived problem.


Corey: In the fullness of time, there’ll be regions everywhere. Every build—a chicken in every pot and an AWS availability zone on every corner.


Mike: [laugh]. Yeah, I think it’s going to be a fairly short-lived problem, which actually reminds me of even our clients that have data centers are often treating the data center as a cloud. So, a lot of them are using your favorite technology, Corey, Kubernetes, and they’re treating Kubernetes as a cloud, running Kube in AWS, as well, and moving workloads between the two Kube clusters. And to them, a data center is actually not really data center; it’s just a private cloud. I think that pattern works really well if you have a need to have a physical data center.


Corey: And then they start doing a hybrid environment where they start expanding to a public cloud, but then they treat that cloud like just a place to run a bunch of VMs, which is expensive, and it solves a whole host of problems that we’ve already talked about. Like, we’re bad at replacing hard drives, or our data center is located on a corner where people love to get drunk on the weekends and smash into the power pole and take out half of the racks here. Things like that? Great, yeah, cloud can solve that, but cloud could do a lot more. You’re effectively worsening your cloud experience to improve your data center experience.


Mike: Right. So, even when you have that approach, the piece of feedback that we gave the client was: you have built such a thing where you have to cater to the lowest common denominator, which is the constraints that you have in the data center, which means you’re not able to use AWS the way that you should be able to use it, so it’s just as expensive to run as the data center was. If they were to get rid of the data center, then the cloud would actually become cheaper for them and they would get more benefits from using it. So, that’s kind of a business decision for how they’ve structured it, and I can’t really fault them for it, but there are definitely some downsides to the approach.


Corey: Mike, thank you so much for joining me here. If people want to learn more about what you’re up to, where can they find you?


Mike: You know, you can find me at duckbillgroup.com, and actually, you can also find Corey at duckbillgroup.com. We help companies lower their AWS bills. So, if you have a horrifying bill, you should chat.


Corey: Mike, thank you so much for taking the time to join me here.


Mike: Thanks for having me.


Corey: Mike Julian, CEO of The Duckbill Group and my business partner. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and then challenge me to a cable-making competition.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


Announcer: This has been a HumblePod production. Stay humble.