Innovations and the Changing DevOps Tides of Tech with Nigel Kersten

Episode Summary

This week Nigel Kersten, Field CTO at Puppet, joins Corey to talk about their respective companies’ relationship and what it entails. They rehash Corey’s time as a traveling contract trainer for Puppet and the ins and outs of that work, including the challenge of describing to clients what exactly Puppet is and how it works. They also dive into the differences between then and now in DevOps and in tech at large. In short, Puppet is a DSL (domain-specific language). Nigel and Corey dig into what that is, how it works, and how to translate it for a larger, less technical world. They also reflect on how Docker handed over the keys and on some of the attachments we have to a techno-social system. Nigel speaks on the innovations that have come along the way and the impact they’ve had on the industry, especially on those with a tendency to cling to “legacy.”

Episode Show Notes & Transcript

About Nigel

Nigel Kersten’s day job is Field CTO at Puppet where he leads a group of engineers who work with Puppet’s largest customers on cultural and organizational changes necessary for large-scale DevOps implementations - among other things. He’s a co-author of the industry-leading State Of DevOps Report and likes to talk even-handedly about what went right with DevOps and what went wrong, based on this research and his experience in the field. He’s held multiple positions at Puppet across product and engineering and came to Puppet from the Google SRE organization, where he was responsible for one of the largest Puppet deployments in the world. Nigel is passionate about behavioral economics, electronic music, synthesizers, and Test cricket. Ask him about late-stage capitalism, and shoes.

Links:

Puppet: https://puppet.com
2020 State of DevOps Report: https://puppet.com/resources/report/2020-state-of-devops-report/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps Report, brought to you annually by Puppet since 2011, to explore the trends and blockers keeping firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translates to DevOps success. All that and more awaits you. Visit: www.puppet.com to download your copy of the report now!

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted episode is sponsored by a longtime… I wouldn’t even say friends so much as antagonist slash protagonist slash symbiotic company with things I have done as I have staggered through the ecosystem. There’s a lot of fingers of blame that I can point throughout the course of my career at different instances, different companies, different clients, et cetera, et cetera, that have shaped me into the monstrosity that I am today. But far and away, the company that has had the most impact on the way that I speak publicly is Puppet.

Here to accept the recrimination for what I’ve become and how it’s played out is Nigel Kersten, a field CTO at Puppet—or the field CTO; I don’t know how many of them they have. Nigel, welcome to the show, and how unique are you?

Nigel: Thank you, Corey. Well, I—you know, reasonably unique. I think that you get used to being one of the few Australians living in Portland who’s decided to move away from the sunny beaches and live in the gray wilderness of the Pacific Northwest.

Corey: So, to give a little context into that ridiculous intro, I was a traveling contract trainer for the Puppet fundamentals course for an entire summer back in, I want to say, 2014, but don’t hold me to it. And it turns out that when you’re teaching a whole bunch of students who have paid, in many cases, a couple thousand dollars out of pocket to learn a new piece of software where, in some cases, they feel like it’s taking their job away because they view their job, rightly or wrongly, as writing the same script again and again, and then the demo breaks and people are angry, and if you don’t get a good enough rating, you’re not invited to continue, and then the company you’re contracting through hits you with a stick, it teaches you to improvise super quickly. So, I wasn’t kidding when I said that Puppet was in many ways responsible for the way that I give talks now. So, what do you have to say for yourself?

Nigel: Well, I have to say, congratulations on surviving opinionated, defensive nerds who think not only you but the entire product you’re demoing could be replaced by a shell script. It’s a tough crowd.

Corey: It was an experience. And some of these were community-based, and some of them were internal to a specific company. And if people have heard more than one episode of this show, I’m sure they can imagine how that went. I gave a training at Comcast once and set a personal challenge for myself of how many times could I use the word ‘comcastic’ in a three-day training. And I would work it in and talk about things like the schedule parameter in Puppet where it doesn’t guarantee something’s going to execute in a time window; it’s the only time it may happen.

If it doesn’t fire off then, it isn’t going to happen. It’s like a Comcast service appointment. And then they just all kind of stared at me for a while and, credit where due, that was the best user rating I ever got from people sitting through one of my trainings. So, thanks for teaching me how to improve at what, basically, could have been a very expensive mistake on Puppet’s part. It accidentally worked out for everyone.

Nigel: Brilliant, brilliant. Yes, you would have survived teaching the spaceship operator to that sort of a crowd.

Corey: Oh, I mostly avoided that thing. That was an advanced Puppet-ism, and this was Puppet fundamentals because I just need to be topically good at things, not deep-dive good at things. But let’s dig into that a little bit. For those who have not had the pleasure of working with Puppet, what is it?

Nigel: Sure. So, Puppet is a pretty simple DSL. You know, DSLs aren’t necessarily in favor these days.

Corey: Domain-specific language, for those who have not—

Nigel: Yep.

Corey: —caught up on that acronym. Yes.

Nigel: So, a programming language designed for a specific task. And, you know, instead, we’ve decided that the world will rest on YAML. And we’ve absorbed a fair bit of YAML into our ecosystem, but there are things that I will still stand by are just better to do in a programming language. ‘if x then y,’ for example, is just easier to express when you have actual syntax around you and you’re not, sort of, forcing everything to be in a data specification language. So, Puppet’s pretty simple in that it’s a language that lets you describe the state that infrastructure should be in.

And you can do this in a modular and composable way. So, I can build a little chunk of automation code; hand it to Corey; Corey can build something slightly bigger with it; hand it to someone else. And really, this sort of collaboration is one of the reasons why Puppet’s, sort of, been at the center of the DevOps movement, which at its core is not really about tools. It’s about reducing friction between different groups.

Corey: Back when I was doing my traveling training shtick, I found that I had to figure out a way to describe what Puppet did to folks who were not deep in the space, and the analogy that I came up with that I was particularly partial to was, imagine you get a brand new laptop. Well, what do you do with it? You install your user account and go through the setup; you install the programs that you use, some of which have licenses on them; you copy your data onto it; you make sure that certain programs always run on startup because that’s the way that you work with these things; you install Firefox because that’s the browser of choice that you go with, et cetera, et cetera. Now, imagine having to do that for, instead of one computer, a thousand of them, and instead of a laptop, they’re servers. And that is directionally what Puppet does.

Nigel: Absolutely. This is the one I use for my mother as well. Like, I was working around Puppet for years before—and the way I explained it was, “You know when you get a new iPad, you’ve got to set up your Facebook account and your email. Imagine you had ten thousand of these.” And she was like—I was like, “You know, companies like Google, companies like big banks, they all have lots and lots and lots of computers.” And she was like, “They run all those things on iPads?” And I was like, “This is not really where my analogy was going.” But.

Corey: Right. And increasingly, though, it seems like the world has shifted in some direction where, when you explain that to your mother and she comes back with, “Well, wouldn’t they just put the application into Docker and be done with it?” Oh, dear. But that seems to be, in many ways, the direction that the zeitgeist has moved in, whether or not that is the reality in many environments, where when you’re just deploying containers everywhere—through the miracle of Kubernetes—if you’ll pardon the dismissive scorn there, that you just package up your application, shove it into a container, and then hurl it from the application team over to the operations team, like a dead dog cast into your neighbor’s yard for him to worry about. And then it sort of takes up the space of you don’t have to manage state anymore because everything is mostly stateless in theory. How have you seen it play out in practice in the last five years?

Nigel: I mean, that’s a real trend. And, you know, the size of a container should be [laugh] smaller than an operating system. And the reality is, I’m a sysadmin; I love operating systems, I nerded out on operating systems. They’re a necessary evil, they’re terrible, terrible things: registry keys, config files, they’re a pain in the neck to deal with. And if you look at, I think what a lot of operations folks missed about Docker when it started was that it didn’t make their life better. It was worse.

It was, like, this actual, sort of, terrible toolchain where you sort of tied together all these different things. But really importantly, what it did is it put control into the hands of the developers, and it was the developers who were trying to do stuff, who were trying to ship applications. And I think Docker was a really great technology, in the sense of, you know, developers could ship value on their own. And that was the huge, huge leveling up. It wasn’t the interface, it wasn’t the user experience, it wasn’t all these things, it was just that the control got taken away from the IT trolls in their basement going, “No, don’t touch my servers,” and instead given straight to the developers. And that’s huge because it let us ship things faster. And that’s ultimately the whole goal of things.

Corey: The thing that really struck me the most from conducting the trainings that I did was meeting a whole bunch of people across the country, in different technological areas of specialty, in different states of their evolution as technologists, and something that struck me was just how much people wound up identifying with the technology that they worked on. When someone is the AIX admin, and the AIX machines are getting replaced with Linux boxes, there’s this tendency to fight against that and rebel, rather than learning Linux. And I get it; I’m as subject to this as anyone is. And in many cases, that was the actual pushback that I saw against adopting something like Puppet. If I identify my job as being the person that runs all these carefully curated scripts that I’ve spent five years building, and now that all gets replaced with something that is more of a global solution to my local problem, then it feels like a thing that made me special is eroding.

And we see that with the migration to cloud as well. When you’re the storage admin, and it just becomes an API call to S3, that’s kind of a scary thing. And when you’re one of the server hugger types—and again, as guilty as anyone of this—and you start to see cloud coming in as, like, a rising tide that eats up what it was that you became known for, it’s scary and it becomes a foundational shift in how you view yourself. What I really had a lot of sympathy for was the folks who’ve been doing this for 20 years. They were, in some cases, a few years away from retirement, and they’ve been doing basically the same set of tasks every year for 25 years.

It’s one year of experience repeated 25 times. And they don’t have that much time left in their career, intentionally, so they want to retire, but they also don’t really want to learn a whole bunch of new technologies just to get through those last few years. I feel for them. But at the same time—

Nigel: No, me too, totally. But what are you going to do? But without sounding too dismissive there, I think it’s a natural tendency for us to identify with the technology if that’s what you’re around all the time. You know, mechanics do this, truck drivers with brands of trucks; people like to build attachments to the technology they work with because we fit them into this bigger techno-social system. But I have a lot of empathy for the people in enterprise jobs who are being asked to change radically because the cycle of progress is speeding up faster and faster.

And as you say, they might be a few years away from retirement. I think I used to feel more differently about this when I was really hot-headed and much more of a tech enthusiast, and that’s what I identified with. In terms of, it’s okay for a job to just be a job for people. It’s okay for someone to be doing a job because they get good health care and good benefits and it’s feeding their family. That’s an important thing. You can’t expect everyone to always be incredibly passionate about technology choices in the same way that I think many of us who live on Twitter and hang out in this space are.

Corey: Oh, I have no problem whatsoever with people who want to show up for 40 hours a week-ish, work on their job, and then go home and have lives and not think about computers at all. There’s this dark mass of developers out there that basically never show up on Twitter, they aren’t on IRC, they don’t go to conferences, and that’s fine. I have no problem with that, and I hope I don’t come across as being overly dismissive of those folks. I honestly wish I could be content like that. I just don’t hold still very well.

Nigel: [laugh]. Yeah, so I think you touched on a few interesting things there. And some of those we sort of cover in the State of DevOps Report, which is coming out in the next few weeks.

Corey: Indeed, and the State of DevOps Report started off at Puppet, and they’ve now done it for, what, 10 years?

Nigel: This is the 10th year, which is completely crazy. So, I was looking at the stats as I was writing it, and it’s 10 years of State of DevOps Reports; I think it’s 11 years of DevOps Weekly, Gareth Rushgrove’s newsletter; it’s 12 or 13 years of DevOpsDays that have been going on. This is longer than I spent in primary and high school put together. It’s kind of crazy that the DevOps movement is still, kind of, chugging along, even if it’s not necessarily the coolest kid on the block, now that GitOps, SRE flavor of the month, various kinds of permutations of how we work with technology, have perhaps got a little bit cooler. But it’s still very, very relevant to a lot of enterprises out there.

Corey: Yeah. As I frequently say, legacy is a condescending engineering term for ‘it makes money,’ and there’s an awful lot of that out there. Forget cloud, there are still companies wrestling with do we explore this virtualization thing? And that was something I was very against back in 2006, let’s be very honest. I am very bad at predicting the future of technology.

And, “I can see this for small niche edge workload cases, where you have a bunch of idle servers, but for the most part, who’s really going to use this in production?” Well, basically everyone because that, in turn, is what the cloud runs on. Yeah, I think we can safely say I got that one hilariously wrong. But hey, if you aren’t going to make predictions, then what’s it matter?

Nigel: But the industry pushes you in these directions. So, there was this massive bank in Asia who I’ve been working with for a long time and they were always resistant to adopting virtualization. And then it was only four or five years ago that I visited them; they’re like, “Right. Okay. It’s time. We’re rolling out VMware.” And I was like, “So, I’m really curious. What exactly changed in the last year or two in, like, 2014, 2015 that you decided virtualization was the key?” And I’m like—

Corey: Oh, there was this jackwagon who conducted this training? Yeah, no, no, sorry. I can’t take credit for that one.

Nigel: They couldn’t order one rack unit servers with CD drives anymore because their whole process was actually provisioning with CDs before that point.

Corey: Welcome to the brave new world of PXE booting, which is kind of hard, so yeah, virtualization is easier. You know, sometimes people have to be dragged into various ways of technological advancement. Which gets to the real thing I want to cover, since this is a promoted episode, where you’re talking about the State of DevOps Report, I’m almost less interested in what this year’s has to say specifically, than what you’ve seen over the last decade. What’s changed? What was true 10 years ago that is very much not true now? Bonus points if you can answer that without using the word Kubernetes more than twice.

Nigel: So, I think one of the big things was the—we’ve definitely passed peak DevOps team. If you may remember, there were a lot of arguments, and there still regularly are: is DevOps a job title? Is it a team title? Is it a [crosstalk 00:14:33]—

Corey: Oh, I was much on the no side until I saw how much more I would get paid as a DevOps engineer instead of a systems administrator for the exact same job. So, you know, I shut up and I took the money. I figured that the semantic arguments are great, but yeah.

Nigel: And that’s exactly what we’ve written in the report. And I think it’s great. The sysadmins, we were unloved. You know, we were in the basement, we weren’t paid as much as programmers. The running joke used to be for developers, DevOps meant, “I don’t need ops anymore.” But for ops people, it was, “I can get paid like a developer.”

Corey: In many cases, “Oh, well, systems administrators don’t want to learn how to code.” It’s, yeah, you’re remembering a relatively narrow slice of time between the modern era, where systems administrator types need to be able to write in the lingua franca of everything—which is, of course, YAML, as far as programming languages go—and before that, to be a competent systems administrator, you needed to have a functional grasp of C. And—

Nigel: Yeah.

Corey: —there is only a limited window in which a bunch of bash scripts and maybe a smidgen of Perl would have carried you through. But the deeper understanding is absolutely necessary, and I would argue, always has been.

Nigel: And this is great because you’ve just linked up with one of the things we found really interesting about the report, which is that, you know, when we talk about legacy, we don’t actually mean the oldest shit. Because the oldest shit is the mainframes; it’s a lot of bare metal applications. A lot of that in big enterprises—

Corey: We’re still waiting for an AWS/400 to replace some of that.

Nigel: Well, it’s administered by real systems engineers, you know, like, the people who wrote C, who wrote kernel extensions, who could debug things. What we actually mean by legacy is we mean late ’90s to late 2000s, early 2010s. Stuff that was put together by kids who, like me, happened to get a job because you grew up with a computer, and then the dotcom explosion happened. You weren’t necessarily particularly skilled, and a lot of people, they didn’t go through the apprenticeships that mainframe folks and systems engineers actually went through. And everyone just held this stuff together with, you know, duct tape and dental floss. And then now we’re paying the price of it all, like, way back down the track. So, the legacy is really just a certain slice of rapid growth in applications and infrastructure, that’s sort of an unmanageable mess now.

Corey: Oh, here in San Francisco, legacy is anything prior to last night’s nightly build. It’s turned into something a little ridiculous. I feel like the real power move as a developer now is to get a job, go in on day one, rebase everything in the Git repository to a single commit with a message, ‘legacy code’ and then force push it to the main branch. And that’s the power move, and that’s how it works, and that’s also the attitude we wind up encountering in a lot of places. And I don’t think it serves anyone particularly well to tie themselves so tightly to that particular vision.

Nigel: Yep, absolutely. This is a real problem in this space. And one of the things we found in the State of DevOps Report is that—let me back up a little and give a little bit of methodology of what we actually do. We survey people about their performance metrics, you know, like how quickly can you do deploys? What’s your mean time to recovery? Those sorts of things, and what practices do you actually employ?

And we essentially go through and do statistical analysis on this, and everyone tends to end up in three cohorts, they separate pretty easily, of low, medium, and high evolution. And so one of the things we found is that everyone at the low level has all sorts of problems. They have issues with what does my team do? What does the team next to me do? How do I talk to the team next to me?

How do I actually share anything? How do I even know what my goals are? Like, fundamental company problems. But everyone at all levels of evolution is stuck on two big things: not being able to find enough people with the right skills for what they need, and their legacy infrastructure holding them back.

Corey: The thing that I find the most compelling is the idea of not being able to find enough people with the skills that they need. And I’m going to break my own rule and mention Kubernetes as a prime example of this. If you are effective at managing Kubernetes in production, you will make a very comfortable living in any geographical location on the planet because it is incredibly complex. And every time we’ve seen this in previous trends, where you need to get more and more complexity, and more and more expertise just to run something, it looks like a sawtooth curve, where at some point that complexity gets abstracted away and compressed down into something that is basically a single line somewhere, or it happens below the surface level of awareness. My argument has been that Kubernetes is something no one’s going to care about in roughly three years from now, not because we’re not using it anymore, but because it’s below the level of awareness that we have to think about, in the same way that there aren’t a whole lot of people on the planet these days who have to think about the Linux virtual memory management subsystem. It’s there and a few people really care about it, but for the rest of us, we don’t have to think about that. That is the infrastructure underneath our infrastructure.

Nigel: Absolutely. I used to make a living—and it’s ridiculous looking back at this—for a year or two, doing high-performance custom compiled Apaches for people. Like, I was really really good at this.

Corey: Well yeah, Apache is a great example of this, where back in the ’90s, to get a web server up and running you needed to have three days to spare, an in-depth knowledge of GCC compiler flags, and hope for the best. And then RPM came out and then, okay, then YUM or other things like that—

Nigel: Exactly.

Corey: —on top of it. And then things like Puppet started showing up, and we saw, all right now, [unintelligible 00:20:01] installed. Great. And then we had—it took a step beyond that, and it was, “Oh, now it’s just a Docker-run whatever it is,” and these days, yeah, it’s a checkbox in S3.

Nigel: So, let me get your Kubernetes prediction down, right. So, you’re predicting Kubernetes is going to go away like Apache and highly successful things. It’s not an OpenStack failure state; it’s Apache invisibility state?

Corey: Absolutely. My timeline is a bit questionable, let’s be fair, but—it’s a little on the aggressive side, but yeah, I think that Kubernetes is inherently too complex for most people to have to wind up thinking about it in that way. And we’re not talking small companies; we’re talking big ones where you’re not in a position, if you’re a giant blue-chip Fortune 50, to hire 2000 people who all know Kubernetes super well, and you shouldn’t have to. There needs to be some flattening of all of that high level of complexity. Without the management tools, though, with things like Puppet and the things that came before and a bunch of different ways, we would all not be able to get anything done because we’d be too busy writing in assembly. There’s always going to be those abstractions on top of abstractions on top of abstractions, and very few people understand how it works all the way down. But that’s, in many cases, okay.

Nigel: That’s civilization, you know? Do you understand what happens when you plug in something to your electricity socket? I don’t want to know; I just want light.

Corey: And more to the point, whenever you flip the switch, you don’t have that doubt in your mind that the light is going to come on. So, if it doesn’t, that’s notable, and your first thought is, “Oh, the light bulb is out,” not, “The utility company is down.” And we talk about the cloud being utility computing.

Nigel: Has someone put a Kubernetes operator in this light switch that may break this process?

Corey: Well, okay, IoT does throw a little bit of a crimp into those works. But yeah. So, let’s talk more about the State of DevOps Report. What notable findings were there this year?

Nigel: So, one of the big things that we’ve seen for the last couple of years has been that most companies are stuck in the middle of the evolutionary progress. And anyone who deals with large enterprises knows this is true. Whatever they’ve adopted in terms of technology, in terms of working methods, you know, agile, various different things, most companies don’t tend to advance to the high levels; most places stay mired in mediocrity. So, we wanted to dive into that and try and work out why most companies are actually stuck like this when they hit a certain size. And it turns out, the problems aren’t technology or DevOps, they’re really fundamental problems like, “We don’t have clear goals. I don’t understand what the teams next to me do.”

We did a bunch of qualitative interviews as well as the quantitative work in the survey with this report, and we talked to one group of folks at a pretty large financial services company who are like, “Our teams have all been renamed so many times, if I need to go and ask someone for something, I literally page up and down through ServiceNow, trying to find out where to put the change request.” And they’re like, “How do I know where to put a network port opening request for this particular service when there are 20 different teams that might be named the right thing, and some are obsolete, and I get no feedback whether I’ve sent it off to the right thing or to a black hole of enterprise despair?”

Corey: I really love installing, upgrading, and fixing security agents in my cloud estate! Why do I say that? Because I sell things for a company that deploys an agent; there’s no other reason. Because let’s face it: agents can be a real headache. Well, now Orca Security gives you a single tool that detects basically every risk in your cloud environment, and that’s as easy to install and maintain as a smartphone app. It is agentless, or my intro would’ve gotten me into trouble here, but it can still see deep into your AWS workloads, while guaranteeing 100% coverage. With Orca Security, there are no overlooked assets, no DevOps headaches (and believe me, you will hear from those people if you cause them headaches), and no performance hits on live environments. Connect your first cloud account in minutes and see for yourself at orca.security. That’s “Orca” as in whale, “dot” security as in that thing your company claims to care about but doesn’t until right after it really should have.

Corey: That doesn’t get better with a lot of modernization. I mean, I feel like half of my job—and I’m not exaggerating—is introducing Amazonians to one another. Corporate communication between departments and different groups is very far from a solved problem. I think the tooling can help but I’ve never been a big believer in solving political problems with technology. It doesn’t work. People don’t work that way.

Nigel: Absolutely. One of my earliest times working at Puppet doing, sort of, higher-level sales and services and support, a huge national telco, we walk in there; we’ve got the development team, the QA team, the infrastructure team. In the course of this conversation, one of them makes a comment about using apt-get, and the others were like, “What do you mean? We’re on RHEL.” And it turned out, production was running on RHEL, the QA team was running on CentOS, and the developers were all building everything on Ubuntu. And because it was Java apps, they almost didn’t have to care. But write once, debug everywhere.

Corey: History doesn’t repeat, but it rhymes; before Docker, so much of development in startup-land was how do I make my MacBook Pro look a lot more like an EC2 Linux instance? And it turns out that there’s an awful lot of work that goes into that maybe isn’t the best use of people’s time. And we start to see these breakthroughs and these revelations in a bunch of different ways. I have to ask. This is the tenth year that you’ve done the State of DevOps Report. At this point, why keep doing it? Is it inertia? Are you still discovering new insights every year on top of it? Or is it one of those things where well someone in marketing says we have to do it, so here we are?

Nigel: No, actually, it’s not that at all. So definitely, we’re going to take stock after this year because ten years feels like a really good point to, sort of—it’s a nice round number in a certain kind of number system. Mainly the reason is, a lot of my job is going and helping big enterprises just get better at using technology. And it’s funny how often I just get folks going, “Oh, I read this thing,” like people who aren’t on the bleeding edge, constantly discussing these things on Twitter or whatever, but the State of DevOps Report makes its way to them, and they’re like, “Oh, I read a thing there about how much better it is if we standardized on one operating system. And that made a really huge difference to what we were actually doing because you had all this data in there showing that that is better.”

And honestly, that’s the biggest reason why I ended up doing it. It’s the fact that it seems to be a tool that has made its way through to very hard to penetrate enterprise folks. And they’ll read it and managers will read things that are like, “If you set clear goals for your team and get them to focus on optimizing the legacy environment, you will see returns on it.” And I’m being a little bit facetious in the tone that I’m saying because a lot of this stuff does feel obvious if you’re constantly swimming in this stuff day-to-day, but it’s not just the practitioners who it’s just a job for in a lot of big companies. It’s true, a lot of the management chain as well. They’re not necessarily going out and reading up on modern agile IT management practices day-to-day, for fun; they go home and do something else.

Corey: One of my favorite conferences is Gene Kim’s DevOps Enterprise Summit, and the specific reason behind that is, these are very large companies that go beyond companies, in some cases, to institutions, where you have the US Air Force as a presenter one year and very large banks that are 200 years old. And every other conference, it seems, more or less involves people getting on stage, delivering conference-ware, and telling stories that make people at those companies feel bad about themselves. Where it’s, “We’re Twitter for Pets, and this is how we deploy software,” or the ever-popular, “This is how Netflix does stuff.” Yeah, Netflix has basically no budget constraints as far as hiring engineering folks go, and lest we forget, their failure mode is someone can’t watch a movie right now. It’s not exactly the same thing as the ATM starts spitting out the wrong balance in the streets.

And I think that there’s an awful lot of discussion where people look at the stories people tell on conference stages and come away feeling bad from it. Very often, I’ll see someone from a notable tech company talk about how they do things. And, “Wow, I wish my group did things like that.” And the person next to me says, “Yeah, me too.” And I check and they work at the same company.

And the stories we tell are not necessarily the stories that we live. And it’s very easy to come away discouraged from these things. And that goes triply so for large enterprises that are regulated, that have significant downside risk if the technology fails them. And I love watching people getting a chance to tell those stories.

Nigel: Let me jump in on that really quickly because—

Corey: Please, by all means.

Nigel: —one is, you know, having done four years at Google, things are a shitshow internally there, too—

Corey: You’re talking about it like it’s prison. I like it.

Nigel: —you know. [laugh]. People get horrified when they turn up and they’re like, “Oh, what, it’s not all gleaming, perfect software artifacts, delivered from the hand of Urs?” But I think what Gene has done with DevOps Enterprise Summit is fantastic in how people share more openly their failure states, but even there—and this is an interesting result we found from a few years ago, State of DevOps Report—even those executives are being more optimistic because it’s so beaten into you as the senior executive; you’re putting on a public face, and even when they’re trying to share the warts-and-all story, they can’t help but put a little bit of a positive spin on it. Because I’ve had exactly the same experience there where someone’s up there telling a war story, and then I look, turn to the person next to me, and they work at that same 300-year-old bank, and they’re like, “Actually, it’s much, much worse than this, and we didn’t fix it quite as well as that.” So, I think the big tech companies are terrible inside unless they’re Netflix, and the big enterprises are also terrible. But they’re also—

Corey: No, no, I’ve talked to Netflix people, too. They do terrible things internally there, too. No one talks about the fact that their internal environments are always tire fires, and there are two stories: the stories we tell publicly, and the reality. And if you don’t believe me on that, look at any company in the world’s billing system. As much as we all talk about agile and various implementations thereof when it comes to things that charge customers money, we’re all doing waterfall.

Nigel: Absolutely. [laugh].

Corey: Because mistakes show when you triple-charge someone’s credit card for the cost of a small country’s GDP. It’s a problem. I want to normalize those sorts of things more. I’m looking forward to reading this year’s report, just because it’s interesting to see how folks who are in environments that differ from the ones that I get to see experience in this stuff and how they talk about it.

Nigel: Yeah. And so one of the big results I think there for big companies that’s really interesting is that one of the, sort of, anti-patterns is having lots of different types of teams. And I kind of touched on this before about having confusing team titles being a real problem. And not being able to cross organizational boundaries quickly is really, really—you know, it’s a huge inhibitor and cause, source of friction. But turns out the pattern that is actually really great is one that the Team Topologies guys have discovered.

If you’ve been following what Matthew Skelton and Manuel Pais have been doing for a while, they’ve basically been documenting a pattern in software organizations of a small number of team types, of a platform team, value stream teams, complicated-subsystem teams, and enabling teams. And so we worked with Manuel and Matt on this year’s report and asked a whole bunch of questions to try and validate the Team Topologies model, and the results came back and they were just incredibly strong. Because I think this speaks to some of the stuff you mentioned before that no one can afford to hire an army of Kubernetes developers, and whatever the hottest technology is in five years, most big companies can’t hire an army of those people either. And so the way you get scale internally before those things become commoditized is you build a small team and create the situation where they can have outsized leverage inside their organization, like get rid of all the blockers to fast flow and make their focus self-service to other people. Because if you’re making all of your developers learn distributed systems operations arcane knowledge, that’s not a good use of their time, either.

Corey: It’s really not. And I think that’s something that gets lost a lot is, I’ve never yet seen a company beyond the very early startup stage, where the AWS bill exceeded the cost of the people working on the AWS bill. Payroll is always a larger expense than infrastructure unless you’re doing something incredibly strange. And, oh, I want to save some money on the cloud bill is very often offset by the sheer amount of time that you’re going to have to pay people to work on that because, contrary to what we believe as engineering hobbyists, people’s time is very far from free. And it’s also the opportunity cost of if you’re going to work on this thing instead of something else, well, is that really the best choice? It comes down to contextualizing what technology is doing as well as with what’s happening over in the world of business strategy. And without having a bridge between those, it doesn’t seem to work very well.

Nigel: Absolutely. It’s insane. It’s literally insane that, as an industry, we will optimize 5%, 3% of our infrastructure bill or application workload and yet not actually reexamine business processes that are causing your people to spend 10% of their time in synchronous meetings. You can save so much more money and achieve so much more by actually optimizing for fast flow, and getting out of the way of the people who cost lots of money.

Corey: So, one last topic that I want to cover before we call it an episode. You talk to an awful lot of folks, and it’s easy to point at the aspirational stories of folks doing things the right way. But let’s dish for a minute. What are you seeing in terms of people not using the cloud properly? I feel like you might have a story or two on that one.

Nigel: I do have a few stories. So, in this year’s report, one of the things we wanted to find out is, like, are people using the cloud in the way we think of cloud; you know, elastic, consumption-based, all of these sorts of things. We use the NIST metrics, which I recognize can be a little controversial, but I think you’ve got to start somewhere as a certain foundation. It turns out just about everyone is using the public cloud. And when I say cloud, I’m not really talking about people’s internal VMware that they rebadged as cloud; I’m talking about the public cloud providers.

Everyone’s using it, but almost no one is taking advantage of the functionality of the cloud. They’re instead treating it like an on-premise VMware installation from the mid-2000s: they’re taking six weeks to provision instances, they’re importing all of their existing processes, they keep these things running for a long time, and if they fall over, one person is tasked with, “Hey, do you know how pet number 45 is actually doing here?” They’re not really treating any of these things in the way that they’re actually meant to. And I think we forget about this a lot of the time when we talk about cloud because we jump straight to cloud-native, you know, the sort of bleeding edge of folks in serverless, highly orchestrated containers. I think if you look at the actual numbers, the vast majority of cloud usage, it’s still things like EC2 instances on AWS. And there’s a reason: because it’s a familiar paradigm for people. We’re definitely going to progress past there, but I think it’s easy to leave the people in the middle behind when we’re talking about cloud and how to improve the ecosystem that they all operate in.

Corey: Part of the problem, too, is that whenever we look at how folks are misusing cloud, it’s easy to lose sight of context. People don’t generally wake up and decide I’m going to do a terrible job today unless they work in, you know, Facebook’s ethics department or something. Instead, it’s very much a people are shaped by the constraints they’re laboring under from a bunch of different angles, and they’re trying to do the best with what they have. Very often, the reason that a practice or a policy exists is because, once upon a time, there was a constraint that may or may not still be there, and going forward the way that they have seemed like the best option at the time. I found that the default assumption that people are generally smart and doing the right thing with the information they have carries you a lot further, in many respects, than what I did as a terrible junior consultant, which is, “Oh, what moron built this?” Invariably to said moron, and then the rest of the engagement rapidly goes downhill from there. Try and assume good faith, and if you see something that makes no sense, ask, “Why is it like this?” Rather than, “Why is it like this?” Tone counts for a lot.

Nigel: It’s the fundamental attribution bias. It’s why we think all other drivers on the road are terrible, but we actually had a good reason for swerving into that lane.

Corey: “This isn’t how I would have built it. So, it’s awful.”

Nigel: Yeah, exactly.

Corey: Yeah. And in some cases, though, there are choices that are objectively bad, but I tried to understand where they came from there. Company policy, historically, around things like data centers, trying to map one-to-one to cloud often miss some nuances. But hey, there’s a reason it’s called the digital transformation, not a project that we did.

Nigel: [laugh]. And I think you’ve got to always have empathy for the people on the ground. I quite often have talked to folks who’ve got, like, a terrible cloud architecture with the deployment and I’m like, “Well, what happened here?” And they went, “Well, we were prepared to deploy this whole thing on AWS, but then Microsoft’s salespeople got to the CTO and we got told at the last minute we’re redeploying everything on Azure.” And so these people were often—you know, you’re given a week or two to pivot around the decision that doesn’t necessarily make any sense to them.

And there may have been a perfectly good reason for the CTO to do this: they got given really good kickbacks in terms of bonuses for, like, how much they were spending on the infrastructure—I mean, discounts—but people on the ground are generally doing the best with what they can do. If they end up building crap, it’s because our system, society, capitalism, everything else is at fault.

Corey: [laugh]. I have to say, I’m really looking forward to seeing the observations that you wound up putting into this report as soon as it drops. I’m hoping that I get a chance to speak with you again about the findings, and then I can belligerently tell you to justify yourself. Those are my favorite follow-ups.

Nigel: [unintelligible 00:37:05].

Corey: If people want to get a copy of the report for themselves or learn more about you, where can they find you?

Nigel: Just head straight to puppet.com, and it will be on the banner on the front of the site.

Corey: Excellent. And we’ll, of course, put a link to that in the show notes, if people can’t remember puppet.com. Thank you so much for taking the time to speak with me. I really appreciate it.

Nigel: Awesome. No worries. It was good to catch up.

Corey: Nigel Kersten, field CTO at Puppet. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice as well as an insulting comment telling me that ‘comcastic’ isn’t a funny word, and tell me where you work, though we already know.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
