The Need for Reliability with Lex Neva

Episode Summary

Lex Neva, Staff Site Reliability Engineer at Honeycomb and Curator of SRE Weekly, joins Corey on Screaming in the Cloud to discuss reliability and the life of a newsletter curator. Lex shares some interesting insights on how he keeps his hobbies and side projects separate, as well as the intrusion that open-source projects can have on your time. Lex and Corey also discuss the phenomenon of newsletter curators being much more demanding of themselves than their audience typically is. Lex also shares his views on how far reliability has come, as well as how far we have to go, and the critical implications reliability has on our day-to-day lives. 

Episode Show Notes & Transcript

About Lex

Lex Neva is interested in all things related to running large, massively multiuser online services. He has years of SRE, Systems Engineering, tinkering, and troubleshooting experience and perhaps loves incident response more than he ought to. He’s previously worked for Linden Lab, DeviantArt, Heroku, and Fastly, and currently works as an SRE at Honeycomb while also curating the SRE Weekly newsletter on the side.
Lex lives in Massachusetts with his family, including 3 adorable children, 3 ridiculous cats, and assorted other awesome humans and animals. In his copious spare time he likes to garden, play tournament poker, tinker with machine embroidery, and mess around with Arduinos.

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked into a vendor due to proprietary data collection, querying, and visualization? Modern-day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open-source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That’s snark.cloud/chronosphere.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Once upon a time, I decided to start writing an email newsletter, and well, many things happened afterwards, some of them quite quickly. But before that, I was reading a number of email newsletters in the space. One that I’d been reading for a year at the time was called SRE Weekly. It still comes out. I still wind up reading it most weeks.

And it’s written by Lex Neva, who is not only my guest today but also a staff site reliability engineer at Honeycomb. Lex, it is so good to finally talk to you, other than reading emails that we send to the entire world that pass each other like ships in the night.

Lex: Yeah. I feel like we should have had some kind of meeting before now. But yeah, it’s really good to [laugh] finally meet you.

Corey: It was one of the inspirations that I had. And to be clear, when I signed up for your newsletter originally—I was there for issue 15, which is many, many years ago—I was also running a small-scale SRE team at the time. It was, I found, useful as a part of doing my job and keeping abreast of what was going on in the ecosystem. And I found myself, once I went independent, wishing that your newsletter and a few others had a whole bunch more AWS content. Well, why doesn’t it?

And the answer is because you are, you know, a reasonable person who understands that mental health is important and boundaries exist for a reason. No one sensible is going to care that much about one cloud provider all the time [sigh]. If only we were all that wise.

Lex: Right? Well, [laugh] well, first of all, I love your newsletter, and also the content that you write that—I mean, I would be nowhere without content to link to. And I’m glad you took on the AWS thing because, much like how I haven’t written Security Weekly, I also didn’t write any kind of AWS Weekly because there’s just too much. So, thanks for falling on that sword.

Corey: I fell on another one about two years ago and started the Thursdays, which are Last Week in AWS Security. But I took a different bent on it because there are a whole bunch of security newsletters that litter the landscape and most of them are very good—except for the ones that seem to be entirely too vendor-captured—but the problem is that they lacked both a significant cloud focus, as well as an understanding that there’s a universe of people out here who care about security—or at least should—but don’t have the word security baked into their job title. So, it was very insular, using acronyms they assume that everyone knows, or it’s totally vendor-captured and it’s trying to do the whole fear, uncertainty, and doubt thing, “And that’s why you should buy this widget.” “Will it solve problems?” “Well, it’ll solve our revenue problems at our company that sells the widgets, but other than that, not really.” And it just became such an almost incestuous ecosystem. I wanted something different.

Lex: Yeah. And the snark is also very useful [laugh] in order to show us that you’re not in their pocket. So yeah, nice work.

Corey: Well, I’ll let you in on a secret, now that we are—what, I’m something like 300 and change issues in, which means I’ve been doing this for far too long. The snark is a byproduct of what I needed to do to write it myself. Because let’s face it, this stuff is incredibly boring. I needed to keep myself interested as I started down that path. And how can I continually keep it fresh and funny and interesting, but not go too far? That’s a fun game, whereas copying and pasting some announcement was never fun.

Lex: Yeah, that’s not—I hear you on trying to make it interesting.

Corey: One regret that I’ve had, and I’m curious if you’ve ever encountered this yourself because most people don’t get to see any of this. They see the finished product that lands in their inbox every Monday, and—in my case, Monday; I forget the exact day that yours comes out. I collect them and read through them all at once—but I find that I have often had cause to look back and regret the implicit commitment in Last Week in AWS as a name because it would be nice to skip a week here and there, just because either I don’t particularly feel like it, or wow, there was not a lot of news worth talking about that came out last week. But it feels like I’ve forced myself onto a very particular treadmill schedule.

Lex: Yeah. Yeah, it comes with, like, calling it SRE Weekly. I just followed suit for some of the other weeklies. But yeah, that can be hard. And I do give myself permission to take a week off here and there, but you know, I’ll let you in on a secret.

What I do is I try to target eight to ten articles a week. And if I have more than that, I save some of them. And then when it comes time to put out an issue, I’ll go look at what’s in that ready queue and swap some of those in and swap some of the current ones out just so I keep things fresh. And then if I need a week off, I’ll just fill it from that queue, you know, if it’s got enough in it. So, that lets me take vacations and whatnot. Without that, I think I would have had a lot harder of a time sticking with this, or there just would have been more gaps. So yeah.

Corey: You’re fortunate in that you have what appears to be a single category of content when you construct your newsletter, whereas I have three that are distinct: AWS releases and announcements and news and things to make fun of for the past week; the things from the larger community folks who do not work there, but are talking about interesting approaches or news that is germane; and then ideally a tip or a tool of the week. And I found, at least lately, that I’ve been able to build out the tools portion of it significantly far in advance. Because a tool that makes working with AWS easier this week is probably still going to be fairly helpful a month from now.

Lex: Yeah, that’s fair. Definitely.

Corey: But putting some of the news out late has been something of a challenge. I’ve also learned—by getting it wrong—that I’m holding myself to a tighter expectation of turnaround time than any part of the audience is. The Thursday news is all written the week before, almost a full week beforehand, and no one complains about that. I have put out the newsletter a couple of times an hour or two after its usual 7:30 Pacific Time slot; not a single person has complained. In one case, I moved it by a day to accommodate an announcement but didn’t explain why; not a single person emailed in. So, okay. That’s good to know.

Lex: Yeah, I’ve definitely gotten to, like, Monday morning, like, a couple of times. Not much, not many times, but a couple of times, I’ve gotten to Monday morning and been like, “Oh, hey. I didn’t do that thing yesterday.” And then I just release it in the morning. And I’ve never had a complaint.

I’ve cancelled last minute because life interfered. The most I’ve ever had was somebody emailing me to be like, you know, “Hope you feel better soon,” like when I had Covid, and stuff like that. So, [laugh] yeah, sometimes maybe we do hold ourselves to a little bit of a higher standard than is necessary. I mean, there was a point where I got—I had major eye surgery and I had to take a month off of everything and took a month off the newsletter. And yeah, I didn’t lose any subscribers. I didn’t have any complaints. So people, I think, appreciate it when it’s there. And, you know, if it’s not there, just wait till it comes out.

Corey: I think that there is an additional challenge that I started feeling as soon as I started picking up sponsors for it because it’s, “Well, but at this point, I have a contractual obligation to put things out.” And again, life happens, but you also don’t want to have to reach out on apology tours every third week or whatnot. And I think that’s in part due to the fact that I have multiple sponsors per issue and that becomes a bit of a juggling dance logistically on this end.

Lex: Yeah. When I started, I really didn’t think I necessarily wanted to have sponsors because, you know, it’s like, I have a job. This is just for fun. It got to the point where it’s like, you know, I’ll probably stop this if there’s not some kind of monetary advantage [laugh]. And having a sponsor has been really helpful.

But I have been really careful. Like, I have always had only a single sponsor because I don’t want that many people to apologize to. And that meant I took in maybe less money than I could have, but that’s okay. And I also was very clear, you know, even from the start, having a contract that I may miss a week without notice. And yes, they’re paying in advance, but it’s not for a specific range of time, it’s for a specific number of issues, whenever those come out. That definitely helped to reduce the stress a little bit. And I think without that, you know, having that much over my head would make it hard to do this, you know? It has to stay fun, right?

Corey: That’s one of the things that kept me from, honestly, getting into tech for the first part of my 20s. It was the fear that I would be taking a hobby, something that I love, and turning it into something that I hated.

Lex: Yeah, there is that.

Corey: It’s almost 20 years now and I’m still wondering whether I actually succeeded or not in avoiding hating this.

Lex: Well, okay. But I mean, are you, you know, are you depressed [unintelligible 00:09:16] so there’s this other thing, there’s this thing that people like to say, which is like, “You should only do a job that you really love.” And I used to think that. And I don’t actually think that anymore. I think that it is important to have a job that you can do and not hate day-to-day, but there’s no shame in not being passionate about your work and I don’t think that we should require passion from anyone when we’re hiring. And I think to do so is even, like, privilege. So, you know, I think that it’s totally fine to just do something because it pays the bills.

Corey: Oh, absolutely. I find it annoying as hell when I’m talking to folks who are looking to hire for roles and, “Well, include a link to your GitHub profile,” is a mandatory field. It’s, well, great. What about people who work in places where they’re not working on open-source projects as a result, and they can’t really disclose what they’re doing? And the expectation that oh, well outside of work, you should be doing public stuff, too.

It’s, I used to do a lot of public open-source style work on GitHub, but I got yelled at all the time for random, unrelated reasons and it’s, I don’t want to put something out there that I have to support and people start to ask me questions about. It feels like impromptu unasked-for code review. No, thanks. So, my GitHub profile looks fairly barren.

Lex: You mean like yelling at you, like, “Oh, you’re not contributing enough.” Or, you know, “We need this free thing you’re doing, like, immediately,” or that kind of thing?

Corey: Worse than that. The worst example I’ve ever had for this was when I was giving a talk called “Terrible Ideas in Git,” and because I wanted to give some hilariously contrived demos that took a fair bit of work to set up, I got them ready to go inside of a Docker container because I didn’t trust that my laptop would always work, and I might have to borrow someone else’s. I pushed that image called “Terrible Ideas” up to Docker Hub. And I wound up with people asking questions about it. Like, “Is this vulnerable to Shellshock?” And it’s, “You do realize that this is intentionally designed to be awful? It is only for giving a very specific version of a very specific talk. It’s in public, just because I didn’t bother to make it private. What are you doing? Please tell me you’re not running this in production at a bank?” “No comment.” Right. I don’t want that responsibility of people yelling at me for things I didn’t do on purpose. I want to get yelled at for the things I did intentionally.

Lex: Exactly. It’s funny that sometimes people expect more out of you when you’re giving them something free versus when they’re paying you for it. It’s an interesting quirk of psychology that I’m sure that professionals could tell me all about. Maybe there’s been research on it, I don’t know. But yeah, that can be difficult.

Corey: Oh, absolutely. I used to work at a web hosting company and the customers spending thousands a month with us were uniformly great. But there was always the lowest tier customer of the cheapest thing that we offered that seemed to expect that that entitled them to 80 hours a month of support for engineering problems and whatnot. And it was not profitable to service some of those folks. I’ve also found that there’s a real transitive barrier that begins as soon as you find a way to charge someone a dollar for something.

There’s a bit of a litmus test of can you transfer a dollar from your bank account to mine? And suddenly, the entire tenor of the conversations with people who have crossed that boundary changes. I have toyed, on some level, with the idea of launching a version of this newsletter—or wondering if I retcon the whole thing—do I charge people to subscribe to this? And the answer I keep coming away with is not at all because it started in many respects as marketing for AWS bill consulting and I want the audience as fast as possible. Artificially limiting its distribution via a pay-for model just seemed a little on the strange side.

Lex: Yeah. And then you’re beholden to very many people and there’s that disproportionality. So, years ago, before I even started in my career in, I guess, you know, things that were SRE before SRE was cool, I worked for a living in Second Life. Are you familiar with Second Life?

Corey: Oh, yes. I’m very familiar with that. Linden Lab.

Lex: Yep. So, I worked for Linden Lab years later, but before I worked for them, I sort of spent a lot of my time living in Second Life. And I had a product that I sold for two or three dollars. And actually, it’s still in there; you could still buy it. It’s interesting. I don’t know if it’s because the purchase price was 800 Linden dollars, which equates to, like, $2.16, or something like that, but—

Corey: The original cryptocurrency.

Lex: Right, exactly. Except there’s no crypto involved.

Corey: [laugh].

Lex: But people seem to have a disproportionate amount of, like, how much of my time they expected for support. You know, I’m going to support them a little bit. You have to recognize at some point, I actually can’t come give you a tutorial on using this product because you’re one of 500 customers for this month. And you give me two dollars and I don’t have ten hours to give you. You know, like, sorry [laugh]. Yeah, so that can be really tough.

Corey: And on some level, you need to find a way to either charge more or charge for support on top of it, or ideally—I wish more open-source projects would take this approach—“Huh. We’ve had 500 people asking us the exact same question. Should we improve our docs? No, of course not. They’re the ones who are wrong. It’s the children who are getting it wrong.”

I don’t find that approach [laugh] to be particularly useful, but it bothers me to no end when I keep running into the same problem onboarding with something new and I ask about it, and, “Oh, yeah, everyone runs into that problem. Here’s how you get around it.” This would have been useful to mention in the documentation. I try not to ask questions without reading the manual first.

Lex: Well, so there’s a couple different directions I could go with this. First of all, there’s a really interesting thing that happened with the core-js project that I recommend people check out. Another thing, and I think the direction I’ll go at the moment—we can bookmark that other one, but I have an open-source project on the side that I kind of did for my own fun, which is a program for creating designs that can be processed by computer-controlled embroidery machines. So, these are sewing machines that can plot stitches in the x-y plane based on a program that you give them.

And there really wasn’t much in the way of open-source software available that could help you create these designs and so I just sort of hacked something together and started hacking with Python for my own fun, and then put it out there and open-sourced it. And it’s kind of taken off, kind of like gotten a life of its own. But of course, I’ve got a newsletter, I’ve got three kids, I’ve got a family, and a day job, and I definitely hear you on the, like, you know, yeah, we should put this FAQ in the docs, but there can be so little time to even do that. And I’m finding that there’s, like—you know, people talk about work-life balance, there’s, like, work slash life slash open-source balance that you really—you know, you have to, like, balance all three of them.

And a lot of weeks, I don’t have any time to spend on the project. But you know what, it still kicks along and people just kind of, they use my terrible little project [laugh] as best they can, even though it has a ton of rough edges. I’m sorry, everyone, I’m so sorry. I know it has a t—the UI is terrible. But yeah, it’s interesting how these things sometimes take on a life of their own and you can feel dragged along by your own open-source work, you know?

Corey: It always bothers me—I think this might tie back to the core-js issue you talked about a second ago—where there are people who are building and supporting open-source tools or libraries that they originally constructed to scratch an itch and now they are core dependencies of basically half the internet. And these people are still wondering on some level, how do I put food on the table this month? It’s wild to me. If there were justice in the world, you’d start to think these people would wind up in never-have-to-work-again-if-they-don’t-want-to positions. But in many cases, it’s exactly the opposite.

Lex: Well, that’s the really interesting thing. So, first of all, I’m hugely privileged to have any time to get to work on open-source. There’s plenty of people that don’t, and yeah, so requiring people to have a GitHub link to show their open-source contributions is inherently unfair and biased and discriminatory. That aside, people have asked all along, like, “Lex, this is decent software, you could sell this. You could charge money for this thing and you could probably make a, you know, a decent living at this.”

And I categorically refuse to accept money for that project because I don’t want to have to support it on a commercial level like that. If I take your money, then you have an expectation that—especially if I charge what one would expect—so this software, part of the reason I decided to write my own is because it starts at two-hundred-some-odd dollars for the competitors that are commercial and goes up into the five, ten-thousand dollars. For a software package. Mine is free. If I started charging money, then yeah, I’m going to have to build a support department and we’re going to have a knowledge base, I’m going to have to incorporate. I don’t want to do that for something I’m doing for fun, you know? So yeah, I’m going to keep it free and terrible [laugh].

Corey: Something you love turns into something you hate without you even noticing that it happens. Or at least something that you start to resent.

Lex: Yeah. I don’t think I would necessarily hate machine embroidery because I love it. It’s an amazingly fun little quirky hobby, but I think it would definitely take away some of the magic for me. Where there’s no stress at all, I can spend months noodling on an algorithm getting it right, whereas it’d be, you know, if I start having to have deliverables, it changes it entirely. Yeah.

Corey: It’s odd, it seems, on some level too, that the open-source world that I got started with has evolved in a whole bunch of different ways. Whereas it used to be write a quick fix for something and it would get merged, in many cases by the time you got back from lunch. And these days, it seems like it takes multiple weeks, especially with a corporate-controlled open-source project, and there’s so much back and forth. And even getting the boilerplate, like the CLA—the Contributor License Agreement—aside and winding up getting other people to sign off on it, then there’s back and forth, in some cases for weeks about, well, the right kind of test coverage and how to look at this and the right holistic framework. And I appreciate that there is validity and value to these things, but is that where the bulk of the effort should be going when there’s a pull request ready to go that solves a breaking customer problem?

But the test coverage isn’t right so we’re going to delay it for two or three releases. It’s what are you doing there? Someone lost the plot somewhere. And I’m sure there are reasons that make sense, given the framework people are operating within. I just find it maddening from the side of having to [laugh] deal with this as a human.

Lex: Yeah, I hear you. And it sometimes can go even beyond test coverage to something like code style, you know? It’s like, “Oh, that’s not really in the style of this project,” or, “You know, I would have written it this way.” And one thing I’ve had to really work on, on this project is to make it as inviting to developers as possible. I have to sometimes look at things and be like, yeah, I might do that a different way. But does that actually matter? Like, do I have a reason for that that really matters or is it just my style? And maybe because it’s a group project I should just be like, no, that’s good as it is.


[midroll 00:20:23]

Corey: So, you’ve had an interesting career. And clearly you have opinions about SRE as a result. When I started seeing that you were the author of SRE Weekly, years ago, I just assumed something that I don’t believe is true. Is it possible that you have been contributing to the community around SRE, but somehow have never worked at Google?

Lex: I have never worked at Google. I have never worked at Netflix. I’ve never worked at any of those big companies. The biggest company I’ve worked for is Salesforce. Although I worked for Heroku who had been bought by Salesforce a couple of years prior, and so it was kind of like working for a startup inside a big company. And here’s the other thing. I created that newsletter two months after starting my first job where I had a—like, the first job in which I was titled ‘SRE.’ So, that’s possibly contentious right there.

Corey: You know, I hadn’t thought of it this way, but you’re right. I did almost the exact same thing. I was no expert in AWS when I started these things. It came out of an effort that I needed to do of keeping in touch with everything that came out that had potential economic impact, which it turns out is most things when you understand architecture and cost are the same thing when it comes to cloud. But I was more or less gathering what smart people were saying.

And somehow there’s been this osmotic effect, where people start to view me as the wise old sage of the mountain when it comes to AWS. And no, no, no, I’m just old and grumpy. They look alike. Don’t mistake it for wisdom. But people will now seek me out to get my opinion on things and I have no idea what the answer looks like for most of the stuff.

But that’s the old SRE model—or sysadmin model that I’ve followed, which is when you don’t know the answer, well, how do you get to a place where you can find the answer? How do you troubleshoot this? Click the button. It doesn’t work? Well, time to start taking the button apart to figure out why.

Lex: Yeah, definitely. I hear you on people. So, first of all, thanks to everyone who writes the articles that I include. I would be nothing without—I mean—literally, I could not have a newsletter without content creators. I also kind of started the newsletter as an exploration of this new career title.

I mean, I’ve been doing things that basically fit along with SRE for a long time, but also, I think my view of SRE might be not really the same as a lot of folks, or, like, that Google passed down from the [Google Book Model 00:22:46]. I don’t—I’m going to be a little heretical here—I don’t necessarily a hundred percent believe in the SLI, SLO, SLA, error budget model. I don’t think that that necessarily fits everyone, and I’m not sure it even suits the bigger companies as well as they think it does. I think that there’s a certain point at which you can’t actually predict failure. And just slowing down on your deploys to, like, cause there to be fewer incidents so that you can get, you know, so you can go back to being within your error budget, to passing your SLO, I’m not sure that actually makes sense or is realistic and works in the real world.
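
For readers less familiar with the model under discussion, the arithmetic behind an error budget can be sketched roughly as follows. This is only a minimal illustration and not something from the episode: the 99.9% availability target, the 30-day window, and every function name here are assumptions chosen for the example.

```python
# Hypothetical sketch of SLO / error budget arithmetic.
# The target, window, and all names below are illustrative assumptions.


def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Downtime (in minutes) an availability SLO permits over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    # The budget is simply the share of the window the SLO allows to fail.
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Budget left after subtracting observed downtime (negative if overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


def should_slow_deploys(slo_target: float, window_days: int,
                        downtime_minutes: float) -> bool:
    """The classic policy: slow or freeze deploys once the budget is spent."""
    return budget_remaining(slo_target, window_days, downtime_minutes) <= 0


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    print(should_slow_deploys(0.999, 30, downtime_minutes=50.0))
```

The last function encodes the policy being questioned here: it assumes that slowing deploys after the budget is spent will reduce incidents, which is exactly the step Lex is not sure holds up in the real world.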

Corey: I’ve been left with the distinct impression that it’s something of a framework for how to think about a lot of those things. And for folks at a certain point of their development along whatever maturity model or maturity curve you want to talk about, it becomes extraordinarily useful. And at some point, it feels like the path that a given company is on will deviate from that. And, on some level, if you don’t wind up addressing it, it turns into what it seems like Agile did, where you wind up with the Cult of Agile around it and the entire purpose of it is to perpetuate the Cult of Agile.

And I don’t know that I’m necessarily willing to go so far as to say that’s where SLOs are headed right now, but I’m starting to get the same sort of feeling around the early days of the formalization of frameworks like that, and the ex cathedra proclamation that this is right for everyone. So, I’m starting to wonder whether there’s a reckoning, in that sense, coming down the road. I’m fortunate that I don’t run anything that’s production-facing, so for me, it’s, I don’t have to care about these things. Mostly.

Lex: Yeah. I mean, we are in… we’re in 2023. Things have come so much further than when I was a kid. I have a little computer in my pocket. Yeah, you know, “Hey, math teacher, turns out yeah, we do carry calculators around with us wherever we go.” We’ve built all these huge, complicated systems online and built our entire society around them.

We’re still in our infancy. We still don’t know what we’re doing. We’re still feeling out what SRE even is, if it even makes sense, and I think there’s—yeah, there’s going to be more evolution. I mean, there’s been the, like, what is DevOps and people coining the term DevOps and then getting, you know, almost immediately subsumed or turned into whatever other people want. Same thing for observability.

I think same thing for SRE. So honestly, I’m feeling it out as I go and I think we all are. And I don’t think anyone really knows what we’re doing. And I think that the moment we feel like we do is probably where we’re in trouble. Because this is all just so new. Look where we were even 40 years, 30, even 20 years ago. We’ve come really far.

Corey: For me, one of the things that concerns slash scares me has been that once someone learns something and it becomes rote, it sort of crystallizes in amber within their worldview, and they don’t go back and figure out, “Okay, is this still the right approach?” Or, “Has the thing that I know changed?” And I see this on a constant basis just because I’m working with AWS so often. And there are restrictions and things you cannot do and constraints that the cloud provider imposes on you. Until one day, that thing that was impossible is now possible and supported.

But people don’t keep up with that so they still operate under the model of what used to be. I still remember a year or so after they raised the global per-resource tag limit to 50, I was seeing references to only ten tags being allowed per resource in the AWS console because not even internal service teams are allowed to talk to each other over there, apparently. And if they can’t keep it straight internally, what hope do the rest of us have? It’s the same problem of once you get this knowledge solidified, it’s hard to keep current and adapt to things that are progressing. Especially in tech where things are advancing so rapidly and so quickly.

Lex: Yeah, I gather things are a little feudalistic over inside AWS, although I’ve never worked there, so I don’t know. But it’s also just so big. I mean, there’s just—like, do you even know all of the—like, I challenge you to go through the list of services. I bet you’re going to find one you don’t know about. You know, the AWS services. Maybe that’s a challenge I would lose, but it’s so hard to keep track of all this stuff with how fast it’s changing that I don’t blame people for not getting that.

Corey: I would agree. We’ve long since passed the point where I can talk incredibly convincingly about AWS services that do not exist and not get called out on it by AWS employees. Because who would just go and make something up like that? That would be psychotic. No one in their right mind would do it.

“Hi, I’m Corey, we haven’t met yet. But you’re going to remember this, whether I want you to or not because I make an impression on people. Oops.”

Lex: Yeah. Mr. AWS Snark. You’re exactly who I would expect to do that. And then there was Hunter, what’s his name? The guy who made the—[singing] these are the many services of AWS—song. That was pretty great, too.

Corey: Oh, yeah. Forrest Brazeal. He was great. I loved having him in the AWS community. And then he took a job, head of content over at Google Cloud. It’s, well, suddenly, you can’t very well make fun of AWS anymore, not without it taking a very different tone. So, I feel like that’s our collective loss.

Lex: Yeah, definitely. But yeah, I feel like we’ve done amazing things as a society, but the problem is that we’re still, like, at the level of, we don’t know how to program the VCR as far as, like, trying to run reliable services. It’s really hard to build a complex system that, by its nature of being useful for customers, it must increase in complexity. Trying to run that reliably is hugely difficult and trying to do so profitably is almost impossible.

And then I look at how hard that is and then I look at people trying to make self-driving cars. And I think that I will never set foot in one of those things until I see us getting good at running reliable services. Because if we can’t do this with all of these people involved, how do I expect that a little car is going to be—that they’re going to be able to produce a car that can drive and understand the complexities of navigating around and all the hazards that are involved to keep me safe.

Corey: It’s wild to me. The more I learn about the internet, the more surprised I am that any of it works at all. It’s like, “Well, at least you’re only using it for ridiculous things like cat pictures, right?” “Oh, no, no, no. We do emergency services and banking and insurance on top of that, too.” “Oh, good. I’m sure that won’t end horribly one day.”

Lex: Right? Yeah. I mean, you look at, like—you look at how much of a concerted effort towards safety they’ve had to put in, in the aviation industry to go from where they were in the ’70s and ’80s to where we are now where it’s so incredibly safe. We haven’t made that kind of full industry push toward reliability and safety. And it’s going to have to happen soon as more and more of the services we’re building are, exactly as you say, life-critical.

Corey: Yeah, the idea of having this stuff be life-critical means you have to take a very different approach to it than you do when you’re running, I don’t know, Twitter for Pets. Though, I probably need a new fake reference startup now that Twitter for reality is becoming more bizarre than anything I can make up. But the idea that, “Well, our ad network needs to have the same rigor and discipline applied to it as the life support system,” maybe that’s the wrong framing.

Lex: Or maybe it’s not. I keep finding instances of situations—maybe not necessarily ad networks, although I wouldn’t put it past them—but situations where a system that we’re dealing with becomes life-critical when we had no idea that it could possibly do so. So, for example, a couple companies back, there was this billing situation where a vendor of ours accidentally billed our customers incorrectly and wiped bank accounts, and real people were unable to make their mortgage payments and unable to, like, their bank accounts were empty, so they couldn’t buy food. Like, that’s starting to become life-critical and it all came down to a single, like, this could have been any outage at any company. And that’s going to happen more and more, I think.

Corey: I really want to thank you for taking time to speak with me. If people want to learn more, where’s the best place for them to find you?

Lex: sreweekly.com. You can subscribe there. Thank you so much for having me on. It has been a real treat.

Corey: It really has. You’ll have to come back and we’ll find other topics to talk about, I’m sure, in the very near future. Thank you so much for your time. I appreciate it.

Lex: Thanks.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.