Screaming in the Cloud

Insightful conversation. Less snark.

Every Wednesday, listen to host Corey Quinn interview domain experts in the world of Cloud Computing discuss AWS, GCP, Azure, Oracle Cloud, and why businesses are coming to think about the Cloud.
re:Inventing re:Invent with Pete Cheslock
04.22.2021
34 Minutes
About Pete

Pete is a recovering system administrator who got his start with AWS services back in 2009 while at Sonian, the first cloud-based email archiving platform. As one of the earliest and largest users of AWS, Pete ran technical operations and brought DevOps theory into action. Pete has worked for other companies such as Dyn, Threat Stack, and CHAOSSEARCH, managing large scale AWS deployments. A frequent speaker at DevOps and Observability events, Pete brings a product mindset to SaaS operations. Outside of work he spends his free time smoking meats and tweeting about the results.


Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: Join me on April 22nd at 1 PM ET for a webcast on Cloud & Kubernetes Failures & Successes in a Multi-everything World. I'll be joined by Fairwinds President Kendall Miller and their Solution Architect, Ivan Fetch. We’ll discuss the importance of gaining visibility into this multi-everything cloud native world. For more info and to register visit www.fairwinds.com/corey. 


Corey: The apps on cloud summit is a new, action-packed not-a-conference happening May 11th through 13th online. It’s for everyone who makes applications in the cloud run screaming. From IT leaders to DevOps pros to you folks, whoever you might be. Take a break from screaming into the cloudy void with me to learn from some of the best people who actually know what they’re doing. Like Kelsey Hightower, AWS blogger John Meyer, and also me, because apparently they didn’t listen to me saying I had no idea what I was doing. Register now at turbonomic.com/screaming. There’s a “swag box” ready to ship for the first two thousand registrants, so you don’t want to miss this. Thanks to Turbonomic for sponsoring this ridiculous podcast.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. For the third year in a row, I am joined by—who is now my colleague, Pete Cheslock. Pete, thanks for coming back.


Pete: It’s great to be here yet again, although under different circumstances than our normal post re:Invent extravaganza.


Corey: Oh, yes. So, every year, for those who have not been following this show since its inception, Pete and I get together to more or less kibitz around what happened at re:Invent. We would have done this in December, but then they put up on their website, “Three more days happening in January,” and we’re, “We’ll wait until after that happens.” And guess what happened? Nothing. They did some additional breakout sessions and that’s it. So honestly, it was a giant waste of everyone’s time, kind of like the, you know, sponsor expo hall at a digital event.

Pete: Yeah. Can we start on that topic?

Corey: Do you want to start on the digital event aspect or the crappy expo hall?


Pete: I want to start with the expo hall because I am a former sponsor of re:Invent, many, many times over. Again, I’ve been very lucky. I’ve worked in the cloud world, so I’ve been to pretty much all the re:Invents except for one—I’m not sure we’re going to count this year—but for almost all of those re:Invents, I was in the expo hall as part of a sponsor, whether it was the company I was working at, or whatever, but we were either part of that process of setting up a booth and shilling our wares to all these tech folks that are there, and the experience has been different: everything from, like, you build your own booth; here’s a square, just put whatever you want there, which I think was hilarious in the early days. To the nope, just give us a picture of what you want behind your booth. This year, though, it’s a weird digital thing. I guess they sent some VR things around there. I’m not sure if you heard of that. Some of the—

Corey: Oh, they did. Only for the Heroes, not for anything sponsor expo hall stuff. Now, let’s begin and say that I have a fair bit of sympathy—kind of—because 2020, weird year, pandemic is not something you generally plan for when scheduling events out years in advance. And there we have it, where it’s, suddenly there’s really no other option. Now, AWS absolutely dragged its feet embarrassingly long before announcing it would be digital-only. I think it was October, when they finally said, “All right,”—or damn near it—“All right, it’s going to be a virtual event.” To which the rest of the industry said, “No kidding.” But until they actually come out and say that you can’t bank on it.


Pete: How do you plan for that? I mean, I remember just a few years ago going through the process while I was still at ChaosSearch to set up a sponsorship for re:Invent. People start that process in March, April timeframe; they start thinking about, you know, the strategizing it; they start locking in their deposits because the sooner you get a deposit in, the earlier you can pick your booth location and that’s a big part. You want it to be not in the way far back that no one can find you. You want it to be somewhere near a good walkway. And so there’s a lot of planning that goes in, so pushing it back so late. I mean, I don’t know, do you think that they had this belief that they were going to do an in-person event?


Corey: My honest belief, to be very frank—and I say this in my capacity as gadfly, not in my capacity as self-appointed head of marketing for AWS—is that I think that they just were facing a whole bunch of cancellation fees, because when you book something like that, it’s expensive. They’re frugal, and I feel like for once, they were on the side of a contract that there was no winning move to get out of. And I feel like there was just some bitterness around that, they were hoping for a miracle and finally had to face reality. That’s my gut feeling. I have no inside track on that because although I call myself the head of AWS marketing, they don’t agree. Which is fine, though because, given their messaging or lack of same, I don’t need their agreement to be effective in the role.

Pete: Exactly. I mean, I think what’s most impressive to see is that they still provided an event that people attended; people watched the videos, they took part in, they did a bunch of different, kind of, vehicles and using things like Twitch and this VR thing for the expo hall, as a little silly as it is, at least they tried, right? They tried something new in this new world that we have been living in.


Corey: It was definitely an experience. Because we’re starting with the expo hall, though, I did a virtual walkthrough of the expo hall, and I made fun of things as I typically do in the real one, but it was hard to find, a bunch of people didn’t realize it existed, three weeks in, and what shocked me was the sheer level of enthusiastic outreach I got from some of the sponsor booths I visited. I got phone calls from vendors asking if they could help me with various solutions. “No, no. I was just there to make fun of you.” At which point it’s a very surreal conversation for an account rep to have when they don’t know who I am and what my nonsense looks like. But it was, they had so few leads coming in that they were just really focusing on every one that showed up. And I feel for them.

Pete: Yeah.


Corey: Problem is that they paid top dollar for these things, and got—I got to be honest with you—remarkably little. The minimum buy was something like 35 grand for a tiny little booth, and it went up to 125 plus a whole bunch of extras. And I’m looking at this in my own re:Quinnvent sponsorship nonsense that worked super well for sponsors, and I’m sitting here going, “I really need to start charging more. My God.”

Pete: Yeah. We just think about that for a second is that in normal re:Invent, your booth size is priced differently. If you want a small booth, like, 10 by 10 foot, you’ll pay a certain amount of money, a 20 by 20 booth is a lot more money. It makes sense. It’s a square footage situation here, but we’re talking about, like, computer bits, right? They’re like, yeah, well, you can get the small digital booth for 30,000 or the large, double-decker digital booth for 150.


Corey: Oh, yeah. One of my favorite personal experiences, remember, I talked to a company about sponsoring. Their initial position is usually always the same. “You’re a jerk. You made fun of us on Twitter, you roasted us, why on earth would we ever pay you for sponsorships?”

And my response was, “Look, let me level with you here. I get up there and make fun of you, and no one in the world is going to stop doing business with you because I said something snarky and sarcastic, but they absolutely will hear of you for the first time.” And then the penny drops. And it goes one of two ways. It’s either, “You’re an ass. No.” Or it’s, “Oh, my god, you’re right. Have some money.”

And it’s really an interesting experience watching that transformation take place. I used to think that I was, I don’t know, somehow fooling people with this. I’m not. It has the benefit of being completely true.

Pete: Yeah, I mean, sometimes all news is good news. Anything that you say, you’re just going to make a joke about someone’s product and—or their marketing strategy because every marketing strategy is a little stupid from time-to-time—but you look at and you go, “Yeah, this is pretty stupid.” And then you say that to other people, and they’re like, “Oh, I’ve never heard that company. What do they do?” And at the very least, it’s that opportunity for them to go to the site and be like, “I don’t know what that company does. Let me go look at it.”

And sure, I may spend about five seconds on your site, but you’ve got that free impression; you’ve got that opportunity. Like, I hope your site is good enough to have a clear statement of what you are because I’m going to give you about five seconds, but still, just mentioning it, I’m going to be like, “Oh, who are they? Let me go check it out.”

Corey: Well, the expo hall was on the other side of that. They had these mini-sites built up. My personal favorite was in week three, I went to a vendor and clicked on their mini-site, and it 404’ed because they were hosting, like, some virtual drink-up the week before, and when it was done, they just packed up their booth and left. You spent an awful lot of money not to drive me to your website. I just don’t pretend to understand marketing, at least that’s what I thought. It turns out that no, just some companies are really bad at it.

Pete: Yeah. And—


Corey: What I do is not, oh, for everyone. I get that. But it makes an otherwise dry subject area kind of fun. At least to me.

Pete: Yeah, I’ve always—I’m a weird person. So, I enjoy re:Invent mostly because—re:Invent is what you make of it. It’s always been that case, even the very first re:Invent, which feels quaint by comparison to some of the more recent ones, where I think there was maybe 4000 people, 6000 people the first one? I’d love to find the answer to that one. It was small, though. It was very small.

And moving on to the more recent years of re:Invent how, just, big they’ve gotten. I’ve always liked the expo hall. I’ve liked walking around. I mean, granted, being in this industry, oftentimes many of my friends are working the booths as well. Like, I will be working the booth for a while, people will come by, and my friends will be working in a booth and I’ll stop by and say hello, but mostly it’s a really great opportunity to just see what’s out there.

There is so much stuff. It’s hard to follow everything around. I mean just following—right, Corey?—following just the Amazon ecosystem is a full-time job. Following the surround sound, it’s nearly impossible.

So, I’ve always liked the expo hall. I like walking around. I like to see what people are saying. I like to see things that are—what are they doing? What problems are they solving? And it’s a great way to just get that, kind of, streamlined process, you get to see so much in such a short amount of time.


Corey: It’s absolutely one of my favorite parts of the show is walk around the expo hall. It’s a natural meeting place for people. I’ve got to be honest, I don’t go to too many sessions just due to the fact that there’s better uses of my time than standing in line for two hours to make sure I get a seat. So, it’s a natural gathering point; it’s a great way to catch up with people you only get to see once a year. And you get to see what the zeitgeist is, what people are talking about.


You can see whose booth is slammed and who’s not. And you can fool yourself into thinking it’s about the quality of their product rather than the quality of their swag. But it’s an experience. And on the one hand, I’m sad to miss it. On the other, I had so many more productive conversations this year.


There were so many aspects of it that I don’t miss, like the conference crud where you get the flu every time, because you have a lot of people in a small space. And there’s the sense of having to fit everything into one week. Instead, they’ve now expanded it to three weeks, which on the one hand, okay, I actually like the fact that can be a more measured pace. On the other, exactly who do they believe can get three full weeks off from work to sit around and watch videos in a browser?


Pete: Yeah. That was a big concern when I started looking at some of the re:Invent stuff. And even for us, we follow a lot of the Amazon ecosystem, I didn’t even get to take part in a lot of these Amazon re:Invent activities as much as I wanted to. Because we have clients that we want to service and make sure that their needs are met. So, just for us who spend so much time in this ecosystem, we really can only dedicate so many people to understanding what all these changes are and staying up on it.


Corey: I will say that the one thing that tips my entire assessment of the event over into the positive, and I want to see aspects of this going forward, is how accessible the whole thing became where we suddenly have a scenario where it’s not just restricted to people who, one, can drop two grand—or damn near it—on a ticket; two, can afford to travel to and stay within Las Vegas for a week, and three, can get the time off from work to do it. Suddenly, the only prerequisite was ‘has an internet connection.’


Pete: [laugh]. It’s so true. I mean, let’s not kid anyone out there. It’s a boondoggle. It is one hundred percent a boondoggle. It is a week in Vegas. And I know there’s a bunch of people out there that are like, “Aww, I hate going to re:Invent. I hate Vegas.” And yeah, I can understand that some people just don’t enjoy it. I’m a weird person. I actually like Vegas. I’m weird.

But people go and they enjoy it because even if you hate all of the noise, and the smoke, and the gambling, and the whatever, and you have to walk an hour to get any place, the people there, though, and the connections that you can make—I mean, I meet new people every year at re:Invent which, at an event that is so large, kind of feels counterintuitive. Like it’s so large, you almost feel lost in a sea of people, but yet somehow it’s like, I still run into people. I meet up with someone for coffee, and they may be back to back with meetings, “Oh, do you know this person?” I get to meet new Amazon folks. And I get to actually see a lot of friends, and again, maybe we’re all just missing that personal connections of conferences. But on the flip side, I really enjoyed not having to go to Vegas this year. Even without a pandemic, it was really nice to not have to spend a week and come home sick, and tired, and exhausted.


Corey: There’s something amazing about being able to do it at your own pace. Now again, because Amazon is willing to be misunderstood for long periods of time, which is used as an excuse to completely abdicate any actual marketing work, for the most part, it means that I instead had to guess what was going to happen going in. So, all right, I committed to doing a daily email roundup four days of the week; I committed to doing a bunch of live streams and such. And what I didn’t realize was it was going to be a stop-start thing where there would be a whole bunch of releases one day, and then there’d be nothing the following day. So, I was sort of left on some of those empty days of kind of holding the bag of, “Here’s something you might not have caught yesterday”—and I’m digging deep into the barrels—“There was a minor change to an SDK in a language no one has ever heard of.”


And the fun, nice thing about AWS is that people just assume someone’s going to care about that. They won’t, but, “Oh, that one just must not be for me.” So, it was a little challenging from a content management perspective. I would absolutely want to see that done differently or at least telegraphed in advance next year.


Pete: So, this is an interesting topic where, obviously, the last 12 months has been interesting for the conference world. I know a lot of more local conferences are, kind of, already writing off the year. Maybe some are holding off to see if, maybe, the end of the year, we reach enough vaccines and things get better. But I mean, Amazon had re:Invent. It was small, then it expanded. Then they broke out these summits, right? They had these city regional summits, then they did like—


Corey: And having gone to a bunch of those summits, it turns out, they’re all basically kind of the same thing. And I went to my third one back in 2019 or so, that year, and it was, “Hey, that’s the same joke in all the keynotes. What’s the”—and then it’s like, “Oh, right. Most people are not, you know, nuts, and don’t travel around the world, like some sort of ridiculous groupie for a rock band going to all of the AWS summits.” I have problems. But it’s fun.


Pete: Yeah, again, as someone who has worked in a lot of these booths and had to go to them, it’s always a little painful when you’re at the Santa Clara Convention Center—which, you know, is in the middle of nowhere, there’s nothing around, there’s nothing to do—and there’s like a Bennigan’s, I think you can go get some dinner at when you’re done for the day. And you finish up and you’re chatting with folks, and then you’re like, “Oh, so you get to be at the Toronto one?” Be like, “Oh, yeah, I’ll be there. I’ll see you in a few weeks.” Like, [laugh] it’s just, it’s the saddest thing.

Corey: Yeah, it really is. Then they also expanded beyond summits, though into re:Inforce, which is security. And that was a summer event in 2019 and they were hoping to do in 2020 and canceled it. And supposedly, it’s going to be coming back; we don’t know when. I like the idea of being able to break out the security-focused stuff into its own event because the biggest problem I’ve always had with re:Invent is that it doesn’t know what it wants to be.


Is it a thing that’s for new product releases? Is it a chance to have a bunch of executive briefings between big customer execs and Amazon folks? Is it a vendor expo hall where people get to shill their wares? Is it a partner gathering so the partners can learn how things work? What’s going on there? And what is it about? And is it a big party? Is it just, effectively, a chance for a bunch of people to get on stage and talk about what they’re working on? And the answer to all that is, “Yes. And more.”


Pete: Yeah, exactly. There is an identity problem for re:Invent. I think they have been doing a decent enough job of trying to break things out, to make, kind of, that re:Invent knowledge transfer, a little bit more accessible with—they’re all free, too, these regional summits, these are free events people can just go to and consume a lot of this content and workshops and things like that. The breaking out of the security stuff, I think was great. Hilariously, I had not been working at a security company when that happened, but I had heard rumors that the sponsorships were all invite-only. There are so many security companies that Amazon was like, “We can’t take all of your money for this, so we’re going to invite you in to sponsor it.” Because that market is so thirsty for sales.


Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Corey: What was also bizarre was that at re:Inforce’s expo hall, it was only security companies. And that was something I didn’t fully understand until you just said that. Because I was annoyed that, well, there’s no monitoring companies here or anything else. Does it not occur to folks that maybe people who care about AWS security might also have other needs in the cloud computing space? There aren’t too many people who have blinders on that restrict them specifically to security and only security. “Oh, it does monitoring, too? I’m not interested.”

Pete: [laugh]. I think that’s got to be a hard challenge as a security vendor if you’re at an event—and I’ve never been to RSA, and RSA must be just the exact same way—but how many different companies there do, like, threat detection or, like, SIEMs—

Corey: Most of them. It’s the same product with different logos on, as you walk up and down the halls. There are occasionally unique and interesting things, but they’re few and far between.

Pete: Yeah, exactly. So, I’m most curious to see where Amazon goes with re:Invent because I think what this year is giving them is an opportunity to maybe find a better identity for what do they want re:Invent to be. And look, if the answer is we want it to be a big celebration of everyone who uses this, whose job is impacted by it, who makes money off of it, who et cetera, et cetera, then great. Then that’s the event. But there are going to be people, like there might be partner network people, well, we don’t really want to go to the party, but we want to get all the updates because our businesses are fully dependent on this. So, then maybe, do they break that up? So, I don’t know, do they start breaking re:Invent out into more focused things like they did with security? Would it take away from the overall feeling? I don’t know.

Corey: It’s a weird problem, and I don’t know how to solve it because without understanding what the event is intended to be, it’s really hard to guide it. They get on stage and they say, “Re:Invent is really—it’s not a sales conference. It’s not a release confer—it’s an education conference.” Which is shorthand for we have absolutely no idea what this is.

Pete: It’s so true. I mean, the early re:Invents were, “All right. Let’s hear the latest price cuts for Amazon. Tell us how much cheaper S3 is and tell us how much cheaper EC2 is.” It was like clockwork.

That was what it was about. And it was about new software releases. Weirdly enough, though, the software announcements they did back then were all things that you could get then. They were like, “Hey, we’re announcing these new instances, available today.” They almost didn’t have to say available today because like, of course. Why would you announce it if you can’t get it?

Corey: Right, it seems like half the releases that were big headlines were, “Available in preview.” And you know what that means. They’re setting it up for them to just get dragged whenever they pull a Timestream. And, “Yes, we’re announcing it in private preview. It will be available soon.”


Then two years go by. And it’s pretty clear when you see that kind of delay that something happened. And credit where due, because they’re Amazon, I prefer that they get it right before launching it rather than launching something that isn’t great, and then we’re stuck with it forever. But if you’re still that early on, don’t announce it. Announce things that aren’t vapor.

Pete: Yeah. And I’d like to think there’s a strategy to it, but there probably isn’t, it’s probably just more of a, “Well, this is a great way for us to identify other customers who might also want to use this and maybe they want to be part of the preview.” And I don’t know, it’s a little frustrating, I will say. But things come out, and you’re like, “I really want to use that.” And it’s like, “Yeah, no. That’s not for you.”

Corey: Yeah. And that’s the challenge, too I think, from a marketing perspective. They do so many different releases of what’s coming out, how they’re going to be talking to satellites in orbit, or talking to manufacturing floors, or whatever it is that they’re talking about, that every company, no matter who they are or what they do, look at that and think, “Huh. That’s not what we do, therefore, AWS is not for me.” And AWS, as a company, is basically an alien organism, compared to going down the path of any other company that can’t really walk and chew gum at the same time.

I don’t know, if Apple, for example, starts doing a big push into filling potholes as a primary function, I’d look at that and think, “Oh, okay. They’re pretty clearly not focused on the Mac, in some respect.” I mean, look what they did to their laptops for years with the Thunderbolt, the Touch Bar, the crappy keyboards, et cetera. It’s, yeah, it’s clear that they can’t focus on the iPhone and the Mac at the same time effectively. Or they just hate their customers.


Don’t email me. But what I don’t see is any ability of companies other than Amazon, to be able to focus and execute across this many different things. It’s hard to contextualize. So, it’s very easy for the messaging takeaway to be, “It’s not for me.”

Pete: Yeah. And maybe that ties into the re:Invent identity problem because you’ve got Andy Jassy on stage talking about, look out for whatever this new service that’s going to ruin your Amazon bill. It’s like warehouse logistics stuff, which, yeah, cool. I’m sure that that solves a big problem in the industry, but I’m a DevOps engineer, and I want to hear more about EKS, right? And I have to sit through learning about this predictive, whatever for my warehouse that I don’t have. Does that just become too off-putting, and do I just then zone out and, kind of, ignore all these other interesting things that could be happening?


Corey: It’s unclear. And that’s the biggest problem, I think, that they’re failing to educate people on. Specifically, every service is for someone. No service is for everyone. And that is a difficult thing to hold on to.


We’ve long since passed the point where anyone can hold all the services in their head, we’ve gotten to a point where even I don’t always pick up a fake service someone slips in to see if I know if it exists or not. It’s expanded too far too quickly and where, that’s fine. But the messaging strategy has to change, the marketing strategy has to change, your entire go-to-market has to change.


Pete: Yeah. Amazon is really good at running things. I mean, that’s what they’re good at. It’s operationalizing software, and they continue to find things that people don’t want to run anymore. I don’t blame them.


I don’t want to run things. I’m a cloud economist. I look at bills. I don’t want to run Elasticsearch anymore. I don’t want to deal with Cassandra and get paged at 2 a.m. like, I really want someone else to deal with that stuff. And just think about all of the other verticals, all the other businesses that exist out there with the same people who are having the same complaints, just insert different words. Like, “Oh, I really hate my business intelligence solution. I really hate these Excel spreadsheets that are always locked. There must be a better way.” Right? And it turns out Amazon’s like, “Yeah I got you.”


Corey: Yeah, the idea that Amazon is equally good across all of these different offerings is a bit of a red herring. There are things that they excel at, and there are things that they struggle at. And I often shorthand that to the infrastructure pieces, the plumbing, they’re phenomenal at. Anything that requires a user interface, or is SaaS, they are hilariously bad at—Honeycode—and most things are somewhere on the spectrum between those two points. And there are exceptions in both directions, but by and large, the more it looks like a big computer rented by the hour, the better the offering is. Would you agree or disagree with that?


Pete: Yeah, I definitely agree. I think the thing that was most surprising in the recent Kinesis outage was just how intertwined Amazon services are internally—AWS services—and how internally, the engineers at AWS are building on top of AWS. It’s a weird Russian nesting doll issue, where it’s just turtles on turtles. And it’s fascinating, and I wonder if the services which are most used internally, as well become those services that are the most stable, the most well-supported, most features coming in? Does Amazon build for Amazon first, and therefore, put a lot of effort into those things that further supports their business and maybe grows revenue? Maybe.


If they’re building for the customer, and they consider themselves a big customer, then theoretically, they’re building as well for some of those features. But of course, they build for everyone; they build for the startup that has two EC2 servers, and then they build for the federal government who wants to beam some bits using Ground Station. Like, who else is going to use that service?


Corey: Yeah. It feels like there’s like five companies out there that might need it, and the rest of us are, “Yeah, I don’t currently have any satellites in orbit this quarter that I need to speak to, so it’s probably not for me.” I will say that every time I meet someone who’s about to go to AWS as an employee, they’re super excited because they’re going to see how it works internally and come out with an understanding of this Google-like system that is decades ahead of anything else on how they run their stuff operationally. And then a few months go by, and I catch up with them again, and they look haunted. There is no enthusiasm for it at all. Their voice shakes; they tremble a bit; frequently, they’ve developed a drinking problem. And they don’t ever talk about it, but what I’ve managed to piece together is there’s no magic secret sauce. It’s the same nonsense that you would see anywhere else, but they excel at the operational aspects of all of it. And that’s what makes it work.

Pete: I think what actually happens is they find the truth of the m1.medium and the first EC2 instance, and they’re so horrified that those are still running, that they can just not come back from the brink.

Corey: I didn’t know it was a Raspberry Pi.


Pete: [laugh]. I think you are totally right. I mean, every place, largely, is the same. It’s just, it’s got its own history that has framed how everything is. And if you’re on the inside, and especially if you’ve been at Amazon for many years, you’re financially incentivized to love that place.


I mean, if I had a lot of stock grants that were granted many years ago, and the stock continues to climb, like yeah, this is the best place I’ve ever been like, “What are you talking about?” As they look around and everything is on fire, or who knows what.


Corey: As they walk past the conference room filled with people crying? Yeah.


Pete: You know, it’s… I don’t know, what is it? Stockholm Syndrome? Is that the term? You just accept it, and you get used to it, and you get comfortable with it. And yeah, in rare scenarios, there’s folks that I know that have been there for a long time, and they’re there—I hate to say they’re there for the mission. They’re not. They’re there because of the challenge; because technically what they get to work on is cutting edge.

I don’t know if that’s the case for every new service and feature. If you were someone who was working on making QuickSight graphs look better versus, like, the Nitro Hypervisor, maybe depending on who you are, one of those is more thrilling than the others. I don’t really know. But obviously, there seems to be two types of Amazon employee: one who sticks around for their year, gets that bonus, or at least doesn’t get the clawback they need; and the others who stick around for many, many, many years, right? It’s, I think, a different company, depending on who you are.


Corey: That is increasingly the vibe I’m getting from feedback I’ve gotten to blog posts, people yelling at me, people saying that, oh, my assessment of how compensation works at Amazon is either spot on or completely inaccurate. And both of those groups are being fully sincere when they say it. But that’s a conversation for another time.


Pete, thank you for joining me. We will do a second episode in the very near future talking about the actual releases of re:Invent 2020, but thank you for joining me. For those who are unfamiliar with your amazing work, where can they find you?


Pete: You can find me at @petecheslock on Twitter; it’s probably the best place. It is a mixture of smoked meats and technology hot takes.

Corey: Thank you. As always, it’s a pleasure. Pete Cheslock, cloud economist at the Duckbill Group.

Pete: That’s me.


Corey: I’m Cloud Economist Corey Quinn, also at the Duckbill Group, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment telling me that I really don’t understand Amazon’s marketing approach, and that no, I don’t really run AWS marketing.


Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.


This has been a HumblePod production. Stay humble.
S3: 15 Years and 100 Trillion Objects Later with Kevin Miller
Screaming in the Cloud
04.20.2021
37 Minutes
About Kevin

Kevin Miller is currently the global General Manager for Amazon Simple Storage Service (S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Prior to this role, Kevin has had multiple leadership roles within AWS, including as the General Manager for Amazon S3 Glacier, Director of Engineering for AWS Virtual Private Cloud, and engineering leader for AWS Virtual Private Network and AWS Direct Connect. Kevin was also Technical Advisor to Charlie Bell, Senior Vice President for AWS Utility Computing. Kevin is a graduate of Carnegie Mellon University with a Bachelor of Science in Computer Science.


Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: Join me on April 22nd at 1 PM ET for a webcast on Cloud & Kubernetes Failures & Successes in a Multi-everything World. I'll be joined by Fairwinds President Kendall Miller and their Solution Architect, Ivan Fetch. We’ll discuss the importance of gaining visibility into this multi-everything cloud native world. For more info and to register visit www.fairwinds.com/corey.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit www.lacework.com


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Kevin Miller, who’s currently the general manager for Amazon S3 which presumably needs no introduction itself, but there’s always someone. Kevin, welcome to the show. Thanks for joining us, and what is S3?


Kevin: Well, Corey, thanks for having me. Yes, Amazon S3 was actually the first generally available AWS service. We actually just celebrated our 15-year anniversary here on Pi Day, 3/14. And S3 is an object storage service that makes it easy for customers to put and store any amount of data that they want. We operate in all AWS regions worldwide, and we have a number of features to help customers manage their storage at scale because scalability is really one of the core building blocks, tenets for S3, where we provide the ability for customers to scale up and scale down the amount of storage they use, they don’t have to pre-provision storage, and when they delete objects that they don’t need, they stop paying for them immediately.


So, we just make it easy for customers to store whatever they need, access it from applications, whether those are applications running in AWS or somewhere else on the internet, and really just want to make it super easy for customers to build storage, use storage with their applications.


Corey: So, a previous guest in, I say the first quarter of the show’s life—as of this time—was Mai-Lan Tomsen Bukovec, who at the time was also the general manager of S3, and she has since ascended to, perhaps, S4 or complex storage service. And you have transitioned from a role where you were the general manager of Glacier—


Kevin: Correct.


Corey: —or Amazon S3 Glacier, and that’s the point of the question. Is Glacier part of S3? Is it something distinct? I know they’re tightly related, but it always seems that it’s almost like the particle-wave experiment in physics where, “is it part of S3 or is it a distinct service?” depends entirely on the angle you’re looking at it through?


Kevin: Right. Well, that’s—Amazon S3 Glacier is a business that we run as a separate business, with a general manager. Joe Fitzgerald looks after that business today. Certainly, most of our customers use Glacier through S3, so they can put data into S3 and they actually can put it directly into the Glacier storage class—or the Glacier Deep Archive storage class—or customers can configure lifecycle policies to move data into Glacier at a certain point. So, the primary interface customers use is through S3, but it is run as a standalone business because there’s just a set of technology and human decisions that need to be made, specific to that type of storage, that archive storage. So, I work very closely with Joe, he and I are peers, but they are run as separate businesses.
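
The lifecycle policies Kevin mentions can be sketched in Python. This is a minimal sketch, not anything shown in the episode: the helper names, the "logs/" prefix, and the 90-day and 365-day thresholds are assumptions chosen for illustration. The dictionary follows the shape that boto3's put_bucket_lifecycle_configuration call accepts, and boto3 is imported lazily so the builder can be tried without AWS access.

```python
def glacier_lifecycle_config(prefix, glacier_after_days, deep_archive_after_days):
    """Build a lifecycle configuration that moves objects under `prefix`
    to the Glacier storage class, then to Glacier Deep Archive."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("Deep Archive transition must come after the Glacier transition")
    return {
        "Rules": [
            {
                # Hypothetical rule ID derived from the prefix.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Attach the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = glacier_lifecycle_config("logs/", 90, 365)
```

Calling apply_lifecycle("example-bucket", config) would attach the rule; the bucket name there is a placeholder.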


Corey: So you, of course, transitioned. You’ve, I guess we’ll say, thawed. You are no longer the GM of Glacier, you’re now the GM of S3. And you just had a somewhat big announcement to celebrate that 15-year anniversary: S3 Object Lambda.


Kevin: Yes. We’re very excited about S3 Object Lambda. And we’ve spoken to a number of customers who were looking for features with S3, and the way that they described it was that they liked the S3 API, they want to access their data through that standard API, there’s lots of software that knows how to use that including, obviously, the AWS SDK. And so they liked that GET interface to get data out and to put data in, but they wanted a way to change the data a little bit as it was being retrieved. And there’s a lot of use cases for why they wanted to do it.

Everything from redacting certain data to maybe changing the size of an image for particular workloads, or maybe they have a large amount of XML data and for certain applications, they want a JSON formatted input. And so rather than have a lot of complicated business logic to do that, they said, well, why can’t I just put something in the path so that as the data is being retrieved through the GET API, I can make that change, the data can be reformatted.


Corey: It’s similar to the Lambda@Edge approach where instead of having to change or modify the source constantly and have every possible permutation, just operate on the request.


Kevin: Yeah, that’s right. So, I want one copy of my data; I don’t want to have to create lots of derivative copies of it. But I want to be able to make changes to it as it’s going through the APIs. So, that’s what we built is Lambda, it’s integrated with Lambda, it’s full Lambda. So really, it’s pretty powerful.


Anything you can do in a Lambda function, customers can do in these functions. So, an application makes a GET request, that invokes the Lambda function, the function can process the data, and then whatever is returned out is then sent and streamed back to the application. So, customers can build some transformation logic that runs in line with that request, but then transforms that data that goes to applications.
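
As a rough illustration of that request flow, here is a minimal Python sketch of an Object Lambda handler for the redaction use case. The email pattern and the redact_emails helper are assumptions made up for this example. The event fields (getObjectContext, inputS3Url, outputRoute, outputToken) and the write_get_object_response call are the ones the S3 Object Lambda integration uses, and boto3 is imported inside the handler so the transformation itself can be run locally.

```python
import re
import urllib.request

# Hypothetical redaction rule: mask anything that looks like an email address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace every email-like token with a fixed marker."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def handler(event, context):
    """Entry point invoked when an application issues a GET through the
    Object Lambda access point."""
    import boto3  # available in the Lambda runtime; imported lazily for local use

    ctx = event["getObjectContext"]

    # Fetch the original, untransformed object via the presigned URL S3 supplies.
    with urllib.request.urlopen(ctx["inputS3Url"]) as response:
        original = response.read().decode("utf-8")

    # Stream the transformed bytes back to the caller of the GET request.
    boto3.client("s3").write_get_object_response(
        Body=redact_emails(original).encode("utf-8"),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```

The calling application still issues an ordinary GET; only the access point it targets changes.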

Corey: So, at the time that we’re recording this, the announcement is hours old. This is not something that has had time yet to permeate the ecosystem; people are still working through the various implications of it, so it may very well be that this winds up aging before we can even turn the episode around. But what is the most horrifying use case of this that you’ve seen so far? Because I’m looking at this and I’m thinking, “Oh, you know what I can use this for?” People are thinking, “Oh, a database?” “No, that’s what Route 53 is. Now, I can use S3 as a messaging queue.”

Kevin: Well, possibly. I keep saying that I’m going to use it as a random number generator. But that was—yeah—

Corey: I thought that was the bill.


Kevin: [laugh]. Not quite. We have a lot of use cases that we’re hearing and seeing already in just the first few hours for it. I don’t know that I would call any super-horrifying. But we have everything from what I was saying in terms of redaction and image transformation to one of the things that I think a lot of—will be great will be using it to prepare files for ML training.

I’ve actually done some work with training machine learning models, and oftentimes, there’s just little things you have to tweak in the data. Sometimes you get a row that has an extra piece of data in it that you didn’t expect or it’s missing a field, and that causes the training job to fail. So, just being able to kind of cleanse data and get it ready to feed into an ML training model, that seems like a really interesting use case as well.


Corey: Increasingly, it’s starting to seem like S3’s biggest challenge over the past 15 years of evolution has been that it was poorly named because it’s easy to look at this now and come away with the idea that it’s not simple. And if you take a look at what it does, it’s very clearly not. I mean, the idea of having storage that increases linearly, as far as cost goes—you’re billed for what you use, without having to pre-provision a storage appliance at a petabyte at a time and buy a number of shelves. “Ooh, if I add one more the vendor discount kicks in, so I may as well over-provision there.” “Oh, we’re running low. Now, we have to panic order and get some more in.” I’ve always said that S3 has infinite storage because it does. It turns out, you folks can provision storage added to S3 faster than I can fill it, I suspect because you just get the drives on Amazon.

Kevin: Well, it’s a little bit more complicated than that. I mean, I think, Corey, that’s a place that you rightly call out. When we say ‘simple storage service,’ although there’s so much functionality in S3 today, I think we go back to some of the core tenets of S3 around the simplicity, and scalability, and resiliency, and those are not easy. There’s a lot of time spent within the team just making sure that we have the capacity, managing the supply chain to a deep level; it’s a little bit harder than just clicking ‘buy now.’ But we have teams that focus on that and do a great job, and also just around looking around corners and identifying how we continue to raise the bar for resiliency, and security, and durability of the service.


So, there’s just, yeah, there’s a lot of work that goes into that. But I do think it goes back to that simplicity of being able to scale up and scale down makes it just really nice to build applications. And now with the ability to build serverless applications where you have the ability to put a little code there in the request path so that you don’t have to have complicated business logic in an application. We think that that is still, it’s a simple capability. It goes back to how do we make it easy to build applications that are integrated with storage?

Corey: Does S3 Object Lambda integrate with all different storage tiers? Is it something that only works on standard? Does it work with infrequent access? Does it work with, for example, the one that still exists, but no one ever talks about: Reduced Redundancy Storage? Does it work with Glacier? Just, it sits there and that thing spins for an awfully long time.

Kevin: It will work with all storage classes, yes. With Glacier you would have to restore an object first and then it would. So, you’d issue the restore initially, although the Lambda function itself could also issue the restore. Then you would most likely then come back for a second request later to retrieve the data from Glacier once it’s been restored. But it does work with S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and any other storage classes.
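
The two-step pattern Kevin describes for Glacier (issue a restore, then come back later for the data) might look roughly like this in Python. It is a sketch under assumptions: the helper names, the one-day restore window, and the Standard retrieval tier are illustrative choices, and it talks to a plain bucket rather than through an Object Lambda access point. The restore_object, head_object, and get_object calls are standard boto3 S3 client methods, and the Restore field is the string S3 returns to report restore progress.

```python
def restore_request(days=1, tier="Standard"):
    """Build the RestoreRequest argument for boto3's restore_object call."""
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def restore_finished(restore_header):
    """Interpret the Restore field returned by head_object.

    S3 reports ongoing-request="true" while the restore is running and
    ongoing-request="false" once a temporary copy is ready to read.
    The field is absent (None) if no restore was ever requested.
    """
    if restore_header is None:
        return False
    return 'ongoing-request="false"' in restore_header


def fetch_archived_object(bucket, key):
    """Return the object's bytes if restored; otherwise start a restore
    (if none is in flight) and return None so the caller can retry later."""
    import boto3  # imported lazily so the helpers above run without AWS access

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket=bucket, Key=key)
    restore_header = head.get("Restore")
    if restore_finished(restore_header):
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    if restore_header is None:
        s3.restore_object(Bucket=bucket, Key=key, RestoreRequest=restore_request())
    return None
```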

Corey: I think my favorite part of all of this is that the interaction model for any code that’s accessing stuff in S3 doesn’t change. It is strictly a talk to the endpoint, make a typical S3 GET and everything that happens on the backend of that is transparent to your application.

Kevin: Exactly. And that was, again, if you go back to the simplicity, how do we make this simple, we said, “Customers love just that simple API.” It’s a GET API, and how do we make it so that that API continues to work, and applications that know how to use a GET, they can continue to use a GET and retrieve the data. But the data will be transformed for them before it comes back.

Corey: Are there any boundaries around what else that Object Lambda is going to be able to talk to? Is it only able to do internal massaging of the data that it sees? Is it going to be able to call out to other services? How extensible is this?

Kevin: The Lambda can do, essentially, whatever a Lambda function can do, including all the different languages. And then also, yeah, it can call out to DynamoDB, for example, if you want to, for example, let’s say you have a CSV file and you want to augment that CSV with an extra piece of data, where you’re looking it up in a DynamoDB table, you can do that. So, you can merge multiple data streams together, you can dip out to an external database to add to that data. It’s pretty flexible there.
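
Kevin's CSV example can be sketched with Python's standard csv module. This is a minimal sketch under assumptions: the column names and the in-memory lookup are invented for illustration, and the lookup argument stands in for whatever external call (for instance, a DynamoDB query) a real function would make for each row.

```python
import csv
import io


def augment_csv(csv_text, key_column, new_column, lookup):
    """Append `new_column` to every row, filled by calling `lookup`
    with that row's value in `key_column`. Missing lookups become ""."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [new_column]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_column] = lookup(row[key_column]) or ""
        writer.writerow(row)
    return out.getvalue()


# Hypothetical stand-in for a table lookup keyed by customer id.
tiers = {"1": "gold"}
augmented = augment_csv("id,name\n1,Ada\n2,Grace\n", "id", "tier", tiers.get)
```

Inside an Object Lambda function, the returned text would be what gets streamed back in response to the GET.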

Corey: So, at some level, what you’re realistically saying here is that until now, S3 has been able to be configured as a static website hosting facility; now it can also host dynamic websites.

Kevin: Well, S3 Object Lambda today will work with applications that are running within the customer’s account or where they’ve granted access through another account. We don’t support S3 Object Lambda directly as a public website endpoint at this point, so that’s something that we’re definitely listening to feedback from customers on.


Corey: Can I put CloudFront in front of it, and then that can invoke the GET endpoint?


Kevin: Today, you can’t, but that is also something that we’re—we’ve heard from a few use cases. But primarily, the use cases that we’re focused on right now are ones where it’s applications running within the account or within a peer account.

Corey: I was hoping to effectively re-implement WordPress on top of S3. Now, again, not all use cases are valid, or good, or something anyone should do, but that’s most of the ways I tend to approach architecture. I tend to live my life as a warning to others, whenever I get the opportunity.

Kevin: Yeah. [laugh]. I don’t respond to that, Corey. [laugh].


Corey: That’s fine, you don’t need to. So, one thing that was also discussed is that this is the 15-year anniversary, and the service has changed an awful lot during that time. In fact, I will call out, for really no other reason than to be a small petty man, that the very first AWS service in beta was SQS. Someone’s going to win a bar trivia night on that, someday.


Kevin: That’s right.


Corey: But S3 was the first to general availability because obviously, a message queue was needed before storage. And let’s face it, as well, that most people even if they’re not in the space can instinctively wrap their heads around what storage is; a message queue requires a little bit more explanation. But that’s okay, we will do the revisionist history thing, and that’s fine. But it’s evolved beyond that. It had some features that again, are still supported but not advertised.


The Reduced Redundancy Storage is still available, but not talked about. And there’s no economic incentive for doing it, so people should not be using it, I will make that declaration on my part, so you don’t have to. But you can still talk to it using SOAP calls, in the regions where that existed, via XML, which is the One True Data Interchange Format, because I want everyone mad at me. You can still use the, we’ll call it legacy because I don’t believe it’s supporting new regions, the BitTorrent interface for S3 data. A lot of these were really neat when it came out and far future, and they didn’t pan out for one reason or another, but they’re still there. There’s been no change since launch that I’m aware of that suddenly breaks if you’re using S3 and have just gone on walkabout for the last 15 years. Is that correct?


Kevin: You’re right. There’s functionality that we had from early on in S3 that’s still supported. And I think that speaks to the way we think about the service, which is that when a customer starts adopting it, even for features like BitTorrent, which certainly that’s not a feature that is as widely adopted as most of them. But there are customers that use it and so our philosophy is that we continue fully supporting it and helping those customers with that protocol. And if they are looking to do something different, then we’ll help them find a different alternative to it.


But, yeah, the only other thing that I would highlight is just that there have been some changes to the TLS protocols we’ve supported over time, and that’s been something we’ve closely worked with customers to manage those transitions to make sure that we’re hitting the right security benchmarks in terms of the TLS protocol support.


Corey: It’s hard on some level also to talk about S3 without someone going, “Oh, what about that time in 2017 when S3 went down?” Now, I’m going to caveat that before we begin in that, one, it went down in a single region, not globally. To my understanding, the ability to provision new buckets was impacted during the outage, but things hosted elsewhere would have been fine. Everything depends, inherently, on S3 on some level, and that sort of leads to a cascade effect where other things were super wonky for a while. But since then, AWS has been remarkably public about what changed and how things have changed.


I think you mentioned during the keynote at re:Invent, or re:Invent two years ago, that there’s now something like 235 microservices at the time, that power S3 under the hood, which of course, every startup in the world looked at that and said, “Oh, a challenge. We can beat that.” Like they’re somehow Pokémon, and you’ve got to implement at least that many to be a real service. I digress. A lot changed under the hood, to my understanding, almost a complete rewrite, but the customer experience didn’t.


Kevin: Yeah, I think that’s right, Corey. And we are constantly evolving the services that underlie S3. And over the 15 years, that’s been, maybe, the only constant has been the change in the services. And those services change and improve based on the lessons we’ve learned and new bars that we want to hit. And I think one really good example of that is the launch of S3 Strong Consistency in December of last year. And Strong Consistency, for folks who have used S3 for a long time, that was a very significant change.


Corey: Oh, it was a bi-modal distribution, as far as the response to that. The response was either, “What does that even mean, and why would I care?”

Kevin: Right.


Corey: And the other type of response was people dropping their coffee cup in shock when they heard it.


Kevin: It’s a very significant change. And obviously, we delivered that to all requests, to all buckets, with no change to performance and no additional cost. So, it was just something that everyone who uses S3—today or in the future—got for free, essentially no additional charge.

Corey: What does Strong Consistency mean, and why is that important, other than as an impressive feat of technical engineering?


Kevin: Right. So, in the original implementation of S3, you could overwrite one object but still receive the initial version of an object in response to a GET request. So, that’s what we call eventual consistency where there can be, generally a short period of time, but some period of time where a subsequent write would not be reflected in a GET request. And so with Strong Consistency, now, the guarantee we provide is that as soon as you receive a 200 response on a PUT request, then all subsequent GET requests and all subsequent LIST requests will include that most recent object version, the most recent version of the data that you’ve provided for that object.


And that’s just an important change because there’s plenty of applications that rely on that idea of I’ve PUT the data and now I’m guaranteed to get the exact data that I’ve PUT in response, versus getting an older version of that data.
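
The difference between the two guarantees can be shown with a toy Python model. To be clear about the assumptions: this is not how S3 is implemented. It is a deliberately tiny simulation with one "primary" copy and one lagging "replica", invented here only to show what a reader observes in each mode after an overwrite has been acknowledged.

```python
class ToyStore:
    """Toy key-value store: writes land on a primary, reads come from a replica."""

    def __init__(self, strong):
        self.strong = strong
        self._primary = {}
        self._replica = {}

    def put(self, key, value):
        self._primary[key] = value
        if self.strong:
            # Strong mode: the replica is updated before the write is acknowledged.
            self._replica[key] = value
        return 200

    def replicate(self):
        """Background catch-up step that eventual mode depends on."""
        self._replica.update(self._primary)

    def get(self, key):
        return self._replica.get(key)


eventual = ToyStore(strong=False)
eventual.put("report.csv", "v1")
eventual.replicate()
eventual.put("report.csv", "v2")
stale_read = eventual.get("report.csv")   # still the old version
eventual.replicate()
caught_up_read = eventual.get("report.csv")

strong = ToyStore(strong=True)
strong.put("report.csv", "v1")
strong.put("report.csv", "v2")
strong_read = strong.get("report.csv")    # latest version as soon as PUT returns
```

In the eventual mode, the overwrite is acknowledged but a GET can still return the earlier version until replication catches up; in the strong mode, a GET issued after the 200 always reflects the latest write.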


Corey: There’s a lot that goes into that, and it’s deceptively complicated because someone thinks about that in the context of a single computer writing to disk—“Well, why is that hard? I edit a file. Then I talk to that file, and my edits are in that file.” Yeah. Distributed systems don’t quite work that way.


And now imagine this at the scale of S3. It was announced in a blog post at the start of this week that 100 trillion objects are stored in S3. That’s something like 16,000 per person alive today. And that is massive. And part of me does wonder how many of those are people doing absolutely horrifying things, but it’s a—customer use cases are weird. There’s no way around that.


Kevin: That’s right. North of 100 trillion objects. I think, actually, 99 trillion are cat pictures that you’ve uploaded, Corey, but—


Corey: Oh, almost certainly. Then I use them as a database. The mood of the cat is how we wind up doing this. It’s not just for sentiment analysis; it’s sentiment-driven.


Kevin: Yeah, that’s right. That’s right. But yes, S3 is a very large distributed system, and so maintaining consistent state across a large distributed system requires very careful protocols. There’s actually, one of the things we talked about this week, that I think it’s pretty interesting about the way that internal engineering in S3 has changed over the last few years, is that we’ve actually been using formal logic and mathematical proofs to actually prove the correctness of our consistency algorithms. So, the team spent a lot of time engineering the consistency services, and all the services that had to change to make consistency work.


Now, there’s a lot of testing that went into it, kind of traditional engineering testing, but then on top of that, we brought in mathematicians, basically, to do formal proofs of the protocols. And they found edge cases. I mean, some of the most esoteric edge cases you can imagine, but—


Corey: But it’s not just startups that are using this stuff, it’s hospitals. Those edge cases need to not exist if you’re going to make guarantees around things like this.


Kevin: That’s right. And you just have to make sure. And it’s hard; they did painstaking work to test, but with our formal logic, we’re able to simulate billions of combinations of messages and updates, and then validate that the correct things are happening relative to consistency. So, that’s very significant engineering work; it was a multi-year effort, really, to get Strong Consistency to the point it was. But just to go back to your earlier point, that’s just an example of how S3 really has changed under the hood, but the external API, it’s still the external API. So, that’s our north star on all of this work.


Corey: Incidents happen fast, but they don’t come out of nowhere. If they’re watching, your team can catch the sudden shifts in performance, but who has time to constantly check thousands of hosts, services, and containers? That’s where New Relic Lookout comes in. Part of Full-Stack Observability, it compares current performance to past performance, then displays it in an estate-wide view of your whole system.


Sign up for free at NewRelic.com and start moving faster than ever


Corey: So, you’ve effectively rebuilt the entire car while hurtling down the freeway at 60—or if you’re like me, 85—but it still works the same way. There are some things as a result that you’re not able to change. So, if you woke up, alternate timeline, you knew then what you know now, how would you change the interface? Or what one-way doors did you go through when building S3 early on in its history that in hindsight you would have treated differently?


Kevin: Well, I think that for the customers who used S3 in the very early days, there was originally this idea that S3 buckets would be global, actually, global in scope. And we realized pretty early on that what we really wanted was regional isolation. And so today, when you create a bucket, you create a bucket in a specific region and that’s the only place that that data is stored. It’s stored in that region. Of course, it’s stored across three physically diverse data centers within that region to provide durability and availability, but it’s stored entirely within that region.

And I think in hindsight, I think if we had known, initially, that we would have moved into that regional model, we may have thought a little bit differently about how buckets are named, for example. But where we are now, we definitely like the regional resiliency, I think that’s a model that has proven itself time and time again, that having that regional resiliency is critical. And customers really appreciate that.

Corey: Something I want to talk about speaks directly to the heart of that resiliency, and the, frankly, ridiculous level of durability and availability the service offers. We’ve had you get on stage talking about these things, we’ve had Mai-Lan several times on stage talking about these things, and Jeff Barr writes blog posts on all of these things. I’m going to go out on a limb and guess that there’s more than just the three of you building this.

Kevin: Oh, yeah.


Corey: What’s involved keeping this site up and running? Who are the people that we don’t get to see? What are they doing?

Kevin: Well, there’s large engineering teams responsible for S3, of course, and they, I would say, in many ways are the unsung heroes of delivering the services that we do. Of course, you know, we get to be on stage and talking about these cool new features, but it’s only with a ton of hard work by the engineering teams day in and day out. And a lot of it is having the right instrumentation, and monitoring the health of the service to an incredibly deep level. It’s down very deep into hardware, of course, very deep into software, and getting all those signals and then making sure that every day, we’re doing the right set of things, both in terms of work that has to be done today, and project work that will help us deliver step-function improvements, whether it’s adding another degree of availability, or looking at just certain types of data and certain edge cases that we want to strengthen our posture around, there’s constant work to look around corners, and then really just to continuously raise the bar for availability, and resiliency, and durability within the service.


Corey: It almost feels, on some level, like the most interesting changes and the enhancements that come out, almost always without comment, come from the strangest moments. I mean, I remember having a meeting with a couple of folks a year or two ago, when I was—I kept smacking into a particular challenge; I didn’t understand that there was an owner ACL at the time, and it turned out that there were two challenges there. One was that I didn’t fully understand what I was looking at, so people took my bug report more seriously than it probably deserved. And to be clear, no one was ever anything but professional on this. And we had a conversation, my understanding dramatically improved, but the second part was a while later, “Oh, yeah. Now, with S3, you can also set an ACL that determines that any object placed into the bucket now has an ownership ID of the bucket owner.”


And I care about that primarily because that directly impacts the cost and usage reports that are what my company spends most of our life staring into. But it made for such an easier time as far as what we have to deploy to customer accounts and how we wind up thinking about these things. And it was just a quiet release that was like many others with the same lack of fanfare that, “Oh, the service you don’t use is now available in a region you’ve never heard of. Have fun.” And there were, I think, almost 3,000 various releases last year; this was one of them that moved the needle.

It’s little things like that, but it’s not so little because doing anything like this at the scale of something like S3 is massive. People who have worked in very small environments don’t really appreciate it. People who have worked in much larger environments—like, the larger the environment you get to work within the more magical something like this seems.


Kevin: Yeah, I think it’s a good example, you point to the S3 object ownership example, I think that’s a great example of the kind of feature that took us actually quite a bit of work to figure out how we would deliver that in as simple a fashion as possible. That was actually a feature that, at one point, I think there was a 2D or 3D matrix being developed of different ways that we might have to have flags on objects. And we just kept pushing and pushing to say, “It has to be simpler. We have to make this easier to use.” And I think we ended up in a really good spot. And it certainly, for customers that have lots of accounts, which I would say almost all of our large customers end up with many, many accounts—


Corey: Well, we’d like to hope so anyway. There was a time where, “Oh, just one per customer is fine.” And then you got to redefine what ‘large account’ looked like a few times that it was, “Okay, let’s see how this evolves.” Again, the things you learn from customers as you go.

Kevin: Yeah, exactly. And then there’s lots of reasons for different teams, different projects, and so forth, where you have lots of accounts. But for any of those, kind of, large accounts scenarios, or large organization scenarios, there’s almost always cases where you’re writing data across accounts in different buckets. So certainly, that’s a feature that, for folks who use S3, they knew exactly how they were going to use it, turned it on right away.


Corey: It’s the constant quiet source of improvement that is just phenomenal. The argument I always made that I think is one of the most magical parts of cloud that isn’t really talked about is that if I go ahead and I build an environment and I put it in AWS, it’s going to be more durable, arguably more secure, and better run and maintained five years later, if I never touch it again, whereas if I try that in a data center, the raccoons will carry the equipment off into the wilderness right around year three. And that’s something that is generally not widely understood until people have worked extensively with it.


S3 is also one of those things that I find is a very early and very defining moment, when companies look at going through either a cloud migration or a digital transformation, if people will pardon me using the term that I love making fun of, it’s a good metric for how cloud-y for lack of a better term, is your application and your environment. If everything lives on disks attached to instances, well, not very; you’ve just more or less replicated your data center environment into a cloud, which is fine as a step one. It’s not the most efficient, it makes the cloud look a lot more like your data center, and you’re not leveraging a lot of the capability there. Object storage is one of the first things that seems to shift, and one of the big accelerators or drags on adoption always seems like it comes down to how the staff think about those things. What do you see around that?

Kevin: Yeah. I think that’s right, Corey. I think that it’s super exciting to me working with customers that are looking to transform their business because oftentimes it goes right down to the data in terms of, what data am I collecting? What can I do with that data to make better decisions and make more real-time decisions that actually have meaningful impact on my business? And you talk about modern applications, some of it is about developing new modern applications and maybe even applications that open up new lines of business for a customer.

But then we have other customers who also use data and analytics to reduce costs and to better manage their manufacturing or other facilities. We have one customer who runs paper mills, and they were able to use data in S3 and analytics on top of it, to optimize how fast the paper mills run to eliminate the machines or reduce the amount of time that machines are down because they get jammed. And so it’s examples like that, where customers are able to first off, using S3 and using AWS able to just store a lot more than they’ve ever thought they could in a traditional on-premises installation, and then on top of that really make better use of that data to drive their business. And I mean, that’s super exciting to me, but I think you’re right as well about the people side of it. I mean, that is a, I think an area that is really underappreciated in terms of the amount of change and the amount of growth that is possible and yet really untapped at this point.

Corey: On some level, it almost shifts into—and again, this is understandable. I’m not criticizing anyone, I want to be clear here. Lord knows I’ve been there myself—where people start to identify the technology that they work with, as a part of their identity of who they are, professionally or in some cases personally.


Kevin: Yep.

Corey: And it’s an easy misstep to make. If there were suddenly a giant pile of reasons that everyone should migrate back to data centers, my first instinct would be to resist that, regardless of the merits of that argument because well, I’ve spent the last four years getting super deep into the world of AWS. Well, isn’t that my identity now on some level, so I should absolutely advocate for everything to be in AWS at all times. And that’s just not true; it’s never true, but every time it’s a hard step to make, psychologically.

Kevin: Oh, I agree. I think it is, psychologically, a hard step to make. And I think people get used to working with the technology that they do. And change can always be scary. I mean, certainly for myself as well, just in circumstances, where you say, “Well, I don’t know. It’s uncertain; I don’t know if I’m going to be successful at it.”

But I firmly believe that everyone at their core is interested in growth and developing, and doing more tomorrow than they did yesterday. And sometimes it’s not obvious. Sometimes it can be frightening, as I said, but I do think that fundamentally people like to grow. And so I think with the transformation that’s ongoing in terms of moving towards more cloud environments, and then, again, transforming the business on top of that, to really think about IT differently, think about technology differently. I just think there’s tremendous opportunity for folks to grow; people who are maintaining current systems to grow and develop new skills to maintain cloud systems or to build cloud applications even. So, I just think that’s an incredibly untapped portion of the market in terms of providing the training, and the skills and support to transform the culture and the people to have the skills for tomorrow’s environments.


Corey: Thank you so much for taking the time to speak with me about this dizzying array of things that S3 has been doing. What you’ve been up to for the last 15 years, which is always a weird question. “What have you been up to for the last 15 years, anyway?” But usually in a much more accusatory tone. If people want to learn more about what you’re up to, how you’re thinking about these things, where can they find you?

Kevin: Well, I mean, obviously, they can find the S3 website at aws.amazon.com/S3. But there’s a number of videos on Twitch and YouTube both of myself and many of the folks within the team. Really, we’re excited to share a lot of new material. This week, with our Pi Week we decided Pi Day was not enough; we would extend it to be a four-day event. So, all week we’ve been sharing a ton of information, including some deep dives with some of the principal engineers that really help build S3 and deliver on that higher bar for availability, and durability, and security. And so, they’ve been sharing a little bit of behind-the-scenes, as well as just a number of videos on S3 and the innards there. So, really invite folks to check that out. And otherwise, my [inbox 00:32:54] is always open as well.


Corey: And of course, I would be remiss if I didn’t point out that I just did a quick check, and you have what can only be described as a sarcastic number of job openings within the S3 organization of all kinds of different roles.


Kevin: That’s right. I mean, we’re always hiring software engineers, and then systems development engineers in particular, as well as product management—


Corey: And TPMs, and, you know, of course, I’m assuming naming analysts. Like, “How do we keep it ‘S3,’ but not call it ‘simple’ anymore?” Let me spoil that one for someone: serverless. You call it serverless storage service, and you’re there. Everyone wins. You ride the hype train, everyone’s happy.


Kevin: I’m going to write that up right now, Corey. It’s a good idea.


Corey: Exactly. Well, we find a way to turn that story into six pages, but that’s a separate problem.


Kevin: That’s right.


Corey: Thank you so much for taking the time to speak with me. I really appreciate it.

Kevin: Likewise. It’s been great to chat. Thanks, Corey.


Corey: Kevin Miller, General Manager of Amazon Simple Storage Service, better known as S3. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment that whenever someone tries to retrieve it, we’ll have an Object Lambda rewrite it as something uplifting and positive.


Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.


This has been a HumblePod production. Stay humble.
It’s Not a Data Science Problem, It’s a Data Engineering Problem with Laurie Voss
Screaming in the Cloud
04.15.2021
36 Minutes
About Laurie

Laurie has been a web developer for 25 years and cares deeply about making the web bigger and better for everyone. He previously co-founded awe.sm and npm, and is currently a Senior Data Analyst at Netlify.

Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by our friends at Fairwinds. Whether you’re new to Kubernetes or have some experience under your belt, and then definitely don’t want to deal with Kubernetes, there are some things you should simply never, ever do in Kubernetes. I would say, “run it at all.” They would argue with me, and that’s okay because we’re going to argue about that. Kendall Miller, president of Fairwinds, was one of the first hires at the company and has spent the last six years making the dream of disrupting infrastructure a reality while keeping his finger on the pulse of changing demands in the market, and valuable partnership opportunities. He joins senior site reliability engineer Stevie Caldwell, who supports a growing platform of microservices running on Kubernetes in AWS. I’m joining them as we all discuss what Dev and Ops teams should not do in Kubernetes if they want to get the most out of the leading container orchestrator by volume and complexity. We’re going to speak anecdotally of some Kubernetes failures and how to avoid them, and they’re going to verbally punch me in the face. Sign up now at fairwinds.com/never. That’s fairwinds.com/never.


Corey: The apps on cloud summit is a new action packed, not a conference, happening May 11th through 13th online. It’s for everyone who makes applications in the cloud run screaming. From IT leaders to DevOps pros to you folks, whoever you might be. Take a break from screaming into the cloudy void with me to learn from some of the best of people who actually know what they’re doing. Like Kelsey Hightower, AWS blogger John Meyer, and also me, because apparently they didn’t listen to me saying I had no idea what I was doing. Register now at turbonomic.com/screaming. There’s a “swag box” ready to ship for the first two thousand registrants, so you don’t want to miss this. Thanks to Turbonomic for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Laurie Voss, who is currently a senior data analyst at a company called Netlify. Laurie, thank you for joining me.


Laurie: Thanks for inviting me.


Corey: So, let’s start at the very beginning. What is Netlify?


Laurie: Netlify is a single cohesive build chain for websites. A lot of people don’t think of it that way. I think a lot of people think of Netlify as a web host, but really where people are getting value from Netlify is you build your website, you upload your website, you deploy your website, you host your website, you test your website, you monitor your website, and that can be five or six different services, like a CI service, and a hosting service, and a Git service and all of those things. And Netlify just joins that entire build chain into a single tool where you just hook up a Git repo, hit commit, and it goes out into the world, and it’s incredibly fast and convenient. And that’s really where people get value out of it.

Corey: Perhaps somewhat uncharitably, I would almost think of that as Heroku for this decade.


Laurie: I mean, I would consider that pretty charitable to us and somewhat uncharitable to Heroku, who are still around and chugging.


Corey: Oh, absolutely. I’m a big fan of things like that, where it’s take this code—whatever it looks like, maybe it’s a repository, maybe it’s some, I don’t know, some files I email over, God forbid—and then go ahead and deploy it into something that at least pretends to be able to scale. I often hear Netlify brought up in the context of Jamstack, which seems to be this whole area of cloud computing that I don’t tend to spend a whole lot of time in, at least not knowingly. What is it?


Laurie: So, Jamstack originally stood for JavaScript, APIs, and markup sometimes also referred to—


Corey: But I hate all of those things. Please continue.


Laurie: [laugh] it’s sometimes also referred to as static websites, which is a term I tend to avoid simply because it’s not really very accurate. A static website is one of the things that you can deploy on the Jamstack, certainly, but it’s certainly not the only thing you can deploy. I would say that it is an architecture that lends itself to pre-rendering as much content as is possible, and then caching all of that stuff at the edge, and then pulling in only the bare minimum of dynamic content to improve both scalability and performance. Those are the things that people like about Jamstack websites, is that they tend to be extremely fast.


Corey: So, that makes intuitive sense to me. And you, of course, became fairly broadly known as one of the people behind npm. But now you’re a senior data analyst, which feels like it’s a departure from the things you were doing to the things you’re doing now. Help me either validate that, or tell me what obvious thing I’m missing, or highlight something clever for me because right now, I feel like there’s a missing link in my chain of events here.

Laurie: No, that’s a totally fair question. So, I started npm as the CTO and hired an excellent engineering team underneath me. In fact, one of our very first hires was a lady called C J Silverio, who is just a staggeringly good engineer. And it became very obvious very early on in the life of the company that we really had two people of CTO caliber, and that we didn’t need to have them, but what we did need was somebody to run the operational side of the business. So, relatively early on in the life of the company, we promoted C J to CTO, and I moved my title to COO, you know, obviously, still with a technical bent, but my job as a COO is to do operational things.


So, I was in charge of running the financials and making sure that marketing and sales weren’t going massively over budget or under quota, those sorts of things. And that’s fundamentally a keep-the-lights-on data analysis job. So, while I was CTO, I was sharing fun stats about npm’s internals; while I was COO, I was doing a lot of analysis of our financials. But the common factor was analysis, and I was doing more and more of it. So, towards the end of my time at npm, I became the Chief Data Officer, where I basically specialized down into doing just data things—some financial, some technical—and doing a lot of outward-facing presentations about that kind of thing.


So, that was where my job ended up being. And literally how I pitched my way into Netlify was like, “What if I did that thing that I was doing for npm for you,” and they were like, “Great. You can’t be a C though because you just got here.” [laugh]. I was like, “Fine.”


Corey: Well, of course. We all have to start somewhere. Humility. And it took me a couple of years to unofficially run AWS marketing. My God. Yeah, have some humility as you step through this process. Was it a big barrier to you once you arrived at Netlify, convincing them to buy you the Excel license you obviously need to do all this data analysis, or alternately, are there better tools for it than the one that we’ve all been using anyway?

Laurie: Honestly, I’ve always been a Google Sheets partisan. I know that the really hardcore financial types will complain about the functions that are missing from Google Sheets versus Excel—


Corey: Oh, will they ever.


Laurie: —but I’m not that person. But we have a pretty great stack that I like quite a lot at Netlify these days. We have a variety of older tools laying around, not all of which we’ve migrated away from, but the core of the new class is this company called Databricks, who are basically Spark clusters as a service. So, you can just throw, essentially, arbitrarily large amounts of log data on to S3 buckets on AWS, and it can query them as if they were databases, which is truly beautiful. And on top of them, we have a system called Mode Analytics, which is a general platform for data analysis, and presentation; draws graphs, that kind of thing; has an SQL interface.


And between those two we’ve got a new open-source project, or relatively new to me anyway, called dbt, which is this very organized, clever way of codifying your best practices around data. So, you’ve probably heard of extract, transform, load jobs; it’s basically a way of quantifying chains of extract, transform, and load jobs such that they’re always tested, and always running, and you know what the dependencies are between them and everything is documented.
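As a rough illustration of that idea, here is a minimal Python sketch of a chain of transform steps with declared dependencies and a check after each step. It is not dbt's API or how dbt is implemented; the step names, the sample log lines, and the checks are all made up for the example, and only the standard library is used.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each name maps to the set of steps it depends on.
DEPENDENCIES = {
    "raw_logs": set(),
    "clean_logs": {"raw_logs"},
    "daily_hits": {"clean_logs"},
}


def raw_logs(_inputs):
    # "Extract": stand-in for reading log lines from storage.
    return ["/home 200", "/about 200", "/home 500", ""]


def clean_logs(inputs):
    # "Transform": drop empty lines and split into (path, status) fields.
    return [line.split() for line in inputs["raw_logs"] if line]


def daily_hits(inputs):
    # "Load": build the final table of hits per path.
    counts = {}
    for path, _status in inputs["clean_logs"]:
        counts[path] = counts.get(path, 0) + 1
    return counts


STEPS = {"raw_logs": raw_logs, "clean_logs": clean_logs, "daily_hits": daily_hits}

# One simple check per step, run right after the step produces its output.
TESTS = {
    "raw_logs": lambda rows: len(rows) > 0,
    "clean_logs": lambda rows: all(len(row) == 2 for row in rows),
    "daily_hits": lambda counts: all(n > 0 for n in counts.values()),
}


def run_pipeline():
    results = {}
    # static_order() yields every step after the steps it depends on.
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        results[name] = STEPS[name](results)
        if not TESTS[name](results[name]):
            raise ValueError(f"test failed for step {name!r}")
    return results


if __name__ == "__main__":
    print(run_pipeline()["daily_hits"])
```

The point of the shape is that the dependency map doubles as documentation: you can read off what feeds what, and a failing check stops the chain at the step that broke.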


Corey: Okay. While I’m in the process of getting everyone in trouble on things, what is your take on machine learning for things like this? Because it seems that whenever you talk about data, it’s inevitable that someone, usually with a crap ton of VC backing, will immediately jump in because they’re clearly getting bonused every time they managed to fit the phrase ‘machine learning’ into basically anything.


Laurie: So, I would step back a bit and say that, before I joined Netlify, I interviewed at a couple other companies just to see what the space was like, for basically the same job at other companies. And there was a really interesting pattern that I noticed, which is that it is quite a common pattern for an early-stage startup, to say, “Oh, we have a data problem. We must hire a data scientist.” And they go and find somebody staggeringly qualified, with a PhD in data science, and they hire that person. And that person immediately runs into trouble because that is not actually the problem that they have.


They don’t have a data science problem; they have a data engineering problem. They have, like, mounds of data lying everywhere, and it’s not organized, nobody knows where it is, nobody can query it efficiently. A data scientist is, at earliest, your fifth hire in your data team. The first five people are people who have to do an enormous amount of plumbing and engineering to be able to just get the data from all of the places that it’s lying around, all of the piles that it’s accumulating in, into any kind of a reasonable format that you can query it and figure out what it does.


Corey: You have to forgive my cynicism on some level because I’ve been in the ops space for, I guess, entirely too long where I’ve been dealing—particularly in the context of AWS bills, with making arguments against data science teams who are insisting that the Apache logs from 2012 that are taking petabytes of space are the key to unlocking the mysteries of the business. They’re not sure how yet, but one day they’re going to become super valuable, so I’m never allowed to delete anything. And on some level, it just almost seems like it’s a big make-work conspiracy for data scientists amongst each other which, hey, respect. Counter-argument; what sorts of insights can you glean from these vast quantities of data because everyone else I’ve talked to about this generally works for a big-data-oriented company. I got to be honest with you, it feels like they’re selling pickaxes into a gold rush because, “Oh, it’s very important to keep all your data so that we can sell you things to go through it.” You’re on the other side of that; you’re buy-side. So, what is the value that this giant data hoard winds up providing?

Laurie: Well, I will say that my initial inclination is to agree with you. There’s definitely a lot of pickaxes being sold to miners who have no idea what they’re doing. I think about ten years ago, there was a huge industry-wide pile-in to big data. People were like, you need Hadoop, and you need gigantic data processing clusters, and huge data, and massive amounts of processing, and, like, buy this enterprise contract for $100,000 a year. And then everybody did those things and was like, “And now what?” And they were like, “Oh, well, we don’t know. Maybe you can count it up. How many hits did you get?”


That’s not useful analysis. Having all of your data queryable is not, per se, a useful thing to be able to do. And I think in the 10 years since then, people have got smarter about that. They realized medium and small data are actually [laugh] often quite useful. It’s more about how you analyze it, and can you present it to people, and can you make sense of it?


But there was a second gold rush into the ML space. There are certainly use cases where you have enough data and a problem that is amenable to being solved by applying ML to it in some way. Those are a minority of cases; maybe five percent of all data problems are big enough that you can use ML in the first place, and also get an answer from ML that would be helpful. And the other ninety-five percent, it’s just plumbing and engineering.


Corey: Once upon a time, it felt like the way to address all this data was the… honestly, the result of a prank perpetuated many moons ago by what felt like Google in a white paper, that Yahoo went for hook, line, and sinker for MapReduce, which then led to Hadoop and a bunch of other stuff. I maintain this was a Google April Fool’s prank that everyone took way too seriously and went way too far. These days, it feels like stream processing as that data comes in is sort of the preferred approach. Yes, no, or am I completely misunderstanding most of the point? Or all the above?


Laurie: I would say definitely, the industry has moved away from the batch processing that Hadoop did. I actually worked at Yahoo at the time when they were inventing Hadoop. [laugh].


Corey: Oh, you fell for it, too. Great.


Laurie: [laugh]. I was—we were selling the Kool Aid as opposed to drinking it.


Corey: Oh, if you’re going to be involved in a Kool Aid transaction, that is absolutely the side of it you want to be on. Let’s be very clear here.


Laurie: So yeah, streaming processing, but like semi-real-time processing of things, as opposed to giant batch jobs is certainly where stuff has mostly gone. Although people who are end consumers of data, as an analyst, if I asked you how fresh does this data need to be, they will always say realtime. Like, [laugh] that will be their first answer. And then I’ll be like, “What if it was 24 hours delayed?” And they’re like, “Oh, yeah. Well, obviously, yesterday’s data is fine. I’m not going to care about what happened at noon today when it’s 2 p.m.” And then you’re like, “Well, yes. Well, then it’s a batch job, and it’s, like, an order of magnitude cheaper to provide to you, so let’s do that.” Batch jobs are still very cost efficient and so we do a lot of batch processing, it’s just we don’t make a big song and dance about it anymore because it’s no longer the new shiny thing.


Corey: On some level, it feels like that is the nature of things where something gets announced, and it’s super complicated and hard, and people scale the peaks of complexity, and they make good money doing it. I mean, in the original dotcom boom, ‘firewall engineer’ was a quarter million dollars a year if you could swing it. Now, it’s just assumed that basically, anyone who touches the network should be able to configure firewall rules; things get simpler with time. It feels, on some level, like an awful lot of the data world is undergoing some of that consolidation as well, where we’re starting to find tools and methods and ways to extract meaning from giant piles of data without the part where, you know, you go and drop $5 million here on a data science team.


Laurie: Well, you’ve sort of arrived at my favorite pet topic, which is the stack. The stack is this abstraction that I wrote about at the beginning of last year. It’s the idea that the ever increasing complexity of technical fields means that we are constantly inventing, adopting, and then forgetting about abstractions. As you said, we’re constantly chasing after the new shiny thing; we make a big song and dance about it; it’s very complicated. People make enormous amounts of money doing it in the early days, and then somebody eventually invents some kind of tool or open-source framework, or possibly, like, a SaaS that makes it one-click to do.


And it’s not any less complicated or any less magical than it was before, it’s just you think about it much less, right? Like I mentioned, Databricks. Every time I run a query Databricks is taking my SQL, converting my SQL into giant MapReduces, running it on a huge cluster of machines of arbitrary size alu—I don’t know what size it is because I don’t need to care anymore—and then pointing it at AWS, where it’s pulling in every single piece of data in every bucket that I put in there. And all of that, ten years ago would have been of a complexity that only Google or Yahoo could do it. And now it’s literally we spin them up by clicking a button and we don’t even remember that it’s happening.

Like, all of that complexity is still happening, all of that magic is still happening, but now it’s just a commodity. And we’re doing that across the tech space. So, we’ve certainly done it in data; a bunch of stuff that used to be very complicated, used to be the thing that you would hire me to do is now just the tool that I use and the thing that I do is the analysis, which is a more useful use of someone’s time, really.
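To make the "SQL turned into MapReduces" idea concrete, here is a small single-machine Python sketch of how a query like SELECT path, COUNT(*) FROM logs GROUP BY path breaks down into a map step, a shuffle, and a reduce step. This is only the general pattern, not how Databricks or Spark actually plans or runs a query; the rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical log rows standing in for the files sitting in a bucket.
ROWS = [
    {"path": "/home", "status": 200},
    {"path": "/about", "status": 200},
    {"path": "/home", "status": 500},
]


def map_phase(row):
    # Emit one (key, 1) pair per row: the GROUP BY column is the key.
    yield row["path"], 1


def shuffle(pairs):
    # Gather all values for the same key together. On a real cluster this
    # is the step that moves data between machines.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # COUNT(*) for one group is the sum of the 1s emitted for that key.
    return key, sum(values)


def count_by_path(rows):
    pairs = (pair for row in rows for pair in map_phase(row))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_by_path(ROWS))
```

On a cluster the map and reduce calls would be spread across many machines, which is the part the query engine now hides from the analyst.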

Corey: One would like to hope so. But I do feel like there’s a story—and we see it across the board; this is one of the things I really enjoy about Netlify—once upon a time to put a website on the internet, you had to know a whole bunch of different things all at the same time. It was, how to build a web server, how to maintain and patch that web server so it didn’t become an attack spam cannon, how to get files into a format the web server could understand, how to put that out there, how to get DNS to work, how to handle SSL—if that was even a glimmer in your eye at that point—and so on and so forth. Now, it really requires, click a button. And Netlify has made this way easier because I tend to look at this from the exact opposite side in the industry where I come from an ops background; building all the infrastructure to handle these things is relatively straightforward to me, but then I get to the other side.


Cool, now all that’s done, “Build the web app.” And my response, “Ehhh, what?” Yeah, I can write bad HTML by hand, sort of, and that’s as far as I generally tend to go, whereas it feels like the Jamstack story in general, and Netlify in particular, are aimed at folks in many ways, coming from the other side of the world where it’s, “I picked up JavaScript. I picked up a framework or two. I understand frontend, I understand how web applications get built. What’s the deal with this whole infrastructure piece?” And thanks to the miracle of stacks collapsing in upon themselves in many respects, you don’t have to know about that or care, and you live in this blissful world where the term Kubernetes never crosses your desk. Is that a fair summation of the state of the industry? Am I dramatically misunderstanding what Netlify does and for whom?

Laurie: No, I think that’s pretty much how it goes. One of the reasons that I wrote this blog post about the stack—it was almost exactly a year ago—is because about a year ago is when I joined Netlify and I was suddenly immersed in the things that Netlify does. It became more clear to me that I was seeing a fundamental shift happening.


I was like, “Oh. We are obeying some kind of natural law here, right? We are taking things that used to be people’s whole jobs and turning them into things that are so simple that you don’t even think about them happening anymore.” I’ve definitely met and worked with people in my life whose whole job was managing SSL certificates. And now, it’s literally a checkbox. And it’s on by default. It’s like, “Would you like your site to be secured by SSL?” Yes, obviously. I don’t know why I would turn that off.


And it just comes as part of deploying your website. Way in the background, Let’s Encrypt is doing it, and there’s a whole bunch of song and dance about refreshing certs every 90 days, and it all just happens completely automatically without you caring even a little bit. And that’s what Netlify is doing. It’s taking things that used to be five or six companies and squishing them down into a single layer that you call your deploy service. And you’re like, “Great. My deploy service does all of those things and I don’t need those other five companies anymore.”
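The renewal song and dance boils down to a date check that some scheduled job runs on your behalf. A minimal Python sketch follows; the 90-day lifetime comes from the conversation, while the 30-day renewal window and the function name are assumptions made for the example, not a description of what Netlify or Let's Encrypt actually runs.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)  # lifetime mentioned in the conversation
RENEW_BEFORE = timedelta(days=30)   # assumed renewal window, not from the source


def needs_renewal(issued_at, now):
    """Return True once the certificate is inside the renewal window."""
    expires_at = issued_at + CERT_LIFETIME
    return now >= expires_at - RENEW_BEFORE


if __name__ == "__main__":
    issued = datetime(2021, 1, 1, tzinfo=timezone.utc)
    print(needs_renewal(issued, datetime.now(timezone.utc)))
```

A deploy service would run a check like this on a schedule for every site and request a fresh certificate when it returns True, which is why the person deploying the site never has to think about it.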

Corey: Now, if you’re one of those five companies, that becomes something of a problem. But again, that’s the pace of innovation. That is the world continuing to evolve.


Laurie: Nobody wants to be commoditized, but on the other hand, the company that gets to do the commoditizing tends to run away with it, right? Like that’s kind of the AWS story. It’s like, there used to be lots and lots of companies that would sell you a server in a rack and then take 24 hours to set it up and you’d pay with a credit card. And AWS was like, “What if that was one button?” And everyone was like, “Yes, I would love that to be one button. I never want to care about what rack it’s in anymore, or whether or not it has enough power, or whether or not the cable in the back has got jiggly. Just virtualize it all the way for me, thank you.” And then AWS completely ran away with it.

Corey: Oh, yes. And it’s AWS, so it was, “What if that button was hidden in a console that doesn’t work super well, and then we give that button a terrible name?” People are like, “Ehh, I’ll risk it.”


Laurie: I mean, the observed behavior of the industry is that we love the terrible console.

Corey: Oh, absolutely. Everyone talks about infrastructure as code, which is basically a polite way of saying I use the console, and then lie about it on conference talks.


Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Laurie: [laugh]. Indeed.


Corey: So, since you brought up AWS, terrific, it’s time for me to do my whole conspiracy theory approach here and accuse you of basically war crimes. So, you were big into the npm space for a long time, which is great. I accept the fact that that is a thing that happens—package.json and package-lock.json are basically artifacts of you folks.


Now, AWS has launched their Amazon CodeGuru machine learning—wink, wink, nudge, nudge—powered code review. And of course because it’s AWS, they charge based upon lines of code in a pull request, which tells me that you’re a deep plant for many years now, planning for the day where this one day supports JavaScript—which it doesn’t today—and all someone has to do is check in the package-lock and the package.json files once, and suddenly the entire scheme pays off handsomely. True, false, or I’m not supposed to talk about that in public?

Laurie: It’s true. I’m part of a global cabal whose purpose is to make Node modules infinitely deep until the gravity well sucks in all of programming and we don’t have computers anymore.


Corey: On a slightly more serious note, I do want to talk a little bit about package management—in the context of programming languages as opposed to package management in the context of Linux distributions because, oh, do I have thoughts on that—there are a few different competing tools out there to handle dependencies across different programming languages, in the JavaScript world, in the Python world. And I’m not a JavaScript programmer, except when forced to be, and it’s usually editing something as small-scale as humanly possible and backing away slowly. But my general consensus, looking at it across the board, is that there is no consensus, that there is no clear one right way to do things. Invariably, dependencies always become a challenge. Getting something to a reproducible build while also being secure is a problem.

And no matter what stack you pick, what language you pick, there’s always a—for ‘Hello World’—there’s a step one of setting up your local environment to resemble what the person writing the document’s environment looks like. Is that accurate? Is there some magic tool out there that somehow I’m just unaware of that solves all of this for me?


Laurie: Well, there’s definitely not a single tool that gets it completely right, but I would say that there is a commonality between the things that work that I don’t know that everyone appreciates. So, I’m going to draw a parallel between package.json and Kubernetes right now, so bear with me. Basically the thing that people often don’t like about npm and the thing that people don’t like about package.json is that it says, “All of your dependencies must live here, in your tree. I don’t care how many JavaScript projects are on your computer; I am going to have one copy of every module right here where I can see it, and I’m going to use those and only those.”


It tends to make JavaScript programs a little bit easier to debug because you know that the code that is at fault can’t possibly be anywhere else. It can’t be sitting in /usr/lib unexpectedly, or in some additional libraries folder, or it can’t have been, like, blown away by somebody installing something else. It has to be the one that’s sitting in your tree, and that’s one of the things that made Node so popular in the beginning, and npm so popular at the same time, was that it was very easy to deal with, and in particular, it made it work on Windows, which didn’t have any of those things anyway. And Node's popularity as a development environment, where you could write code on Windows and it would work perfectly in a Linux environment because all of the dependencies were JavaScript and that ran the same on both of those computers is understated. And that’s essentially the Kubernetes story.


Kubernetes is saying, “This thing where we have libraries all over the place, where we have dependencies all over the place, like, they lie all over the operating system. It’s too late to fix that. What if we packaged up the entire operating system and said that that’s the package?” And that’s what Kubernetes is. It’s creating a package.json of your entire computer, and then you run that.
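
The tree-local lookup Laurie describes for npm can be illustrated with a short sketch. This is a simplified Python stand-in for Node's module resolution, not the real algorithm: the function name `resolve_module` is hypothetical, and it only models the "look in the nearest `node_modules` directory, then walk up the tree" behavior.

```python
from pathlib import Path
from typing import Optional


def resolve_module(start_dir: Path, name: str) -> Optional[Path]:
    """Return the nearest node_modules/<name> at or above start_dir, or None.

    Simplified illustration: the search only ever looks inside the project
    tree (and its parent directories), never in a system-wide library folder.
    """
    current = start_dir.resolve()
    for directory in (current, *current.parents):
        candidate = directory / "node_modules" / name
        if candidate.is_dir():
            return candidate
    return None
```

Because the search starts in the project and only moves upward, the copy that gets loaded is the one sitting in the tree, which is the debugging property described above.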


Corey: It sure beats the old approach of, “Oh, it works on your machine. Great. Well, back up your email, Slappy, because your laptop’s going to production.”


Laurie: Exactly. Right. It’s basically, you’ve packaged up the entire world. And people are like, “Well, this is very wasteful.” And we’re like, “Yes, it’s very wasteful. But it works.” And like the other approach—


Corey: You know what’s less wasteful, then? That’s right, a whole bunch of engineering time spent fixing things. “Well, that’s not the most optimal way of doing it,” say people who seem to consistently mistake their time for being free.


Laurie: Exactly.


Corey: No, and it makes perfect sense. I love the fact that I can use at least some semblance of what other people are using and get it to work. The counter-argument to it is that it’s very—how do I put this—disconcerting when I’m working in a Python project, but I’m using a framework or so that generally installs via npm, and now my Python project has a package.json in there, and I get very confused at first. And, all right, then I run npm install in there and then I’m way more confused. And I mostly just look at this, and I struggle to make sense of it before the penny drops. “Oh, that’s right. It’s because I’m bad at computers.” I wish people would not keep letting me forget that part.

Laurie: [laugh]. Is your objection that you can’t launch a website these days without JavaScript anymore? Because a lot of people are angry about that, and they send me email more often than you would imagine.


Corey: Well, I assume it’s your personal fault, right?


Laurie: I mean, absolutely. Like, again, the secret cabal; we’re trying to inflate all of your applications with as much extraneous code, with as many security vulnerabilities as we can possibly manage because I work for the people who sell storage and virus scanning, obviously.

Corey: Emailing you about the world requiring JavaScript is evocative of an old story where some town manager angrily emailed the CentOS project maintainers because someone installed a web server in his environment and he pulled it up, and this isn’t our town’s website; it’s the default, “Welcome to CentOS. If you’re seeing this page, you’ve successfully installed Apache. Read these docs to configure it…” and accused them of hacking his website. It seems roughly the same level of technical nuance, blaming you for the proliferation of something in society.


Laurie: I don’t know. I mean, I certainly spent five years cheerleading it, so I feel like people who are, like, “You helped make this popular.” I’m like, “Oh, why thank you. I’m so glad you think I made a difference.” But really, it probably would have happened on its own. Like, I was running after a snowball that was already running very quickly downhill and engulfing villages as it went.

Corey: Absolutely. And I do want to talk to you about that in particular because as people on this podcast often hear, I talk about this podcast, I talk about the AWS Morning Brief my other podcast, and I talk about lastweekinaws.com where my newsletter lives; I don’t urge people to follow me on Twitter, I don’t talk about the Facebook page I don’t have. And the reason behind all of those things, is that I have built an audience on open standards and open platforms so that no one company can change business models and suddenly I have a serious problem.

It’s why I blog on my own website, not on Medium. Their business model changes aren’t going to directly impact what I do and how I do it. Do you think this is naive? Do you think that the open web was a nice idea and now we’re just going to see increasingly walled gardens as time goes on?


Laurie: I think the openness of your website is—or your web app, or your, sort of, technical strategy in general—is always going to be a hybrid; like AWS is… it’s not rolling your own, you’re using a service. If AWS decides that they don’t support your service anymore—which they never do as far as I can tell, but theoretically, they could—you would have to stop doing that; you are to some extent locked into AWS. But I don’t think that a website hosted on AWS is, like, not part of the open web.


Corey: I would agree wholeheartedly on that point, absolutely.

Laurie: Right. I think at that point, you’ve adopted a tool that works for you, and you can move elsewhere. So, there are people who say using JavaScript frameworks, that’s not the open web, you should have been writing your own; you’re dependent on Facebook continuing to maintain React. And I’m like, “Well, kind of, but not really. You don’t have to be. You could write your own website, if you wanted to. This way, it’s just faster, in the same way that hosting it on AWS is faster than spinning up your own machines.”


Corey: Oh, I take it a step beyond that: I pay WP Engine, which manages WordPress for me so I don’t have to, and the reason for that is I’ve managed WordPress in the past, and I will not go down that path again for love or money.


Laurie: [laugh]. Right.


Corey: But then, as a fun artifact of that, lastweekinaws.com does in fact live on GCP.


Laurie: [laugh]. Nice.


Corey: But it’s WordPress. Worst case, WP Engine shuts down, or charges me at times more, or decides that now, nope, everything has to move to a new framework, I can migrate it elsewhere. And the fact that I have that strategic exodus means that I don’t need to sit here on everything I build and agonize over, do I go all-in on my current hosting provider or not? It’s something that I can migrate with me. And I try and maintain at least that theoretical exodus path.


I can repoint domains to other places; I own the domains myself, and that has been enough for the way that I view the world. But increasingly, I’m starting to feel like a relic. Oh, follow me on Instagram; follow me on TikTok. And it’s, if these platforms pull a MySpace and vanish, then you’ve got to rebuild your audience from scratch, whereas email’s been with us longer than I’ve been alive, and it’ll be here long after I’m dead. I can carry that audience with me regardless of what any particular provider has. I just wish I didn’t feel like such a Captain Edgecase, or someone stuck in the past whenever I articulate that to some folks.

Laurie: Well, I’ve been in the industry a long time, so I think if you’re going to, sort of, say, “I’ve got this old opinion,” I’m going to be like, “Me, too. I’m also extremely old.”


Corey: And then we’ll talk about the Great War. “Wasn’t it amazing?” And, yeah, there we are.


Laurie: The browser wars of ’97, and I was 16.


Corey: Yes, we’ll make Eternal September references all week.


Laurie: Oh, my God. See, we’re literally doing that thing that I was just joking we were going to do.


Corey: We absolutely are.


Laurie: Yeah, I think you have to pick your battles. I think the one that I personally struggle most with is databases. I spent a good chunk of my career as a DBA; I definitely know how to install and configure databases. I don’t want to. [laugh]. You know, like, using one of the fancy databases as services, where you’re just like, it has an SQL interface and it’s got apparently infinite storage and infinite processor, and I don’t need to worry about it anymore.

Corey: Exactly, and it has those things because what it also has is someone else’s credit card. Done.


Laurie: Right. It’s great. But to some extent, I’m definitely locking myself into that database service, right? To some extent, I have to find an equally capable service if I ever wanted to migrate away. So, am I still open, or am I locked in then?


I don’t think anybody can call themselves truly independent, anybody can call themselves truly open. So, from your perspective of, like, what platform am I on, as long as you’re not only on that platform, as long as it’s not your only bet, I think—sure, pile into the Facebook page. Why not?

Corey: Yeah. I have separate problems with that that we need not get into here.


Laurie: [laugh].


Corey: That’ll be a whole separate episode there. So, as you look across the past—I don’t know, let’s call it eight decades that you and I have been in tech together, what are the themes you’ve seen continue to emerge that people should be paying attention to moving forward?

Laurie: I think one of the most common mistakes that I see in technologists who’ve been in the industry a long time, is—I can tell that they’re doing it because they start ranting about ‘the fundamentals.’ And it is my firmly held conviction—and no one will sway me from it—there is no such thing as the fundamentals. Everybody comes into the industry at a certain time, when a certain set of tools were considered commodities that you don’t need to think about, a certain set of tools were considered, like, the complicated thing that you need to learn, and a certain set of tools were considered, like, fluff on top that are bonus, but those things are always drifting downwards, right? Yesterday’s fluff is today’s bedrock, and the new fluff is stuff that wasn’t invented before. And then they start going, “Well, you should be able to understand HTTP, and roll your own JavaScript framework because those are the fundamentals.”

And I’m like, “Only to you because you came into the industry when that was the complicated thing. The fundamentals to somebody who started 20 years before you did are like, ‘you need to know about power management and how to configure a firewall,’” like you were saying, in the beginning of this thing. Everybody’s fundamentals are somebody else’s fluff.

Corey: Oh, you want to learn how Linux works? Step one—I see this in classes all the time—learn how Vim works.

Laurie: Right, exactly.


Corey: How about not doing that?


Laurie: Oh, my God—


Corey: —and focusing on the differentiated part. My God.

Laurie: The bizarre cargo-culting of Vim. I’m like, “You know why the people who are good at Vim are good at Vim? It’s because they’ve been doing Vim for 30 years. If you do any tool for 30 years, you’re going to be really good at it.”


Corey: So, you say that, but then you look at me with databases, and I don’t know, I might be able to fool you on that one.


Laurie: [laugh]. Use any tool for 30 years, and you’ll be so good at it that the switching cost is too high to go to anything else. But if you’re just starting in the industry, you could start with any editor that you wanted and it would be fine. And by the time you’ve been using it for 30 years, you’ll be like a goddamn wizard at it.


Corey: Mm-hm. Absolutely.

Laurie: So, that’s what I tell people is, like, the things that you learn now, you’re going to have to expect that they get commoditized. The stack that you live on today will get crushed down to nothing and you have to be constantly climbing the stack to what the new thing is.


Corey: [laugh]. I want to thank you for taking so much time to speak with me today. If people want to hear more about what you have to say and how you wish to say it, where can they find you?


Laurie: I’m most active and responsive on Twitter. My username is @seldo and I also own seldo.com where I blog much less frequently than I would like to.


Corey: And we will, of course, put links to both of those into the show notes. Thank you so much for taking the time to speak with me. I really appreciate it.


Laurie: Thanks for the invitation. It’s been a lot of fun.


Corey: Really has. Laurie Voss, senior data analyst at Netlify. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and an entirely insulting, rambling comment complaining about how I talked about all these different package management systems for different languages and never once mentioned Rust.

Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.

This has been a HumblePod production. Stay humble.
Security Made Simple in the Data Economy with Mark Curphey
Screaming in the Cloud
04.13.2021
34 Minutes
About Mark

Mark Curphey is the co-founder at Open Raven, a cloud native data security company. Mark’s fingerprints can be found all over the security industry, but perhaps most visibly from his role as the founder of OWASP. His contributions range from his time as a hands-on application security director at Charles Schwab, Product Unit Manager of Microsoft’s MSDN program and his more recent role as founder and CEO of SourceClear. Mark’s obsessed with building elegant products that solve hard problems for discerning customers.

Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit https://launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit https://www.lacework.com.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. A recurring theme of a lot of my nonsense has been finding hapless companies who have not been adequate stewards of the data with which they have been entrusted and giving them the ignominious S3 Bucket Negligence Award. That seems to be something that isn’t well-appreciated in some areas, so I figured, let’s have a conversation about that in a bit more depth. Today’s episode is sponsored slash promoted by Open Raven and I’m joined by Mark Curphey, their co-founder and chief product officer. Mark, thanks for joining me.


Mark: Thanks for having me.


Corey: So, let’s start at the very beginning. As a co-founder and chief product officer, that means that you’re one of those folks who very early on presumably had part of the idea, if not the entire idea for what the company does. What is Open Raven, and where did you folks come from? What problem are you aimed at solving?


Mark: Sure. So actually, it’s an interesting story. I had previously done an application security company called SourceClear that I sold to CA. My co-founder Dave Cole was the early product guy at a company called CrowdStrike, which recently IPO'd. And David and I had always wanted to work together; really didn’t know what to go do.


And the honest truth is we decided to go be good capitalists and went out and asked our chief security officer friends, “What’s the biggest problem that you’ve got?” And resoundingly, it came back that, “I don’t know where my data is. I don’t know what type of data I have. I don’t know how it’s being protected. And data breaches are happening all the time, and it’s probably the big thing that I’m going to get fired for.” So frankly, Dave and I rubbed our hands together and said, “I think we can make money off of that.” And solve a meaningful problem. And hence, the Open Raven company as it is now.


Corey: Which is absolutely something that is increasingly in the public eye. Well, we’d like to hope. At some point, people just shrug, give up, assume that everything about them is public, and that’s the end of privacy to some extent, and get on with their lives. At least, that’s the negative story. I like to believe that on some level, getting better than we are today is possible.


And what infuriates me, and why I started giving out S3 Bucket Negligence Awards personally, isn’t because you wound up getting breached. I view, on some level, that is being akin to taking an outage: it happens to everyone on some level, and you have to prepare for it as best you can. All right, I get that. One of the problems that we tend to see from all corners is that companies that wind up getting breached are, in many cases, exposing data that isn’t theirs, that no one consented to have handled by these folks. We see it, in some cases, with some of the credit reporting agencies and some of the data brokers. And it’s not always S3 buckets, but it is the consistent drumbeat of companies not being adequate stewards of the data that has been entrusted to their care.


Mark: Yeah. I mean, look, it’s certainly true that a lot of people have breach fatigue; this stuff’s been going on for years, and years, and years. I think that the S3 Negligence Awards, or the Bucket Wall of Shame kind of go back down to DEFCON hacker conferences. It’s called the Wall of Shame for passwords. It’s not necessarily a new phenomenon.



And I would also say that whilst S3, you know, you open The Register and every day, there’s an S3 bucket thing, it’s certainly not only S3. We know that; we’ve been doing some profiling of things, and Elastic, and MongoDB, and everything else is hanging out there. But I guess buckets, sort of, tend to be so easy just to make them open and host data on them in the first place. But I think you’re right: companies that have data, whether it’s knowingly capturing it or processing it, you have a duty of care, at managing and holding someone else’s data. And it just feels like people don’t take that duty of care seriously enough.


Corey: And what’s more, is that you’ll often see a company get breached, and, “Oh, your data has been subject to a data breach.” And ideally, you wind up getting that notification before you read about it in the papers. And a lot of the companies that you do business with, that contact you are very quick to point the finger of blame at a third-party contractor. Well, I didn’t hire the third-party contractor. You did, and if you’re not willing to wind up owning up to that, well, you’re effectively trying to outsource the work—which is fair—and the blame, which is not. How do you stand on that?


Mark: Yeah. Well, it’s a system that we happen to be using, but it was someone else’s problem, that the default configuration was their problem. I mean also, Corey, I can tell you I have a lot of friends in the forensics industry who deal with incident response, and still to this day, the vast majority of data breaches are never reported; I know of breaches that have happened in major public companies where all the breach laws are such that they should have notified their customers and they should have notified the authorities, and it just doesn’t happen. So, it’s one of those problems that I think it’s like the iceberg problem, right? And to a large extent, it’s kind of an interesting one in that when someone notifies their customers, they’re doing it from transparency.


And whilst I think you and I will both appreciate that and place more trust in those companies, the reality is a lot of the public wouldn’t. And so the incentives aren’t necessarily aligned up there around why, and why they should do it.


Corey: I would take it even a step further than that. I would argue that I don’t know if it’s a majority, but a significant number of breaches are almost certainly never detected in the first place. On some level the, “Oh, we’ll detect data breaches,” as a pitch that a vendor makes to a company is going to be met on some level with, “Good Lord, no. Why would we want to do that? We are happier not knowing.” And that depresses me.


Mark: Absolutely true. I’ve been building security tools for 20 years, and you’ll be surprised the amount of people that, if I deploy your tool, even as a trial and we find that we have problems, then I’m legally responsible for going and dealing with it, and I won’t touch it. The other thing that’s kind of related to that is that the security guys are incredibly busy as well, and the security tools, historically, generate lots and lots of noise; very low signal, lots and lots of noise. And so the security teams look at it and they go, “Oh, my gosh, I’m going to get a whole bunch more noise that I have to go deal with, and a bunch of more work that I have to go do. Can I bury my head in the sand?” Like, “Sure.” And it happens. That’s just the reality of the world we’re living in, unfortunately.


Corey: It is. And for better or worse, I think that it’s a world that we’re sort of stuck in to some extent. Do you think that the drumbeat of open S3 buckets that have been misconfigured containing sensitive data, it feels like we aren’t seeing as many of those as we once did, but is that just because people aren’t reporting them? Is it something that is going away slowly but surely? Or is it just as bad as it’s ever been, but it’s not making headlines anymore?


Mark: So, I’m actually building a tool to profile the AWS IP space for all the open buckets, and all of the open Elasticsearch and MongoDB thing. There’s a few of those that are out there, like Greyhat Warfare, which you can go search for S3 only, but not all the other stuff. I think when I looked up there the other day, I want to say there were about 750,000 open buckets. Now, of course, not all of those open buckets are—a number of them should be open. That’s kind of why they’re there and et cetera. So it’s—


Corey: Oh, I keep getting alerts constantly about open buckets that I have that are intentionally open, and I get alerts in the console, and I get emails at least quarterly, and this bucket that it starts with the word assets dot and then some domain, yeah. How about that? That is, in fact, designed to be open. On some level, I wind up—a few of them—just slapping a CloudFront distribution in front of it, not because I need it. Just because I want it to stop nagging me.


Mark: Of course. That’s the signal-to-noise problem again. And honestly, that’s part of the reason why Open Raven’s doing very well is that all these companies have been hiring people to go around, chase down open buckets, which are designed by functionality to be open in their organizations but don’t contain any sensitive data. So, I don’t know. Look, to answer your question around the S3 bucket thing, I think again, like breaches, people have got a bit of fatigue.


And there’s only so many articles The Register can do with, “Hey, S3 bucket open to the world,” and—what is it—27 terabits of data or 900 terabits of data. It’s not necessarily one-upmanship on the headlines anymore. But the gut tells me the faster we deploy things, right, the whole kind of DevOps movement, in many ways is moving against the security grain. And rightly so. So, to think that the problem is getting better, I don’t think is reasonable.


And then I think that Amazon, in some ways, have done good things with the security policy and all of those types of things, but they’re largely designed for greenfield environments. And when you step off the reservation, I mean, gosh, what, do you think the average developer is going to be figuring out how to configure that XML policy on an average S3 bucket? Of course not. They’re just going to make the damn thing open, and they’re just going to move on and do their job. So, we’ve got to figure out how to get better tooling, easier, secure by default. And then, like you said, we’ve got to figure out ways to reduce the noise so that people can act on signal, and not get bombarded by noise, and just shut it down.


Corey: My approach to cutting through that noise on open S3 buckets, as I tweeted out a couple of times now, is to just copy a few petabytes of data into the open buckets. My operating theory is that while you’re going to ignore a politely worded email from a security researcher, you’re probably not going to ignore a bill that is 80 times larger than it’s supposed to be at the end of the month. That seems like it might be—among other things—legally fraught. What’s your approach at Open Raven to solving this particular problem?


Mark: Well, so for us, it’s all about improving the signal-to-noise. So, it’s about setting up a policy that you can say, “On this bucket, this is the type of data that I’m expecting; these are the security controls I’m expecting; if it deviates from that in any way, go, let me know.” And then we use OPA, Open Policy Agent, to go check that and go send alerts or pump stuff out to a firehose, pump it to a security event system, or whatever. So, in general, that’s it. It’s like, define what good’s meant to be, and then let me know when good is not occurring so I can go figure out how to deal with it.


And of course, you know, you create generic policies like, “Look, I never want to see financial data on a bucket that’s open to the Internet and that’s unencrypted,” or, “I never want to see something with a CIDR range of 0.0.0.0/0 through some security group,” or something. So, that way, essentially, what companies get to do is, sort of, encode their intended use policy and alert when that’s not there. So, for us, it’s all about that. Part of what we see, and I’ve seen this in the security industry for the last 15 or 20 years, is there’s textbook security and then there’s the real-world security. You go look at a lot of, kind of, textbook security solutions, and they’re fine.


They work absolutely fine. I worked at Microsoft for a long time, and I was always amazed at how everything worked perfectly at Microsoft, and then when you stepped off the reservation, nothing worked properly. But everyone in Microsoft would be scratching their heads going, “Well, it all works fine here.” It’s like a developer saying, “Hey, it works on my laptop. It was only when I committed it to CI the problem occurred.” So, it’s that same thing, and you got to design stuff for the real world.
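
The "define what good is meant to be, then alert on deviation" approach Mark describes can be sketched in a few lines. This is an illustration only, written in plain Python rather than an actual Open Policy Agent policy, and every name in it (`BucketState`, `BucketPolicy`, `violations`, `is_open_to_world`) is hypothetical and not taken from Open Raven's product.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network
from typing import List, Set


@dataclass
class BucketState:
    """Observed facts about a bucket (assumed to come from some scanner)."""
    name: str
    public: bool
    encrypted: bool
    data_classes: Set[str] = field(default_factory=set)


@dataclass
class BucketPolicy:
    """The intended use: what data is expected and which controls apply."""
    allowed_data_classes: Set[str]
    allow_public: bool = False
    require_encryption: bool = True


def violations(state: BucketState, policy: BucketPolicy) -> List[str]:
    """Return one message per way the bucket deviates from its policy."""
    found = []
    unexpected = state.data_classes - policy.allowed_data_classes
    if unexpected:
        found.append(f"unexpected data: {sorted(unexpected)}")
    if state.public and not policy.allow_public:
        found.append("bucket is public")
    if policy.require_encryption and not state.encrypted:
        found.append("bucket is unencrypted")
    return found


def is_open_to_world(cidr: str) -> bool:
    """True for the 'anywhere' ranges 0.0.0.0/0 and ::/0."""
    return ip_network(cidr, strict=False).prefixlen == 0
```

Under this sketch, an intentionally public assets bucket whose policy sets `allow_public=True` produces no violations, while a public, unencrypted bucket holding financial data produces three. That difference is the signal-to-noise point: the alert fires on deviation from intent, not on openness alone.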


Corey: You also say it goes beyond just S3 buckets, which I believe. For a while there, I think—was it Elasticsearch, or was it Mongo that had a default password of ‘changeme’ or something horrifying like that?



Mark: Yeah. One of those. I forget which one it was. Elastic’s a big offender, for sure. Mongo is a big offender, for sure. But I mean, you also see, like, Jenkins servers that are sat out there. I mean, that camera thing—what was it—the Verkada thing recently? That was a CI server that happened to have a script that had access to loads of things. But the amount of Jenkins servers that are accessible through the internet is shocking. It’s not just buckets for sure.


Corey: It definitely becomes a weird thing. I don’t know if there’s a fix here—I really don’t—longer term. But instead of looking forward for a minute, let’s go back and visit the past for a bit. You were the founder of the OWASP reporting list. What is OWASP? Is it a list? I’m most familiar with the OWASP Ten. But I’m certain you’ll have a better story on that than I will.


Mark: Yeah, no, no, no. Top Ten was this whole thing. So, I was running software security at Charles Schwab, early 2000s, 2001. Before, it was kind of a really big thing. And we used to get vendors coming in trying to sell me products.


And honestly, it was kind of a joke. My market open would have 8 million accounts, like, a trillion dollars under asset, and people would come in and try and sell me a web application firewall, which, maximum throughput was like 0.01% of my market open traffic, and things. But there was nothing out there on the internet to go point to and to say, “Well, this is good.” It was basically me versus a vendor coming in.


And so I said, “Right. This is kind of crap, right?” And I got together with a bunch of other people that were also doing similar things, some other people at some other banks, some other people in other companies. And I said, “Right. I’m going to go publish something.” And I wrote it over a weekend, literally wrote this guide, called the “OWASP Guide.” And it was basically a set of principles around software security, like least privilege—you know, nothing sophisticated, but it was those types of things—and published it.


And then OWASP was basically born. So, it’s the Open Web Application Security Project. Then it got a lot of traction because a lot of people had signed up, and a lot of people were then starting referencing this to build their own application security programs. Over time, of course, OWASP got very successful. I think it’s, like, 40,000 people or something like that turn up at those conferences all around the world and chapters all around the world.


And there’s lots and lots of projects that have taken place, one of which is the Top Ten that you referenced, that a lot of people know of. And the Top Ten has been around I want to say since, like, 2004, or something like that. I don’t know, I’d have to go back and check with history. It’s hardly changed since 2004. And you can have a good conversation around why that is. But yes, that’s the history of OWASP.


Corey: And now, of course, you have this list that doesn’t seem to have changed significantly in a while. I mean, back when I was starting up the Meanwhile in Security podcast and newsletter with Jesse Trucks, we talked about that being one of the key problems: everyone wants to know how to handle security in cloud, but if we take a look at how a lot of application vulnerabilities exist, that list hasn’t materially changed. If anything, the advent of cloud has fixed some security issues, in that you’re not allowed to muck with them anymore. Datacenter physical security is no longer a vector for most folks who are all-in on a public cloud provider. But you’re also dealing with this other problem, where now it’s a list of enumerated S3 buckets, for example, and if you misconfigure that, it’s something that’s globally known, and I guess it removes the security-through-obscurity argument, insofar as there ever was one. Have things changed in a time of cloud or is it just the same thing with new labels on it?


Mark: Well, I mean, there’s a couple of things. So, you’ve got to ask yourself, what is the OWASP Top Ten of, right? Is it the top ten most popular issues? The top ten most severe issues? The top ten voted by security people?


Like, no one's ever really been able to get to that, apart from an arbitrary top ten. And I don’t want to take anything away from it because the Top Ten has been incredibly useful in getting to developers, giving them a tangible, like, ten things; go focus on these ten things and you’ll raise the bar. So, that’s kind of piece number one, but has it changed? Well, whether you should have expected it to change depends on what you believe it’s based on. If you go look at them, though, like, no.


Things like injection, and broken authentication, and sensitive data exposure, those things haven’t changed because they’re just general things and they’re going to be around forever. You think sensitive data exposure is going to go? Doesn’t matter what technology we change, it’s always going to be there. For me, though, what’s kind of interesting about it, and why maybe I’m a bit of a skeptic about it is that you can eradicate total classes of problems—I believe—by changing patterns. So, a good example is, look, if you go use one of these modern development frameworks, application frameworks.


It’s built-in inherently. And the same with a lot of SQL injection problems that you used to see all over the place. You’d have to intentionally go create those problems, for the most part, now. And I think the cloud has done the same. It’s taken a lot of problems away, it’s extrapolated them into a service, it’s extrapolated them into a pattern, and the pattern can then go away.
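As a minimal sketch of the point about SQL injection being removed by a pattern, the snippet below contrasts string-built SQL with a parameterized query. It assumes Python's standard `sqlite3` module; the `users` table, the function names, and the payload are hypothetical and only serve the illustration.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Anti-pattern: user input is spliced into the SQL text itself,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo():
    # Hypothetical in-memory table, created only for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    payload = "nobody' OR '1'='1"
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)
```

With the placeholder form, the injection payload is treated as an odd-looking name that matches nothing, which is the sense in which you would now have to go out of your way to create the problem.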



So, back to the S3 thing, I think there’s hope [laugh] because if you can make a change upstream, I mean, you’ve probably seen recently all these damn, you know, supply chain attacks. And the bad guys are going further and further upstream where they can affect things downstream. And the good news about all of that is if you can figure out upstream, the way to go secure it, everything downstream gets secured as well. So, I think with a lot of these things, if we can, instead of trying to play whack-a-mole or put the finger in the dike, if we can start thinking about patterns and ways to go solve them at a class or problem level, then we stand a chance of fixing them.


Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.


Corey: I sure hope you’re right. I mean, in an ideal world, you will be. But it’s, ugh, I have so much trepidation [laugh] around all this. And I don’t know how it’s going to wind up playing out. And I hope that it’s going to go well. But it just feels like you’re constantly railing against the tide. And I don’t know how to wind up addressing that. I really don’t. I wish I did.


Mark: Mm-hm.


Corey: Is there anything you can say that helps me be more optimistic about this, at least?


Mark: [laugh]. Well, I mean, you’re right. Look, I’m no longer in the application security business after spending 15 or 20 years in there because I just gave up trying to convince developers to care about security. I just—and I don’t blame the developers. They’ve got another job to go do and security’s too hard.


So, for me, it was just pushing molasses uphill. And, I think, to your point, yeah, why would you expect anything different if we carry on doing the same thing? And the reality is, we’re moving faster and faster, we’re making it easier and easier to deploy things, we’re getting more and more complex systems. Why would you expect anything different? So, yeah, I don’t think you’re skeptical for a bad reason.


Corey: No. For better or worse, we still wind up having these problems. I don’t know how to solve it. I really don’t.


Mark: I mean, look, for the reality, if you go back to the old days, like, the old school—obviously I’m a bit of an old person, right—but you go back to some of the military things used to be, like, “Trust, but verify.” That motto works incredibly well. You trust people are going to do the right thing, you verify they’ve done the right thing. That means you don’t hinder the speed, but you go back and check and if anything happens, you come back. And it’s like, accepting things.


One of the other ones around that was, like, it’s people, process, and tools. People, process, and technology. And again, technology is never going to solve the problem of security. It’s a people problem. “You can’t patch stupidity,” and all of those phrases.


But if someone gives someone access to a local root account, or whatever the thing is, doesn’t matter how many other security controls you’ve got. I mean, I’ve seen it in cloud environments, as I’m sure you have. Someone goes and creates a security group open to 0.0.0.0/0 so they can hop in from home and don’t have to come in and go through all of the other control points. And it’s just the way stuff works. So, if you have that—if you take that mentality of, “People, process, and technology,” and, “Trust, but verify,” I think, use the right technologies and build the right process around it, then you can at least manage the risk. The risk is never going to be zero, but you can at least manage the risk to an acceptable level.
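The "verify" half of "Trust, but verify" can be sketched as a small check for security groups that allow ingress from anywhere. This is an illustration, not a tool mentioned in the conversation: it assumes the input has the same shape as the `SecurityGroups` list in an EC2 `DescribeSecurityGroups` response, and the sample group IDs and rules are made up.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_groups):
    """Return (group_id, from_port, to_port) for rules open to the world."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                findings.append(
                    (group["GroupId"], perm.get("FromPort"), perm.get("ToPort"))
                )
    return findings


# Hypothetical sample data, shaped like a DescribeSecurityGroups response.
SAMPLE_GROUPS = [
    {
        "GroupId": "sg-0example1",
        "IpPermissions": [
            {"FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        ],
    },
    {
        "GroupId": "sg-0example2",
        "IpPermissions": [
            {"FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}
        ],
    },
]
```

Running a check like this on a schedule does not stop anyone from creating the rule, which keeps the speed; it only makes sure somebody goes back and looks.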


Corey: Let’s pivot a little bit and talk about the flip side of data security. And that comes down to privacy. There’s been a bunch of regulatory efforts around that. GDPR, for example, California has its own version of that that’s going out, and there’s also a growing school of thought that thinks, on some level, we’re post-privacy. Where do you stand with that?


Mark: Yeah. I mean, look, the privacy regulations are raging right now. You got GDPR; you’ve got CPRA, the California one; you got HIPAA, the Health Insurance Portability and Accountability Act. And they’re all over the world. Japan has them, Australia has them.


They’re all over the place. And I think the US now is talking about having a central breach law around privacy data. The great challenge is that we’re all becoming a data economy, and companies are all becoming data companies, and so they want to gather more and more data. And the reality, I think, is that this whole stuff around cookie consent, I just think it’s just nonsense. When was the last time you said, hey, I’m not going to consent to you using my cookies?


It’s kind of like back in the old days, when you said, “Hey, I’m not going to allow JavaScript to run in my browser.” Like, all of a sudden, nothing works. And you’re like, “Oh. I’ll succumb.” But then before you know it, the data has been over-reached, right?


You probably saw the Alexa the other day that has the radar so it can watch you sleep in your bed. Sure, of course, they’re not going to use that data for anything bad. But next time a breach happens, or some clever data science person decides to correlate something—I don’t know what it might be—in the middle of the night, it happens. So, I think what you’re starting to see is that you’re starting to see regulators and legal people who don’t really understand technology, regulating to prevent those bad things happening. And then technology trying to figure out how to go and meet those regulations, but meeting it with the absolute minimum bar versus trying to figure out what the actual intention is.


And I think you’re going to see a bigger and bigger gap. I mean, look at what happened with third-party cookies as an example. The whole third-party cookie thing we saw, what was that the CORS headers, we saw anti-cross-site scripting headers because all of those things started happening. And then what does everyone do? They just go call a tracking pixel.


And then all the marketing automation tools carry on working as possible. So, I mean, I think you’ve got a balance between technology working as intended in certain good use cases, and there are people using that for their own use cases, which either break or push over the line of privacy. I don’t know. How do you see it?


Corey: I think on some level, it’s not necessarily that people care that some company in the aggregate knows what they’re doing. There are some that do, and I’m not disputing that. But for most of us, I don’t necessarily care if Google, for example, knows what I browse on the internet. I care much more if you—personally—know what I—personally—am browsing on the internet. So, there’s a question of, once they have that data, do I really care that much about what they do with it in aggregate? Not really. Do I care what they do about it on an individualized basis? Kind of, yeah. And do I care if they’re making, then, that individualized data available to third parties? Absolutely.


Mark: Yeah.


Corey: It comes down to what the use of that thing is. Now, I know that I am not going to win friends with that particular argument myself. And I get it. In an ideal world, I think that advertising should be something radically different than it is. There are advertisements in this podcast, for example, and they’re catering to an audience that cares about the topics we talk about on this podcast. But I have no tracking data of who listens to this, other than raw download numbers and rough GeoIP by continent. It’s not something that is ever going to be attributed—at least from where I sit—to individual listeners, nor would I want it to be.


Mark: Yeah. But, look, here’s where I might be able to convince you otherwise of that. In China, there is a well-known place called the Beijing Genomics Institute, and the Beijing Genomics Institute do genetic engineering, and not necessarily for good. So, it’s not necessarily to find cures for things, it’s also for other nefarious purposes. And the Beijing Genomics Institute acquire DNA data from US hospitals, US healthcare systems when you get your blood checked.


Now, that data is supposedly aggregated, but once you can start pulling apart DNA strands, you can start identifying people at different levels. And I think that’s the danger. There’s been a lot of cases where de-anonymizing information is possible. And so you’re making the assumption that that data is generally anonymized and used for the right reasons, but there’s been case after case where that’s not the case. So, maybe you’ll change your mind on that, Corey. I don’t know.


Corey: Maybe. I also, on some level, feel like I’m fighting a losing battle against the tide.


Mark: Yeah, yeah. My wife says, “Aren’t you worried about your credit card going missing?” And I’m like, “I’m sure it’s in many, many databases at this point.” I rely on Visa, at that point.


Corey: Well, that’s also a separate problem, too. I mean, this idea of, “Oh, your identity was stolen because someone else has opened a credit card in your name or stolen your credit card.” My very honest response to that is, “Oh. So, you weren’t cautious about who you decided to lend money to and validate they were the person you thought. And you’re trying to make this my problem because why, exactly?”


Mark: Yeah. I mean, look, in those cases, and that’s why it’s the corporate’s responsibility to deal with those issues. I guess it’s the same with social security numbers, in that they’re out there in so many places on the internet, and they’re pushed around in so many different ways, aren’t they? I think we’ve got to start moving into some of these zero-trust kind of protocols, and zero-knowledge ways, and all of that type of thing in the future.


Corey: Indeed. And I think that there’s one thing that every corporate entity listening to this—or representative of same—can agree on, and that is they prefer this conversation to remain hypothetical and aimed at the abstract not at them right after they’ve had a data breach, which of course brings us back to Open Raven and how it aims at these things. You do have a—at the time of this recording, it is still upcoming—a paper coming out contrasting what you have built with I believe it’s Amazon Macie?


Mark: Mm-hm. That’s right. Yep. Yep. So, when Dave and I founded the company, we went out, like I said, and we asked everyone, what’s the biggest problem, and it was data security. And then when you broke that down, it broke down into, “Let me know where my data stores are.” So, do I have buckets? Do I have stuff in RDS? Do I have stuff on file systems, et cetera? “What type of data do I have there?” “How is that data being protected?” You know, access control, and encryption, and all those things, and who has access to it?


So, it basically broke down to those things. Those things haven’t changed at all. So, think of that piece number two—what type of data do I have—as being data classification. Amazon have a service called Macie, which works on S3. So, we’ve built that feature.


Now, lucky for us, as it turned out—you get few really good breaks in the startup world—is that Amazon Macie, it turns out, is not very good, and incredibly expensive, and very, very slow. So frankly, the way we market it is, “Cheaper, faster and better than Macie.” And we believe in transparency of that. Every vendor will say we’re way better than everything, right? So, we’ve kind of done what you would do with a clinical trial in that we have basically built a—you know, here’s the test.


Here’s exactly what we’re going to test for, kind of like, laying it out in an academic paper. Here is the data, so you can go rerun the test yourself. And here are the results. And we know that we are way, way, way more accurate than Macie. We’re deployed as Lambda functions so we can scale up and run much, much faster than Macie. And then, certainly way, way cheaper than Macie, but that wouldn’t surprise you at all in that case would it?


Corey: No. Even after their massive recent price reduction, it was still, okay. That is in fact, still incredibly expensive, across the board. I mean, my argument with the original Macie and its pricing was I had a customer at that point, eyeing it and doing some math and, yeah, okay, first month would have been $76 million to run it in their existing stuff, which was significantly more than at that point, their annual AWS bill. So, it was, “Okay, let’s go with Option B,” which is literally anything except that and you’ll save money.


Even a data breach wouldn’t have been that disastrous compared to the pricing story. And now they’ve cut it to 20% of that, but that’s still an eight-figure bill to run these analytics on their data set. And that is… that’s not tenable. And on some level, it becomes the differentiated value of doing that isn’t there for customers. If I wound up running all of the various security services that AWS offers on an environment, it’s pretty clear that it would cost more than the data breach would.
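The arithmetic behind the "eight-figure" remark can be written out briefly. The only inputs are the two figures quoted in the conversation (a $76 million first-month estimate and a cut to 20% of that); nothing here reflects actual Macie pricing.

```python
# Figures as quoted in the conversation, in whole US dollars.
original_monthly_estimate = 76_000_000
reduced_percent = 20  # "cut it to 20% of that"

# Integer arithmetic keeps the result exact.
reduced_monthly_estimate = original_monthly_estimate * reduced_percent // 100

# Count the digits to confirm the "eight-figure" description.
figure_count = len(str(reduced_monthly_estimate))
```

That works out to $15.2 million for the month, which is still eight figures.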


Mark: Well, it doesn’t even work. Even if the cost thing was put aside, one of our customers tried it. I think they spent a million and a half on a trial, in a month, and it found 30 first names in a credit card database. I mean, it’s kind of crazy. And when you pick it apart underneath the hood, it’s a giant regex, essentially, and just doesn’t really work.


I mean, the reality is that that thing was built—it was actually a—it was originally an In-Q-Tel project, which is the funding arm of the US intelligence agencies. It was called [Harvest IO 00:28:04]. It was an acquisition that they bought in. And it was built a long, long time ago. If you want to do data classification today, you have to be able to not only identify structured, unstructured, and semi-structured data, and it comes in all places, and it goes into all file formats, in S3 buckets, it’s Parquet files—which are the backend of LakeFormation and Lakehouses and things like that.


But when you find a piece of data, you’ve got to be able to go and validate, is that data real? I mean, take an AWS API key as an example. It’s very easy to go figure out how to push that thing into that format, but is it a real key? Whereas if you use validators, go login to an AWS API and you’ll get a return that will say, “Is this a valid key, and which account is it associated with?” And so we’ve done, both in terms of the accuracy of identifying the information stores, the tests that we’ve got show, in general, we are twice or three times more accurate than Macie on finding the initial piece of data.


But then we have these validators. So, you know, you get a credit card, go call a credit card API. Is it a real credit card or is it just a 16 digit int? And you can go check that stuff. Data classification has moved on since that stuff was there.
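The difference between matching a format and validating a value can be sketched in two stages. This is not Open Raven's implementation: Mark describes calling a credit card API, and the sketch swaps in the Luhn checksum as a purely local stand-in for that second stage. The regex and function names are assumptions made for the example.

```python
import re

# Stage 1: a format match. Any run of exactly 16 digits is only a candidate.
CARD_CANDIDATE = re.compile(r"\b\d{16}\b")


def luhn_valid(number: str) -> bool:
    """Stage 2: Luhn checksum, a local check that real card numbers satisfy."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify_cards(text):
    """Return (candidate, passed_validation) for each 16-digit run in text."""
    return [(c, luhn_valid(c)) for c in CARD_CANDIDATE.findall(text)]
```

A regex-only classifier would report both `4111111111111111` (a widely used test number) and `1234567812345678` as cards; the checksum stage keeps the first and rejects the second. A checksum still cannot say whether a card was ever issued, which is why a remote validator of the kind described above goes further.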


So, even if the pricing thing was fixed—and as you point out, it certainly isn’t—it’s just not a good option for people. And then the kind of second piece to that is that the majority of customers that we see, and people, are looking at things like Snowflake. I mean, if you look at these data platforms, Databricks, Cloudera, Snowflake in particular, you know, they’re built on top of AWS services. But people are moving data to those places, so it’s not just an S3 problem, as I said. It’s about people putting data in Elasticsearch, in RDS, in file systems. The data is everywhere—and backups. Like, all of this stuff gets pushed up into backups and stuff as well. And so you’ve got to have a service which goes and checks it. We decided to go compete with S3 and beat Macie first, but that’s certainly not where the tool and technology is going.


Corey: No, for better or worse, it would seem not. Thank you so much for taking the time to speak with me. If people want to learn more about Open Raven, what you’re doing and how you’re doing it, where can they find you?


Mark: Yeah. openraven.com is the best place to go. We’ve also got a pretty exciting open-source tool coming up soon, which is called Magpie. Magpie is a cloud security posture manager.


So, think of it as we’ll go out and check all of the security settings on your AWS environment. And so we’re releasing that open-source around the end of April as well. So, keep an eye out for Magpie. We’re taking the core out of Open Raven that does all the discovery across the orgs, pulls back all the attributes, or the IAM, or the security groups, and then allows you to go write security rules on top of that. Not data rules, which is what the Open Raven platform does, but security rules. So, also go check that out, but all linked off of openraven.com.


Corey: And we’ll of course put links to that in the [show notes 00:30:43].


Mark: Wonderful.


Corey: Thank you so much for taking the time to speak with me today. I really appreciate it.


Mark: No, thank you very much, Corey. Much appreciated.


Corey: Mark Curphey, co-founder and chief product officer at Open Raven. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment enumerating all of the S3 buckets you have inadvertently left open.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


Announcer: This has been a HumblePod production. Stay humble.
Personalization, the Non-Creepy Way with Heidi Waterhouse
Screaming in the Cloud
04.08.2021
41 Minutes
About Heidi

Heidi is a transformation advocate with LaunchDarkly. She delights in working at the intersection of usability, risk reduction, and cutting-edge technology. One of her favorite hobbies is talking to developers about things they already knew but had never thought of that way before. She sews all her presentation shirts so they match the pajama pants.

Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. So, this promoted episode is honestly one I’ve been looking forward to for a while. Three years ago, almost exactly from the time of this recording, I started this ridiculous podcast, and we’re now a couple hundred episodes in or so. But my first guest was Heidi Waterhouse, who is now back for more. Heidi, thank you so much for joining me. You are still at LaunchDarkly; you’re a transformation advocate. First, thank you for getting this whole ridiculous thing started. The world may never forgive you.

Heidi: It’s always good to be blamed accurately.

Corey: Exactly. As long as you spell my name right, there’s really no such thing as terrible publicity, now is there?

Heidi: [laugh]. I feel really sorry for the other Corey Quinn on Twitter.

Corey: Me too because he worked in marketing; he had the name longer than I did. And he gets tagged every so often in ridiculous nonsense that I’m in, and at some point, I feel like he’s going to pivot to marketing in the cloud space just because of inertia. You can’t fight the tide forever.

Heidi: Go with what’s working.

Corey: So, LaunchDarkly. There are a few things interesting about this to me. The first is that—thanks, you folks are sponsoring stuff now, like this podcast. So, thank you for that. Secondly, you’re still there, which is not a ding on the company itself, I want to be very clear, but it feels like the average shelf life of an employee in tech these days is somewhere between 12 to 18 months. You’re over twice that.

Heidi: I am, and I am employee 20 or something. It’s kind of amazing, but it turns out that you can retain high-level employees if you treat them well and allow them to keep growing. So, I think that’s the thing people might consider taking under advisement.

Corey: That feels almost like it’s one of those ancient business practices that everyone likes to frown upon, like it’s something from our grandparents’ era, like make more money than it costs you to deliver your goods and services.

Heidi: That seems fake.

Corey: Exactly. One of those old-timey business models? All right, so we talked three years ago about feature flags. For those who have not taken the time to go through every single episode we’ve ever done, what are feature flags?

Heidi: Feature flags are a way to control your code, after it’s live in the world. At its most basic level, actually, that’s what LaunchDarkly does. Feature flags don’t actually have to work live in order to work effectively. A lot of people create feature flags as database queries or environment variables or runtime changes or settings. It’s a pattern that most teams end up needing, and they either recreate it or they buy it.
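The home-grown version Heidi mentions, a flag read from an environment variable, can be sketched in a few lines. This is not LaunchDarkly's API; the `FEATURE_` prefix, the function names, and the `new_checkout` flag are hypothetical.

```python
import os

TRUTHY = {"1", "true", "on", "yes"}


def flag_enabled(name, default=False, environ=None):
    """Read a flag from FEATURE_<NAME>; fall back to `default` if unset."""
    env = os.environ if environ is None else environ
    raw = env.get("FEATURE_" + name.upper())
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def checkout_page(environ=None):
    # Both code paths are deployed; the flag decides which one users get.
    if flag_enabled("new_checkout", environ=environ):
        return "new checkout flow"
    return "old checkout flow"
```

A flag like this is only read when the process looks it up, so changing it usually means a restart or redeploy. That gap is roughly the difference between recreating the pattern and having it work live.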

Corey: A line that I heard recently that reframed the entire topic for me—and perhaps you covered this on the first episode; I don’t actually know, I was way too nervous to pay attention to what you were actually saying back then—is it separates out feature toggles and rolling out features from your code deployment. And that one resonated because there are two kinds of shops out there, really: those who have terrible, disturbingly bad code deployment processes, and those who lie about having the exact same thing. No one has a terrific code deployment process to go safely and repeatedly from developer laptops into production. And the only people who are going to argue with that are the people who sell something that claims to do that. But it’s always scary. It’s always frightening. And having to do that to change what your application is doing feels like it’s a little bit—how do we put this—overwrought. Is that a fair characterization?

Heidi: Yeah. And the way I think of it is in sort of a Vegas thing. So, I come from an era where we had something called a gold master. We burned things on CD, and that was what you got, so we were really cautious about what we put into a release. And now we live in an era where you can YOLO stuff into production 100 times a day and not have a problem with it.

But that doesn’t mean that people want their website that they’re using to change 100 times a day. That’s bad for the business. We’re not just moving their cheese, it’s just popping around like a video game character. But on the other hand, if we do a release every hour, it gets a lot less scary because we have to have streamlined it. Like, if you can’t run your test suite in enough time to do a code push in under an hour, then you optimize your test suite so it gets better.

So, once we separate this idea of like, “How do we get things on to the server?” From, “How do we deliver the things to people?” We realize those are actually two different roles, and why did we conflate those?

Corey: I first learned about feature flags a while back at a presentation at a meet-up from some big tech company. And it felt, “Oh, that sounds like an awesome idea that you would run at a big tech company, but here in the real world, no one’s actually going to do it until you’re at that scale.” And talking to you the first time, it was clear that that is not strictly accurate. That is the whole point of LaunchDarkly. Three years later, has the industry changed? Is adoption of this pattern becoming more widespread? Are we learning new things as we go? And if the answer to that question is no, I’ve really just painted myself into one heck of a corner, but we’re going to try it anyway.

Heidi: [laugh]. This is honestly something I’m very proud of professionally. We have done, as a company and as a movement, so much to get the idea of feature flags in front of people and teach them that it is not just about how your frontend behaves and it’s not just about A/B testing, which is what most people think it’s for, but it’s actually about being able to mitigate your risk to be more cloud-native, to think in a way that says, “I don’t actually have a deterministic way to say everybody is getting the same thing at the same time unless I’m using a tool to make sure.”

Corey: So, is the idea of feature flags strictly a frontend concept? Is it something that winds up mapping to backend as well? And, God forbid, is there a story about using feature flags for infrastructure?

Heidi: Oh, absolutely. So, I think the most compelling story that I keep running across when I talk to people in Ops, is the idea of a permanent kill switch. So, most people, when they talk about feature flags, they’re talking about deployment, and rollout, and testing. Those are what we call temporary feature flags; you’re going to pull them out after they’ve done their job. But there are also long-lived or permanent feature flags that you put on bits of your infrastructure so that you can control things when shit goes sideways.

So, for instance, imagine you have an inbound API that’s writing to your database. This is a normal thing that happens: you sanitize the data, it’s a known sender, you’re accepting all this data, and you have a monitor on it. And all of a sudden—Datadog or whatever—your monitor goes off and says, “I’m getting 100x traffic. I don’t know what’s happening. Beep, beep, beep, I’m not happy.”

And the first thing that you want is to start shunting that traffic off because it’s probably a DDoS, or some other kind of bad data. So, rather than have to wake somebody up, figure out what the problem is, figure out where to shunt it, you can set up a permanent feature flag that says, hey, if this alarm goes off, I want you to shunt all that data to this overflow database. And I will wake up and look at it, but first, maybe I will ingest some coffee or at least, you know, wash my face before I try and stare at a screen. That gives people so much more time to react in a smart way and uses our automation to sort of delay the need for the human in the loop. The human has to be there, but they don’t have to be there instantly.
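The permanent kill switch described here can be sketched as a flag that the monitor trips and the write path consults. Everything in the snippet is an assumption for illustration: the class and function names, the 100x factor taken from the example above, and plain lists standing in for the primary and overflow databases.

```python
class KillSwitch:
    """A long-lived flag that stays in the code path permanently."""

    def __init__(self):
        self.tripped = False

    def trip(self):
        self.tripped = True

    def reset(self):
        self.tripped = False


def on_monitor_sample(switch, current_rate, baseline_rate, factor=100):
    """Trip the switch when traffic reaches `factor` times the baseline."""
    if baseline_rate > 0 and current_rate >= factor * baseline_rate:
        switch.trip()


def route_write(switch, record, primary, overflow):
    """Send the record to the overflow store while the switch is tripped."""
    target = overflow if switch.tripped else primary
    target.append(record)
    return "overflow" if switch.tripped else "primary"
```

The switch is never reset automatically in this sketch; a person calls `reset()` after looking at what landed in the overflow store, which matches the idea that the human is still in the loop, just not instantly.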

Corey: The traditional idea of feature flags seemed like it was something that you would use to roll out experiments, on some level, to 1 out of 100 users of your site, and then you could start validating: Is this feature working? Is it breaking? Et cetera. Facebook, I seem to recall, had something vaguely similar. This is misremembering many moons ago, back when I gave the slightest crap what anyone from Facebook had to say to me about anything, just based upon ethical reasons. But they started off with, I think, seven concentric circles, or six concentric circles that spanned from a single developer’s account all the way out to the entire world. That feels like the feature flag story to some extent, isn’t it?

Heidi: It is. And Microsoft uses it, and Lyft uses it, and they all have teams that are doing that. And the story is, put it into production, but nobody can see it. And this works especially well at Facebook and Google because they’re using trunk-based development. So, everything is always live in their codebase, it’s just hidden behind different feature flags.

So, they put it out into production—they’ve deployed it—they turn it on for themselves, they see if it works; they turn it on for their team, they see if it works; they start turning it on for beta users. Microsoft calls this ‘ring deployment.’ And that really works for getting stuff out and making sure that it’s not going to be overwhelming in a weird way. And also, it turns out that even though most of our test engineers are amazing geniuses, and we should take them more seriously, you can’t really test how a distributed cloud environment is going to respond, except in that cloud environment.
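
The ring rollout described above can be sketched roughly as follows. This is an illustration under assumptions, not Microsoft's or LaunchDarkly's implementation: the ring names, the user attributes (`is_author`, `on_team`, `beta_opt_in`), and the `is_enabled` helper are all hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Ring(IntEnum):
    """Concentric audiences, innermost first. Names are illustrative."""
    DEVELOPER = 0
    TEAM = 1
    BETA = 2
    EVERYONE = 3


def user_ring(user: dict) -> Ring:
    """Place a user in the innermost ring they qualify for."""
    if user.get("is_author"):
        return Ring.DEVELOPER
    if user.get("on_team"):
        return Ring.TEAM
    if user.get("beta_opt_in"):
        return Ring.BETA
    return Ring.EVERYONE


def is_enabled(user: dict, released_through: Optional[Ring]) -> bool:
    """The code is already deployed; the flag decides who can see it."""
    if released_through is None:   # deployed but dark: nobody sees it
        return False
    return user_ring(user) <= released_through


developer = {"is_author": True}
beta_user = {"beta_opt_in": True}
stranger = {}

# Widening the rollout is a flag change, not a new deployment.
for stage in (None, Ring.DEVELOPER, Ring.TEAM, Ring.BETA, Ring.EVERYONE):
    visible = [is_enabled(u, stage) for u in (developer, beta_user, stranger)]
    print(stage, visible)
```

Rolling back is the same operation in reverse: set the released ring back to a smaller one, and the feature disappears for everyone outside it.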

Corey: So, the question I have, at the idea of expanding out from effectively just the developer’s test account, all the way out to the entire world regardless of who you are, is there an ethical concern here? I’m not trying to wind up putting you on the spot, but the idea of, I want to roll out a test to my paying customers, in many cases. The idea of chaos engineering and running experiments, and breaking production intentionally came out of Netflix, among other places, but at Netflix, the failure mode was on some level, “Okay, someone has to restart their stream in the event that something goes sideways.” That isn’t really the end of the world. But there is still a question, okay, you’re actually slightly degrading a paying customer’s experience. Given that feature flags are seeing adoption significantly outside of the entertainment space where the stakes are almost invariably going to be higher, where do you stand on the ethics side?

Heidi: So, I feel like most of the life-critical and financial clients that we have, do manage to work with feature flags in a regulated environment because they already have rules about how to do that. It’s not like I’m saying because you can test on anybody, you can test on everybody. So, if you can test on yourself, that’s fine, but you still need an approval to test on any customers. There’s still an approval process. And in fact, we built in a new set of features that allow you to do approvals and say, “Okay, developers can try this out for themselves, and internal people, but it has to go past the approvals board if it’s going to hit any customer effect.” And actually, the thing that I say about this is that we are all testing in production, it’s just that some of us admit it.
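
One way to picture the approval rule described above is a small gate in front of every flag change. This is a sketch under stated assumptions: the audience names, the `FlagChange` record, and the rule that requesters cannot approve their own change are invented for illustration and are not LaunchDarkly's approvals feature.

```python
from dataclasses import dataclass, field

# Assumed self-serve audiences; anything else counts as customer-facing.
INTERNAL_AUDIENCES = {"developers", "internal"}


@dataclass
class FlagChange:
    flag: str
    audience: str            # e.g. "developers", "internal", "customers"
    requested_by: str
    approvals: set = field(default_factory=set)


def can_apply(change: FlagChange, required_approvals: int = 1) -> bool:
    """Internal audiences are self-serve; customer-facing changes need sign-off."""
    if change.audience in INTERNAL_AUDIENCES:
        return True
    # A requester's own approval does not count toward the total.
    reviewers = change.approvals - {change.requested_by}
    return len(reviewers) >= required_approvals


internal_change = FlagChange("new-checkout", "developers", "dana")
customer_change = FlagChange("new-checkout", "customers", "dana")

print(can_apply(internal_change))   # self-serve
print(can_apply(customer_change))   # blocked until someone else approves
customer_change.approvals.add("lee")
print(can_apply(customer_change))
```

Keeping the approvals on the change record is also what makes the audit question answerable later: who asked for the change, who signed off, and which audience it reached.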

Corey: That’s fair. Do you think that there needs to be additional scaffolding put in place before you can do this in an effective way?

Heidi: So, LaunchDarkly provides you a lot of that scaffolding, if you already have some concept of having to be regulated. I think that if you are a new financial startup, I hope that you are consulting best practices on how to set that up. I think that the thing about feature flags is that they are not a pattern, but a tool that can have many patterns. And that gives you the ability to say okay, the way we’re going to implement feature flags is with this extreme change management control, staging environment, soak time, like, we’re going to have all of these safeguards, or you can be somebody who doesn’t have to be that careful and you can move fast and YOLO things into production and see how it works.

Corey: So, on some level, what you’re saying is that the folks that are going to need to build additional scaffolding to do this responsibly, more or less already need to have built that scaffolding already and they’re already behind the curve to some extent.

Heidi: Right, exactly. Because how else are you going to say, “I can auditably and verifiably say that this person got this exact variant of the deployment, of the release?”

Corey: One of the problems I have with the idea of feature flags—that’s right, I brought you on the show on a promoted episode to basically tell you, “You know what your problem is”—and basically berate you for the way your entire product works because that’s how we roll here. But I have to ask, I wonder how I even begin getting started with something like this. I want to go ahead and test it, sure. It feels like I may have to wind up doing two, three sprints worth of work just to get into a position where I can even test something out. And at that point, it almost doesn’t matter what you’re going to charge me for a product or service. The sheer engineering time investment makes it a relative non-starter.

Heidi: Right. So, one of the things that we found is we are so frequently replacing some homegrown solution. So, people have the concept of being able to control how their software operates, they’ve just set it up in some way, and that some way is not scaling. One of the patterns that I’ve seen when people get started is they get started with something that’s not mission-critical because it’s easier to learn that. So, we have a customer called Xero who does a ton of payroll stuff in Australia and New Zealand.

And the way that we got in was, they wanted to use it for their financial transaction stuff but they didn’t want to do a ton of risky messing around with that. So, what they did was they’re like, “Okay, we’re going to buy a small license, and we’re going to use this to control our website. And once we’ve internalized how it works, then we’re going to use it for financial stuff.” And so these small non-mission-critical projects are learning labs for the team and then they can go on and share that knowledge with a more core business value team.

Corey: When we first spoke a few years back, my initial takeaway was, it sounds awesome. I can see the value. But it also felt like you were more or less running into the wind in that first you had to teach the market about the thing that you solved then immediately afterwards had to go ahead and sell them something. That always felt like a very heavy lift. But looking around now, I can see broad consensus in the customers I talked to about the understanding of the value of feature flags, and, “Oh yeah, it’s a reasonable thing that we should be doing.” It seems like in that respect, the heavy lifting has already been done, and on some level, it, “Oh, it just sort of happened organically. That’s just a natural evolution of the market.”

Heidi: [laugh].

Corey: It feels like that may have partially been what you're doing.

Heidi: Thank you. I have worked really hard. It turns out that category creation is enormously fun. I mean, you know, founder of Duckbill, a consulting group that comes in and says, “You’re doing it wrong and here’s how to fix it. And we’re not going to charge you more for being dumb.” It was actually kind of a risky stance to take. And in the same way, LaunchDarkly came in and said, “Okay. We see this need and we’re going to explain to you why your life will be better after this.” And fortunately for me, CI/CD really took off. I think there’s been a ton of great work from the IT Revolutions people, putting out Accelerate, putting out Project to Product, putting out Sooner Faster Happier. This ethos, this zeitgeist of being able to move faster, and safely, is exactly the group movement that I needed to catch on to.

Corey: It’s a really neat thing to see the natural evolution of a product in a space, going from, “What the hell is this thing?” To, “Oh yeah, it’s a best practice and if you don’t do it, you’re probably doing something that is at least marginally dangerous.” It’s really something to behold and I have a hard time identifying other major players in the space that aren’t you folks. Not that I’m asking you to, because, “Yes, now let’s talk about your competitors,” is never a great look on an episode. But as you mentioned before, I strongly suspect your strongest competitor, the one that we all fight against commonly is, “I’m just going to build this internally myself.”

Talk to me about that. What does that usually look like because very often when I see people building things internally themselves, they don’t quite contextualize it in the context of the thing they can go out and buy; they view it as something different. As I look around the shattered remnants of my build system for the crappy software I build and deploy myself internally, what parts of that are going to look like, hey, that’s a feature flag option but I don’t think of it that way.

Heidi: Right. I think that this is actually a super exciting place for tools vendors to look at because it turns out that not only do people build it themselves in large organizations, they continue to build new things themselves. By my count, Google has at least ten different feature flagging systems.

Corey: Wow, that’s almost half as many as they have messaging options that they release and then deprecate.

Heidi: Right?

Corey: I don’t know if we’ve seen the Google messaging application for 2021 yet, but I’m sure it’s coming.

Heidi: I’m sure it’s coming. And then we will get attached to it, and then they will kill it off. I’m still salty about Reader so, you know.

Corey: As am I. They are never going to live that one down.

Heidi: Never.

Corey: Nor should they.

Heidi: It was rude. It was very rude.

Corey: It absolutely was. It showed a flagrant disregard for an entire ecosystem.

Heidi: The thing that I find when we’re competing with homegrown is that people are solving the problem in front of them, and it is absolutely true that it will take your engineers less than a week to code up some kind of feature flagging system, but it is also true that we have invested a ton of money in all this infrastructure so that we can serve flags at the edge in under 200 milliseconds around the world; that we have done a bunch of integrations, we have like 23 SDKs now; we have all of these abilities to hook into Salesforce and ServiceNow, so that you can have this seamless throughput and so that people don’t have to leave their native tools in order to use feature flags. And your developers aren’t going to replicate that and they’re not dedicated to researching where we need to go. So, when I’m competing with a homegrown solution I’m always like, “Yes you can build it, but it’s free as in puppies. You’re going to have to maintain it.” And that’s the expensive part of any software.

Corey: Any engineers who build this kind of thing just please skip ahead 15 seconds on your podcast players. Go ahead; do that now. Great. Managers, yeah, do you really want this sort of thing being built by the exact same people who built whatever horrifying monstrosity you’re using to deploy your existing software into production? Really stop and think about that for a minute.

Okay, now let’s talk a little bit about the future. Now, I’m not asking for roadmap information because that’s always in flux and no one likes to pre-announce things, but what do you think the future of feature flags is? Now that it’s broadly accepted, where’s it going from here?

Heidi: Individualization. The future is personal. And I think that the thing that we want to be able to do is let people set their own experience of their phone, and their web, and their smart home devices, and say, in all of the ways that we sort of have control now, we get more control. So, I have all of these things that I’m like, “I can change the settings on it, but not as much as I want.” And also, the settings are pre-assuming a bunch of things about me.

So, if I had the ability to do some of the things that we can do in browsers—like you can set your browser text to be something more dyslexia-friendly, okay—But not all web pages respect that. If I could force that, it would be awesome. I’m not dyslexic, but I want that ability, I want the ability to say, this is exactly the experience that I have chosen for myself.

And I’d love it to be portable, I’d love it to be a markup language because I think it’s a real accessibility statement to be able to say, this is what my web experience is like. And I have separated the content from the container. It’s a really old tech-writing concept that the words and how they are presented are almost entirely separated from each other. And in the same way, I want somebody to say, “I’m giving you this web content or this application, and how you present it is up to you.” And breaking that linkage is going to be so empowering for so many people.

Corey: Tell me a little bit more about this. I was worried when you said personalization because, “Oh, good. More creepy tracking of people.” But that’s not at all what you’re talking about. You’re talking about something that winds up transcending devices and sticking with a person, but for, I guess, the power of good rather than for the purposes of, basically, spying on people.

Heidi: Right. So, this is my futurist hat and not necessarily LaunchDarkly, like, end goal, but what I want—I’m not allowed to call it ‘Flag Markup Language’—

Corey: For the obvious acronym purpose—

Heidi: Yeah.

Corey: —of course.

Heidi: But Flag Markup Language follows you around and says, “I never want to see day mode. I’m only a night mode person, and if something appears in day mode, I want you to override it.” That’s like the simplest explanation of it. But it would also follow you around and say, “I’ve reduced the screen width.” Or—here’s an important one—I’ve taken out everything that makes this page very heavy because you are accessing it on a very narrow pipe.

Like, my parents don’t have cell phone service, and their WiFi is, well their rural internet is not great. And so, every time I visit a page that’s not a problem when I’m in the city, it takes a minute to download because there’s all this stuff. And I’m like, what if people could still get their ads through, but they were simple text instead of, like, video animation, based on the size of the pipe that is trying to go through.

Corey: I love the idea, but it feels, on some level, like that also requires broad-based acceptance across the board from every site that they visit, wouldn’t it?

Heidi: It would, or you’d have blockers. What it actually requires is broad-based acceptance by the browsers.

Corey: Got it. That feels like it is simultaneously easier and far more difficult all at once.

Heidi: Well, I don’t think it could be a solo project, but I think it would be a fascinating step forward in accessibility to have Lighthouse run and say, “Okay, but your page is not only inaccessible, it’s also too heavy for people who are on this bandwidth. Do you want a reduced fidelity version?”

Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Corey: So, one more question before we wind up calling it a show. You were one of the people that basically got me on board the train of giving conference talks from an iPad. It was transformative. It worked super well. I was really in the swing of it, and then the pandemic hit and I’m not traveling at all anymore and my iPad is mostly gathering dust here.

What’s your experience been on that? Are you still using iPads for any of your digital presentation works, or are you basically putting that on hiatus until it’s no longer taking your life into your hands more than it normally is to go on stage in front of a roomful of people?

Heidi: I think the earliest I will be at a physical gathering is possibly November. And honestly, conference organizers, you’re going to have a hell of a time doing anything this fall. And it had better be single-nation. We are not going to be able to cross borders.

Corey: Oh, absolutely. And beyond that, the first few conferences that rush to come back in person, you’ll be able to take a look at who attends those things and realize whose company considers them expendable.

Heidi: Right?

Corey: It’s a harsh thing to say; it is also accurate.

Heidi: Yeah. It’s really interesting to see how this is going to work out. But as far as my iPad, what I’m actually using it for now is I watch talks, I don’t live-tweet them as much because I’m watching on the iPad, and its multi-screen capability is okay-ish, but not great.

Corey: Adorable. I would classify it as adorable.

Heidi: Yeah, like, points for effort.

Corey: There was a solid attempt.

Heidi: But I will watch talks because it turns out that a lot of what makes me work is getting to listen to engineers, and developers, and ops people. And if I don’t have that input, I don’t have any grist in my mill. I figured out this is why I was having such trouble writing blog posts is because I wasn’t talking to anybody out in the world who was having problems. And I guess that whole developer advocate title wasn’t just hot air because what I really care about is making sure that those voices are getting represented.

Corey: There really is something to be said for what has happened to conferences in the past year. Suddenly, attending a conference no longer requires the ability to travel places, take time off, pay for your accommodations while you’re there, and in many cases, pay the not small fee for the conference. Suddenly, there’s a wealth of content that is available online, universally. And sure, the experience is relatively crappy, but it’s universally crappy. There’s not a better experience for some subset of people who are able to spend more, and the folks who can’t cover that expense are sort of forced into a substandard, degraded mode. This is what it’s like for everyone. And I have a hard time seeing how that is going to continue once the world opens back up. It’s one of the vanishingly few items on the list of things I’m going to miss, post-pandemic.

Heidi: Yeah. I was speaking at a conference based out of Russia. And there was a guy logged in from Kinshasa, D.R. Congo. Now, the thing that nobody really knows about me is I grew up, until I was five, in D.R. Congo. And it had never occurred to me that I was going to get to talk to a technologist from Central Africa because they can’t get visas; money is an entirely different thing.

And now in this new time, all you need is an internet connection to be able to attend. And honestly, I kind of choked up because there are all of these technologists all over the world who have been shut out for various reasons: because they can’t get childcare, because their company won’t pay for it, because they’re not like working for either a very large company or a cool Silicon Valley company. It makes me painfully aware of what we haven’t been providing all along, and I hope that conference speakers and me—this is a thing that I’m working on—will start doing more fixed time, like, here’s a webcast, here’s a podcast, here’s a conversation that enables everybody to access it.

Corey: On the other side of it—again, not from an engineering insight and knowledge perspective—but what I think is a slow-dawning awareness among a few people that I’m super excited about, is they’re watching me, effectively, livetweet slash aggressively shitpost various big company keynotes. And they’re finally realizing something that, wait a minute. Corey is not doing this because he had early access to what’s coming out and is ready to go with it. He doesn’t have special front row seats; he hasn’t been behind the stage talking to people before it goes out. He’s just watching this thing in one window, screen-capturing it as he goes, pasting it into his Twitter client, which is just the Twitter web app, adding some stupid commentary, and whacking send.

That’s the entirety of what I do, start to finish. And I apologize for just using the word stupid; let’s say ‘nonsensical’ instead. Let’s be a little less ableist. That is all it is. And I think that there’s now a slow creeping awareness that, wait a minute, it doesn’t require stupendous amounts of money, or access, or privilege beyond the normal level of privilege that I have in this space.

It’s just the ability to do it. And of course, the tremendous privilege I have of not being able to be fired for the things I say on Twitter, which is a separate problem entirely. But it isn’t because I have this magic ability to reach behind the scenes and grab things. It’s just, I’m doing it because no one’s stopping me from doing it. And I’m glad to see that other people are starting to take notice of that.

Heidi: Yeah. I don’t think you need to be an insider to be a good reporter. And I think that one of the things I’d love to see is for companies to hire some people that they haven’t been thinking about lately. My current campaign at LaunchDarkly is I really want us to hire a librarian. Honeycomb did it and I think it’s frickin genius.

Because we have all of this content, and we don’t have a good information architecture system to find it. And Confluence’s search is… not great. [laugh]. But I want us to hire librarians and I want us to hire investigative journalists, and say, “Look, we are developing so much stuff, but we need somebody to make the connections to make it accessible and usable, to make it something that you can use.” Like, you can absolutely teach yourself programming from YouTube, but where do you start?

Corey: It’s a terrific question. I think it starts with doing it. I wish I had a better answer, just because it’s, “Oh, I just go out and I do the thing that I do.” One thing I admire about you and I’ve always admired about you is you view a primary component of your job as being teaching people to do things. The problem I have is that so much of what I do is an outgrowth of things that have worked for me that it’s not easy for me to teach it because it’s just oh, just be yourself.

Well, most people aren’t themselves, they’re, you know, actually pleasant people. And for me, it’s always been hard to get to that point of being able to articulate what I do in a useful, constructive way. And I am extremely conscious of the danger of, “Oh, just do this thing because it’s what works for me.” That is a perfect recipe to unintentionally create a masterclass in how to be a white guy in tech who is swimming in privilege. Because I am, whether I want to admit that or not. And the things that work for me will not work for someone who is not themselves overrepresented. So, I am very torn on, how do I teach things? What do I teach? What can I effectively teach? And how do I not turn into a monster while doing it?

Heidi: Right. I think actually, the live-tweeting big keynotes is an interesting take because you can be an asshole and nobody is going to fire you. And also, the internet is unlikely to fall on your head because you’re a dude.

Corey: Exactly. My failure mode is a board seat and a book deal. I’ve been using that joke for a long time because it’s not really a joke.

Heidi: It’s not wrong. And I think it’s important to note that every once in a while I pull my punches because I don’t want the internet to fall on my head. And I’m a nice white lady in tech and have axes of privilege along that. So, I think it’s interesting when we’re talking to people who are under-indexed, and we need to be really careful when we listen to what feels unsafe. And one of the things I like about when you’re talking about how you’re funny online is, you talk a lot about punching up and not punching down.

And by sheer odds and percentages every once in a while you get it wrong, and then you apologize and try not to do it again. And I think that’s certainly a model I would like to see a lot more people adopt, where I don’t want people to necessarily be permanently canceled, but I do want them to say, “I caused harm and that was wrong, and here’s what I’m doing to fix it.”

Corey: And sometimes I feel like all I can do is try and lead by good example. And on some level, that’s really what the story about feature flags—and now, that’s a hell of a reach—is. It’s a—[laugh] it’s about setting examples and giving good demos. And that’s what you did. That’s why it became normalized. Let’s stop beating around that particular bush. The reason that it became normalized is because people like you got up on stage, from an iPad, threw up some random website that you’d just thrown up, and said, “Here’s how feature flags are going to work. And here’s what it took to instrument it.” And it turns out, not that much. You can do it in a live demo.
And then there was an interactive approach. And seeing that again, and again, went, “Wow, if you can use this for upscale shitposting mid-conference talk, done live, then what’s my excuse for not being able to do this thing in production?” And the answer is, “Oh, right. I’m bad at things.” But for most people, that’s not the answer.

It worked for them. And that, I want to be very clear, is no small thing. I’m not blowing sunshine up your butt when I say that you have fundamentally changed the way the entire industry thinks about feature flags. It’s true. It really is. One of the unique things about—you mentioned at the beginning—is that you have been at the same place for as long as you have, consistently on message but also dragging the rest of the industry forward, politely. You aren’t doing it with my brash way of getting up there and picking fights. You’re doing it by making people feel good by listening to you.

Heidi: Yeah.

Corey: If people take nothing else from this entire episode—forget the feature flags; forget the how to make your software better—if the only thing they take away is just watching you and using that as a life lesson in how to become a better person than they were, then this podcast, every episode has achieved more than I dream to when it’s set up.

Heidi: Oh. All right, but it’s a sponsored podcast. So, I’m going to say my thing.

Corey: Excellent. Please, by all means, take it away.

Heidi: I want you to feel safe about your software. And I want you to be able to do that while your software is operating in production. And the way you can do that is by being able to control it live in production. And if you can’t control it live in production, and if you can’t commit broken code, you’re not doing CI/CD. You’re doing some kind of hideous mini-waterfall. And so when you’re thinking about all of the things that keep you up at night about your deployment and your production, remember that there’s a better way to do it and it involves finer-grained control.

Corey: And again the whole point of this podcast is, I have a bunch of sponsors who say different things at different times. There is a bar: we don’t have sponsors on this show if I am not convinced that there are people for whom their product or service is the right thing. We have rejected sponsors on those grounds before. But to be clear we also don’t ringingly endorse the companies either. With what you do, and what I’ve seen, I do endorse it. I want to be very clear; and that is not something a company can buy, though some have tried.

Heidi: Well, you know, the sacks of cash that VC gives us are ours to spend either responsibly or irresponsibly, and since John, our CTO, is not getting his kombucha tap anytime soon, I think that what we’re going to do with it is do as much as we can to help people sleep better at night. And also—oh, this is a cool thing that just happened at our annual meeting. I just found out that LaunchDarkly is completely carbon neutral. We’re offsetting our AWS, we’re offsetting our travel, we’re offsetting our office space.

Corey: That is no small thing.

Heidi: And we commit to continue doing it. That’s just part of our budget now.

Corey: Well, I guess that really does sort of throw a wrench into the pivot option of starting to do one of those new NFT cryptocurrency things, now doesn't it?

Heidi: Man, I am so angry about that. I am so angry.

Corey: I want to own a representation of a jpeg, but I also want to burn down a forest, boil the oceans, and wind up effectively using more energy to do it than my house uses in 18 months. What have you got for me?

Heidi: I just don’t understand why this—I don’t. I just don’t understand anything that has to do with, I ran my car on idle for two years so that I could do a sudoku that is somehow fungible. The economics of it don’t make sense to me.

Corey: That is a whole separate podcast that I will get to one of these days. Heidi, thank you so much for taking the time to tolerate my slings and arrows and occasionally ridiculous compliments. If people want to learn more, where can they find you?

Heidi: So, find us at launchdarkly.com. We would love to give you a trial or a demo. And you can find me at heidiwaterhouse.com. And every once in a while, I update my blog but not on any regular cadence.

Corey: Excellent. And we will of course throw links to those into the show notes.

Heidi: Oh, and the place that I really am is Twitter. So that’s—

Corey: As are we all.

Heidi: Right. That’s @wiredferret.

Corey: All one word, of course. Heidi, thank you so much once again. It is always a pleasure to talk with you.

Heidi: I had a great time. Thanks, Corey.

Corey: As did I. Heidi Waterhouse, transformation advocate at LaunchDarkly. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment with an embedded feature toggle so that you can wind up changing it to a glowing comment after it’s already been published.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

This has been a HumblePod production. Stay humble.
All Roads Lead to Kubernetes with Kendall Miller
Screaming in the Cloud
04.06.2021
42 Minutes
About Kendall

Kendall was the first hire at Fairwinds and has been in almost every role in the company. Today he works to establish Fairwinds as an essential name in Kubernetes—offering software, services, and open source. Kendall has four kids, a dog, and three weasels. He also co-hosts a podcast on leadership with his friend Rachel at https://authorityissu.es.

Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Kendall Miller, president of Fairwinds and, due to a lapse in judgment on both of our parts, one of my longtime friends. Kendall, welcome to the show.


Kendall: Thank you, Corey. I’m pleased to be here and continue that lack of judgment.


Corey: Excellent. So, we go back, and we will get into that story in a bit. But I’ve known you longer than I’ve been an independent consultant. You were there in my early formative years as a new manager. And I manage people in interesting ways.


There was a lot of empathy to it, but there was a lot of, shall we say, personality, and you had the good graces not to call me a jerk to my face, in so many words. Thanks. I wanted to make sure we got that in before I proceed to destroy what you’re currently doing professionally.


Kendall: Well, first of all, I appreciate that I also—even right there, I want to jump in with a story that, one time, in San Francisco, at a brewery or a bar or something with like, 25 friends, I had a friend who doesn’t work in tech show up and I was walking around the table introducing who everyone was, and this person works there, this person works there, this is what their title is, this is what they do. And I got to you. And I said, “This is Corey Quinn. He’s… a personality.” And I think that’s what you just described yourself as, and I do think that’s still maybe should be your title instead of cloud economist, just ‘personality.’


Corey: Yes, the problem is that ‘personality’ has a lot of implications to it, most of which are absolutely correct, but I prefer to let people discover that on their own. It was not in a bar or a brewpub. What it really was, was at a Chinese restaurant. And I remember this very firmly because we’re sitting around the typical white tech bros, as we are, surrounded by our friends who are fortunately not all looking like us. And the waiter comes by and you turn to the waiter mid-sentence, and completely switch languages and order in—I believe it was Mandarin, but it may have been Cantonese.


Kendall: Mandarin. Yes, probably.


Corey: Yes. And for the longest time, I had to do a fair bit of research to figure out whether or not that was actual legitimate Mandarin or an elaborate prank that you had staged just to make me fall for this and tell the story someday. I actually went in the back room, had them send a different waitperson out who you would not have had time to bribe, and yeah, sure enough, you can speak Mandarin. So, that was one in a long series of ways in which you surprise me. Every time I think I’ve got you dialed in, you go in a new direction, and I am forced to expand the ever-increasing multi-dimensional representation I have of Kendall Miller.


Kendall: I like to think that it’s all respect-based. But normally, I go into a restaurant like that beforehand and ask a few of them to speak Chinese to me and to listen to me speak and then tell everyone at the table that I actually speak Chinese when I really don’t.


Corey: Exactly. You know, there are dumber things people could do.


Kendall: [laugh]. If impressing friends takes a little bit of preparation, I’m in for it.


Corey: Exactly. So, all of that said, while we’re on the topic of dumb things, you are at Fairwinds, which is awesome. And you’re—the tagline for the company is, “Kubernetes done right,” but I checked the page very thoroughly. And you do in fact do Kubernetes which, from my perspective, is not doing it right. The only winning move is of course not to play. What’s the deal with that?


Kendall: Well, for a guy who spends his life criticizing AWS, if you believe that Heroku is the answer and solution to all things, wonderful. I mean, it does run on AWS, and you—marry those two things for me, what is the perfect solution, Corey? Is it all serverless all the time? Is it Heroku all the time? Because if it’s not Kubernetes, if Kubernetes is not your savior, then what are you betting on?


Corey: Operational excellence is sort of the short answer there. But Fairwinds is an interesting company—


Kendall: Bullshit. Whoa, whoa, wait, wait. Can I cuss on this?


Corey: By all means. By all means.


Kendall: [laugh] so, so bullshit. I mean, ‘operational excellence?’ You don’t need operational excellence if you have a simple enough infrastructure.


Corey: Yes, because if there’s one thing for which Kubernetes is renowned, it’s simplicity.


Kendall: Oh, that’s what I’m saying. You need operational excellence if you’re going to run Kubernetes. But if you’re not running Kubernetes, you don’t need operational excellence. If you’re on Heroku you just need a really good understanding of UI.


Corey: And you also need to have outsourced that operational excellence to someone else, which is not an invalid strategy.


Kendall: Oh, no, no. It’s actually an excellent strategy. If Heroku works for you, never ever, ever, ever leave.


Corey: No, Heroku works super well, credit where due—I’m not being snarky—right up until the point where it doesn’t. That point is further out than a lot of people think it is, but I’m—I have no problem with Heroku. If you’re on Heroku and think I’m bagging on you, I assure you, I’m not. We have built some stuff at The Duckbill Group on Heroku for very good reason.


Kendall: I’m with you. Yes. Well, and I regularly tell companies if it doesn’t cost you too much, and it’s not overly simplistic for your needs, never ever leave. It is a great place to be. But, [laugh] you touch on Kubernetes, and how can that even be done right?


And is that the non-starter for even doing things? It may be for certain people. And for a lot of companies, there is a serverless, or a Heroku, or something that is the right solution because be as no-ops as you can, by all means. Like, design things as simple, on great automated self-managed hosted systems wherever possible. I mean, the beauty of the cloud is you don’t have to turn the machine on and off yourself; Amazon will do that for you—or Google, or Azure, or whoever your third-tier cloud may be, as the case may be—but why not carry that all the way through to, they will make sure the service is up and running for you, they will make sure the connections are working for you?


By all means leverage all the things that you can. It’s just that at some point, companies reach a point where they require the ability to dig into that complexity and be hands-on themselves. And when that happens, where would you send them if not Kubernetes, Corey?


Corey: Well, I do want to call out, first and foremost, that there is a potential perceived conflict of interest that I want to be very clear that we express. I was an advisor to ReactiveOps—once upon a time—for almost a year when you were the president of that company. Then I stopped advising you folks, and it became pretty clear because you looked around and said, “Huh. What’s the biggest problem with the name of this place being ReactiveOps? That’s right, people have heard of it, so we’re going to call ourselves ‘Fairwinds’ without even throwing in a following seas joke to go with it.” And it is, in fact, the same company, correct?


Kendall: It is, in fact, the same company. Yes.


Corey: Because there was some great branding there, the pajama pants at KubeCon that were labeled ‘ReactiveOps.’ Genius. I love that idea. I wish I could steal credit for it, but I can’t.


Kendall: You have said to me—and I’ve thought about this a lot—you said ‘ReactiveOps’ wasn’t a great name, but Fairwinds is also not a great name and you forget about it by the time you get to the end of your sentence. And I don’t think you’re wrong, but I also think it was the right decision to change names. Now, we could have done it better. There’s a number of things we could have handled better on the SEO side. Like, don’t get me wrong; changing a name is complicated, messy, and we learned some lessons the hard way that I wish we hadn’t.


But at the end of the day, internal marketing matters a lot. Like, I think a lot about GitLab. GitLab is a really impressive product. In fact, it’s a really impressive suite of products, but most people don’t know that because the name is GitLab. And they’re the—


Corey: Oh, the fact that it has ‘lab’ in the name sounds like a science project or an experiment that’s still waiting to see its viability. It’s like, “Oh, GitLab. Is that GitHub’s development group?” Like, “No, it’s not. Well, well, kind of, but no.” And yeah, the fact that they’re toying with IPO, apparently, and are a multi-billion dollar company, they still have the word lab in the name.


Kendall: Well, and it’s still heavily focused on Git even though everybody knows it’s all SVN under the hood.


Corey: I want to be fair, Fairwinds—to be fair—Fairwinds is not a terrible name in the universe that contains things like AWS Trainium, or Systems Manager Session Manager. There’s always going to be a bad name that’s worse.


Kendall: Someone who names things worse than you? Yes.


Corey: Exactly.


Kendall: No, I appreciate that.


Corey: It’s just, I got to be direct, uninspiring.


Kendall: It is uninspiring until it’s been around enough and it has enough market traction, that it doesn’t matter. I mean, I am a big believer that this was a good decision. And I like working for a company that’s not called ReactiveOps. I was the first hire at ReactiveOps, and I asked in my interview, “Why is it called ReactiveOps, not ProactiveOps?”


Corey: Because ProactiveOps was taken.


Kendall: [laugh] well, and then throughout many, many years—I mean, there’s a reason it was called ReactiveOps. And it was a great name, it was the right thing to be called for a while. We were able to get the domain, we were able to grow to a certain number. Fairwinds, we had to buy the domain—it wasn’t free, right—and—like, the same way ReactiveOps was because it was catchier, even if it’s—has problems. But the beauty is, we can grow into it, and it can be anything.


And internally, we are allowed to think of ourselves as anything, inclusive of being an ops company, but not exclusive of everything else. Does that make sense? That’s why I really think it matters is the internal naming really matters a lot because people are affected by internal marketing. It’s hard to think outside the box.


Corey: They absolutely are. It seems like a weird juxtaposition because, credit where due, while the name is uninspiring, the company, in fact, is. The people I have met who work there have been nothing short of stellar in every case. It really set the model—to be direct—with how I wound up staffing The Duckbill Group. Like Fairwinds, we are full remote and we’re built that way from the beginning, not having it bolted on after the fact so you have basically two tiers of employees, or remote in the way that, surprise, everything’s now remote because of the deadly pandemic. No, no. We were full remote in the before times, as were you.


Kendall: Yes.


Corey: And that really led to some interesting conversations and some amazing hires you, quite frankly, otherwise would never have been able to get.


Kendall: Yep, agreed. Well, so I’m a leader in this organization; I know all of our warts inside and out. And there’s no such thing as a company that’s firing on all cylinders and perfect in every way. Although I’m pretty damn proud of where we are, and what we’re doing, and who we’re doing it with. And I tell people in interviews, we don’t hire everyone.


In fact, it’s a difficult job to get, but if you make it through the process, you’re going to like the people you work with. I can almost guarantee that. And to a person, we have a great team of people that are compassionate, that take care of one another; it’s an inclusive environment; there are no stupid questions. There was, one time about two years ago, where somebody said something passive-aggressive in Slack, and the company response was actually just laughter, throughout. I mean, just people DM-ing this around, just, just howling with laughter because nobody ever says something passive-aggressive in Slack. We have a culture of respect and I’m proud of that. So, we do a lot of things right. The people that work here are one of those things. Very much.


Corey: And as I always said, the best way to run Kubernetes is not to, but if someone forces me to deploy Kubernetes, there are really two options. The one that I would prefer would be to go with you folks. I’ve seen how you run this stuff; it just makes sense. And it covers some of the reasons that people run Kubernetes, but not all of them.


Kendall: Well, so Kubernetes is hard, but part of the reason Kubernetes is hard is because it’s still new to most people the same way that moving from a Windows machine to a Linux machine is hard because you’re not familiar with Linux. And in the early days of Linux, you spent all of your time just trying to get it to work, right? Trying to make sure your screen actually had the right driver installed and had all the right settings. And I mean, it was a huge pain in the ass, I spent a lot of my childhood just trying to get different Linux distributions to work on my old Tiger Machines computer because that was entertaining, just trying to get it right. But you don’t want to have to do that with a production environment for a product that you’re running.


And so if it’s new, and the new is complicated, yeah, just look for help; we offer that help. But now, I mean, we’ve really changed a lot, Corey, even since you worked with us where we were heavily focused on services, and now we have a software product that gives people confidence they’re using it right. So, rather than go hire the experts to make the problem go away—please go hire the experts. Make the problem go away—but if that’s not going to be what you’re going to do, install a piece of software—I mean, we have an open-source solution out there called Polaris. It is widely adopted in the Kubernetes ecosystem, especially the open-source ecosystem. You run this on your cluster and it tells you things you’re doing well, and things you’re doing wrong, and it gives you a score.


And then we’ve built on top of that, including a bunch of other open-source tools heavily focused on security and policy enforcement, et cetera, so that large-scale enterprises can actually roll out Kubernetes with confidence because their engineers don’t know what they’re doing. They are giving people the ability to deploy things into Kubernetes that are horribly, horribly configured unless they have good policy in place and a software tool that enables that enforcement. Because at the end of the day, the reason this exists is we can build great infrastructure for people, but if what people are deploying into that infrastructure is terrible, it only gets you so far. And Kubernetes is difficult like you’re saying, but it doesn’t have to be if you got the right team behind you or the right software to help you. And that’s the end of my plug. No, it’s not. I’m probably going to say all the things [crosstalk 00:13:19].


Corey: Oh, of course—oh, you’re going to be self-promotional the whole way. If not, frankly, you’re not doing your job. But let’s be serious here. I don’t disagree with anything you just said. In fact, I endorse it. The problem I have is with the fundamental conceit of the entire argument, which is that people are attempting to use Kubernetes to get actual work done instead of dicking around. It seems to me that the reason that a lot of folks are going with Kubernetes is because they can’t pass Google’s interview but still want to cosplay as a Google SRE.


Kendall: So, it’s resume-driven development? RDD?


Corey: Exactly. There are three or four great reasons to run Kubernetes and five thousand terrible ones. And it’s very often it feels that it is incredibly hype-driven in many respects because every time I tend to see it—that’s not fair. Most times that I see it in the wild, and I start talking to the people who have rolled it out on why you’re running Kubernetes. It goes back to talking points that do not ever tie back to an actual business constraint or problem that they were faced with. I mean yes, if I’m trying to run something hyperscale and I need to make sure that no individual system or rack or even data center could take down that service, yeah, something like Kubernetes makes a hell of a lot of sense.


But I’m trying to run a WordPress blog here and baby seals get more hits than this thing does, some weeks. So, for me, it is stupendous, stupendous overkill. But I see things that are about my level of complexity running in Kubernetes all the time, or let’s be fair, they’re not running in Kubernetes; they’re attempting to run in Kubernetes. Change my mind.


Kendall: So, well, there’s a couple things there. Is it hype-driven? Absolutely. But a lot of the hype is deserved. I mean, when our company was founded, when ReactiveOps started, we set out to build a framework for Infrastructure as Code, and we wrote a shit ton of Ansible and a little bit of Terraform, to go solve the problem of having automated deploys, blue-green deploys.


You know, everybody wants logging, monitoring, alerting, a system for their cloud. Everyone’s needs in the DevOps space are all the same. How they accomplish them is a little bit different. So, we wrote a framework. Again, tons of Ansible.


Kubernetes comes along, and we took a look at it, and it was a lot better. There’s a lot of things it does that just make sense. Is the API different? Is it complicated? Yes, especially if you’re new to it. But honestly, the same way, Corey, that you might spin up a simple Linux instance on Linode or on AWS to go kick the tires on something or spin up a simple server, that’s easy for you because you’ve lived in the Linux water for a long time. And once you get familiar with it, it doesn’t take a long time. Same thing with Kubernetes. Should most people be deploying WordPress onto a Kubernetes cluster? No. [laugh].


Corey: Absolutely not. I’m hard-pressed offhand to come up with a worse idea.


Kendall: No, it is a terrible idea for so many reasons. But if you live in Kubernetes world, or you’re very familiar with it, or you want something to fiddle with, which is a legitimate reason to kick the tires with Linux is because you want something to fiddle with or Kubernetes, it’s a thing that you can go fiddle with; it’s a thing that you can go learn. The paradigms are new, they’re exciting, it’s fun. This is the way that the world’s going. In the future, all the Herokus of the world, every PaaS is going to be underlied by Kubernetes, every service you’re using is going to be Kubernetes almost everywhere, except for the few places where it really doesn’t make sense.


And I don’t think we’re that far away from that. Should you use the PaaS? Yes. But if you need a PaaS that you’ve built yourself, use Kubernetes. It’s the closest thing we have to a foundation or a framework for cloud infrastructure.


Now, that said, it’s really not a foundation. It’s somebody giving you rebar and cement and saying, “Good luck, buddy.” Right? But with that rebar and with that cement, you can build a really impressive foundation that’s going to meet your needs for your very, very, very custom-built house. If you have a small house, a small family, no big needs, don’t buy a custom house.


If you just need something simple to live in, don’t buy a custom house. But if you’re a large enterprise, and you need to have dramatic control over all the different things and you want it to be a little bit flexible, Kubernetes is a pretty darn good solution, Corey. Change my mind.


Corey: You’re right. The fundamentally—


Kendall: No, what? No. Stop. We can just end the recording right there.


Corey: Oh, where. We’re just—cut it there. Good. We’re done.


Kendall: [laugh].


Corey: You’re not wrong on a lot of that. And the argument that I see is that you wind up with two sides girding themselves for war, you have the containerized side—which we can distill down to Kubernetes because regardless of what many of us wish happened, it is basically winning in the space—and the other side is, ah, serverless.


Kendall: You—wait, wait. You want a Docker swarm to win?


Corey: No, no. I personally ECS, if you—I still maintain kubernetestheeasyway.com and I have re-pointed it to the ECS product homepage, I will re-point that to the highest bidder.


Kendall: ECS is going to run on Kubernetes. More and more. It’s all—


Corey: Oh, yes. We’ll have that argument some other [crosstalk 00:17:56]. But there’s serverless on the other side—


Kendall: Yep.


Corey: Which is, you just wind up using a bunch of high-level managed services, pay for consumption. And the old-school admins are all very angsty about this. At that point, you’re just handing your availability over to your cloud provider. Well—


Kendall: Sure.


Corey: —no, you’re just being honest about it because you’ve been doing that for 15 years.


Kendall: Absolutely. And yeah, I mean, serverless is the absolute—okay, not the abs—I’m sure there’s going to be things that iterate on serverless but in the old days of, I have a computer running my server in my data center, or honestly, not even my data center. I mean, the startup I worked for in 2004, we had a back room, like, literally a closet with a server rack in it. I’ve taken this server with this install of this operating system and all of the things it takes to run my app, and I’ve given it to the cloud on an instance that now I have to manage in the cloud. And they just continue to abstract those pieces away to literally, here’s the workload; make it happen, Amazon; make my problem go away. Brilliant. Way to go cloud. Way to go serverless people. I give credit all the way back to the Fission.io folks, which I think were Platform9. I don’t think Platform9 talks about that much more, anymore.


Corey: I keep mistaking them with Plan 9. Talk about derivative names. But please, continue.


Kendall: [laugh]. There you go. Well, but—so, I mean, it makes sense. It’s brilliant. The reason to use Kubernetes isn’t because you have a workload you don’t want to worry about. The reason to use Kubernetes is because you have to have fine-grained control over some of the internal networking, some of all the different—you know, I need this to scale up this way, and that to scale up that way, and I need them to talk to each other in this way, and I need to have this control over that thing.


And should you use serverless? Yes. If you can make the whole thing work in serverless, yes, just do it. But in a few years, all the serverless everything is going to be running Kubernetes underneath, and that’s what I’m betting on. So, I don’t care if you run in serverless. Somebody is running that serverless system and it’s probably running on Kubernetes and they’re going to want help.


Corey: The problem that I see with a lot of this, too, is that okay, fine. You’ve convinced me. I’m going to run Kubernetes. Now, okay, and how did you say finding each other? Oh, they need to add something Istio or Envoy or—don’t correct me on that—and something else in front of it. And then I pull up the Cloud Native Computing Foundation’s landscape. And some wit on Twitter just took a screenshot of that once, and tweeted it with a caption of, “Jesus Christ.” And it got something like 20,000 retweets because it’s hilariously overwrought. I look at this, and it makes the AWS service listing look reasonable. It’s that complex, and vast, and broad.


And there’s an entire universe contained within the things you need to responsibly run Kubernetes. And I look at it, and my entire position on it is, the hell with this. I can go back to running VMs on top of a cloud provider—or instances or whatever you want to call them—in a standard three-tier architecture, and that worked pretty well back in 2012. The world hasn’t changed that much.


Kendall: Well, so this is—you can blame the CNCF for some of this. Why did they create a landscape that literally includes everything? You want to submit something to the CNCF, you basically can; you have to sign a couple of agreements. But then it makes it look like all those things are the things you need. I mean, this goes to your tweet, just, like, yesterday, or the day before where you complained there is no enterprise Kubernetes distribution that excites you.


OpenShift is overwrought. Tanzu is complicated and it’s hard to understand. And Anthos is just a SKU of a whole bunch of Google products. I get it. I mean, we have something similar. So, we run Kubernetes at scale for lots and lots of companies, mostly leveraging open-source things. There is a finite number of things you need to go from Kubernetes to production-grade Kubernetes, and we have those packaged in a thing, on our website, in GitHub. It’s called Fairwinds Elements. It’s all open-source. Just go use those things. You don’t need more than that. If you need more than that, go get help. But there is a finite list of all the things you need to go from click a button, get Kubernetes to, click a button, get production-grade Kubernetes. And it should be easy, and nobody’s defining it easily.


Corey: It just feels, on some level, like Kubernetes is really aimed at people who want to cosplay as cloud providers themselves.


Kendall: That’s like saying Linux is disguised as cosplaying people who want to… I don’t know, run servers. I can’t, I can’t finish that. [laugh].


Corey: That is exactly what it’s for. It’s for people who want to run servers. That’s the problem with Linux as a culture.


Kendall: Yeah, well, so I’m just saying like, yes, it’s fixing the need. Now, here’s the question that I have, though, Corey. Talk to me about this. Google bets on Kubernetes—and there’s some debate about whether Google bet on that or the people who founded Kubernetes bet on that. But Google internally is still using Borg.


Talk to me about that. Why have they not bet on Kubernetes? Is it because of all the things you’re saying, that Kubernetes is overcomplicated and Borg is actually the solution, and we should be open-sourcing Borg as-is?


Corey: Borg, to my understanding, is so deeply baked into how Google does things internally, there’s no way it could ever see the light of day. And I also have it on good faith that Kubernetes being open-sourced is perceived as a strategic blunder internally at Google because once it’s an open-source project, they are discovering to their detriment that they can’t deprecate it.


Kendall: But why have they not then bet on it, or at least dogfooded some way significantly, internally? When I talked to a Google engineer, and I ask them about Kubernetes and they say, “I don’t know Kubernetes. I don’t know anything about it because I use Borg.” How’s that not a problem?


Corey: It’s a massive problem. Google had such an advantage with being the home of Kubernetes, one that they are excitedly squandering as fast as humanly possible, from my perception.


Kendall: I mean, it’s amazing seeing the other cloud providers catch up to GKE because it wasn’t that long ago that we told every client GKE does it better. And there are—


Corey: Oh, my god. EKS was a punchline.


Kendall: [laugh]. I mean, we handle a lot of workloads on EKS now, and it has come a long ways, and it is a completely fine solution for the vast majority of people. And yes, for a long time, it was really, really, really painful. But it’s not anymore. They’ve caught u—I mean, not caught up, but they’re pretty darn close and honestly, sufficiently.


Corey: Incidents happen fast, but they don’t come out of nowhere. If they’re watching, your team can catch the sudden shifts in performance, but who has time to constantly check thousands of hosts, services, and containers? That’s where New Relic Lookout comes in. Part of Full-Stack Observability, it compares current performance to past performance, then displays it in an estate-wide view of your whole system. Sign up for free at NewRelic.com and start moving faster than ever.


Corey: They’re not bad, I will say. At this point, there is no way in the world I would want to run Kubernetes myself on top of bare metal. That sounds like pain. I’d want to get some form of distro around it that doesn’t come with a team of seven people wearing suits trying to sell it to me. That’s the wrong kind of distro.


Kendall: But that’s all the fun of Kubernetes. You’re taking away all the fun of Kubernetes. Sorry, keep going.


Corey: I really am. But I want someone to run it for me. I don’t want to think about it. I get some crap for this sometimes. Someone thought that they were pulling a big aha moment that lastweekinaws.com runs on top of—duh-duh-DUH—GCP because they looked at what was spitting out. And my response was a polite form of, “Yeah, no shit. I pay WP Engine to run WordPress for me because I’m not irresponsible, and honestly, past that, I don’t care where they put it.” I have so many other things in my life that I care about more than I do that. So, what’s it matter?


Kendall: If there’s anything that shouldn’t run on AWS, it’s Last Week in AWS, Corey. I mean, the managed service is great, but that’s the thing is it doesn’t matter how great EKS is if everybody’s deploying terrible things into it, that are horribly insecure, that are set to use terribly way too many—you know, are requesting way too many resources and therefore costing you a fortune. Have I come full circle to, “Buy Fairwinds Insights?” Am I allowed to do that on this podcast? Because I feel like just plugging—


Corey: It’s all about the guest here. By all means, knock yourself out. I’ll talk smack about you on a separate podcast like—


Kendall: Deal.


Corey: —at some point I’m going to go through all the previous episodes, get them all lined up and do a mega episode for an hour and a half, “And now I contradict all the crazy horseshit that my previous guests have said, in one conversation.”


Kendall: Yes, well, you’ve been on my podcast and I just want to say that if you do that, I will go back and do the same thing to you. And I have way fewer listeners than you so it’ll work out great for both of us.


Corey: That works out well because—they say the collective noun for white guys is a ‘podcast.’


Kendall: That’s, that’s, yeah—


Corey: Yeah, the collective noun for developers is a ‘merge conflict.’ But, you know, we all take what we can get.


Kendall: I think my favorite comment like that was, “Where do podcasts come from?” And it was saying, “Well, when two white guys like their ideas very much, dot, dot, dot…” and that’s really stuck with me. Well, so anyways, Corey, we’re coming up on time, I think, from your side. What not Kubernetes should we be talking about?


Corey: It’s adorable you think I’m not going to cut the hell out of this. We’re at minute three, Kendall.


Kendall: Oh, you’re totally going to. But I want to talk about something not Kubernetes-related. What are you working on at Duckbill Group that’s driving you crazy right now that you can share, or is really exciting to you that you could share?


Corey: Oh, the things driving me crazy? Talking to people like you. My God. I mean, I thought that would have been obvious.


Kendall: [laugh]. I’m the most delightful thing in your day-to-day.


Corey: It’s a growth year. We’re looking at expanding the audience; we have some things we’ll be launching in the near future. Nothing to disclose on that right now. We’re toying with expanding in different directions. One of the things that I’m setting for myself is that if we do any more newsletters or things of that nature, I’m not writing them. I don’t want to put more weekly toil on my plate. I can write well, or I can write a lot, but it’s hard for me to do both. Consistently.


Kendall: You sit and read through the AWS blog for a living, which sounds like literal torture. Well, so let me ask you this. You’re a personality, going back to my first story, right?


Corey: Jeez, you come on my show and insult me. I don’t get that very often.


Kendall: I—hey, [laugh] if I don’t insult you on your own podcast, am I actually your friend? I feel like you would think, no. [laugh].


Corey: No, no, it’s fine. Beating the crap out of me is kind of my thing. I’m like, basically the personification, you know, of AWS marketing.


Kendall: That’s right. I mean, I want to ask about this. How has being a personality paid off for you because it’s led to you being able to start a business. If Corey Quinn was a nobody when you started Duckbill Group, it would have been a lot harder to get your wheels off the ground, it would have been a lot harder to hire people. You have a brand that’s allowed you to build a company and in a lot of ways that not having a brand wouldn’t do. I mean, can you talk to me just for a second about how beneficial it is to have the brand that you have?


Corey: Uh, it’s a double-edged sword like most things. It’s nice to be able to go out there and tell a story and people are like, “Oh, you’re the guy from whatever.” It does get super hard when no one has heard of me, and it’s, “So, what do you do exactly?” And it’s, take a deep breath, and rattle off the newsletter, the podcast, the consulting, the Twitter shitposting, et cetera, et cetera.


Kendall: That’s why you just tell people you’re a personality. Keep going.


Corey: Yeah, that happens, but—and it is helpful, but it also means that on some level, it’s—this is going to sound weird—it’s very lonely. Everyone’s sort of engaging with a persona, where it’s—and they have this idea of me rather than me as a person. Like, everyone knows me, I have remarkably few friends. It’s a very strange mixed bag, there.


Kendall: I mean, it’s something that I have spent time thinking about, that the complexity of being known is that people come up to you at an event and they want to be in proximity to you, to say that they were rather than to say, “Hi” because they know you know them back. And the larger that percentage is of people who know you that you don’t know—or that ratio is—the more complicated that gets, I can see that as being lonely. I’ll make sure that next time I see you in person, I give you a big hug.


Corey: Oh, good. But as long as the pandemic is over, it’s fine. The other side of it, too, is that you get used to scrutiny a lot. Everything I say is controversial to someone, and it’s differentiating, someone getting upset because I did or did not use an Oxford comma in a tweet—which, frankly, is not an important battle worth fighting. Don’t email me—and the other side of it, which is someone gets upset because I refer to a group of people collectively as “Guys,” which is valid because that’s something that is exclusionary to folks who do not see themselves encapsulated in the term guys. I get it. I eradicated that word from my vocabulary and replaced it with folks and people can deal with it. To all the way on the other end of the spectrum, which I’ve never actually had to deal with of, “Wow, your views on race are incredibly problematic.” So, regardless of what you say, or what you do, you’re going to get scrutiny, you’re going to get feedback and disambiguating into where on that spectrum any bit of that feedback falls into of can I safely ignore it because it’s irrelevant, or am I just thinking that because growth is painful, I don’t want to go through that? And are some of the ways that I perceive things actually regressive? It takes time and a commitment to improving, but it’s not easy because you get a lot of feedback. And if you’re not careful in moderating that and taking it to heart and evaluating it on its own merits, it can destroy you.


Kendall: Well, what’s interesting about that is it almost sounds like you had to reach a certain level of fame to have the normal level of scrutiny imposed upon, say, your average woman on Twitter.


Corey: Absolutely. Absolutely. And even now, let’s be fair here, I don’t have anywhere near that level of scrutiny directed at me even now.


Kendall: Sure. Yeah, that’s interesting. And does it give you more empathy, though, for people who make their living in the Twittersphere, that don’t look like you?


Corey: I don’t think I ever was missing that to begin with because I’ve had conversations with a lot of folks who have far more valuable things to say than I ever will and who are, frankly, better people across the board. So, I’ve always been very aware of that. And again, it’s uncomfortable becoming aware of the privilege one carries and that was something that was a definite—it takes an adjustment like anything else. I used to be very different when it comes to my views on these things than I am today. And it just, it takes empathy, it takes walking a mile in someone else’s shoes, and it’s transformative because once you see it, you can’t ever unsee it.


Kendall: Yeah.


Corey: And frankly, at this point, I wouldn’t want to.


Kendall: Yeah. Well, and it’s interesting because now we’re both in positions of power in our organizations, like, actual titles of authority—


Corey: Oh, yeah. I have an authoritative position in the industry, and you have an authoritative position because you’re one of the only people who have gotten Kubernetes to boot up and get the errors to stop scrolling.


Kendall: [laugh]. But it’s the authority in the industry that sets you apart there, too, and it comes with a weight that I know you’re aware of, and I’ve seen you—I mean, one of the things that I like about you, Corey, is I’ve seen a friend call you out for something, you asked a bunch of clarifying questions to understand what it was about what you had said that was wrong, and then you went and removed it because you humbly understood that. And I mean, frankly, that’s a big deal, Corey, not everybody does that. So, if you’re going to be a celebrity, at least carry that weight with a little bit of humility, which now I’m on your podcast brown-nosing. Which, if we can just wrap up, maybe—[laugh].


Corey: No, no. That’s much more expected and normal. We’re used to that. I can handle that.


Kendall: [laugh]. If we can just scroll back now and insert that, you saying, “You’re right. You’re right.” And then just end right there. That would be ideal, probably. Is there anything else you wanted to talk about, Corey?


Corey: No, it’s, it’s—you’re the guest, I should be asking you that. Anything else you want to make sure we cover?


Kendall: [laugh]. Um, gosh, what else is going on in the world? I mean, I think it’s really fascinating watching the speed at which Azure is advancing. I think it’s increasingly proof that… I think there’s a lot of ways you can argue Google has some of the best engineering solutions in some of their cloud products. They’re the best—


Corey: Oh, yeah. Just ask them.


Kendall: Well, they’re the best solutions for some of the wrong problems. AWS is willing to build anything, even if it’s the wrong solution, as long as there’s a market for it. And Microsoft can just sell. In fact, it was a Microsoft person who asked me about my different opinions on the clouds, and I was telling them where I thought AWS and Google sat in the market, and they said, “Our only differentiator is that we can sell. We’ve been selling to everyone for forever, and we’re going to continue to be able to sell to everyone for forever.” And it is fascinating to me watching a cloud grow with the speed that Azure is because they have the Rolodex that they do. Nobody has that Rolodex. And that’s fascinating to me. I mean, how long until you launch Last Week in Azure?


Corey: Oh, it exists. When it hits enough subscribers and people care, I’m going to find someone to run it.


Kendall: Oh, wow. Okay.


Corey: I don’t want to keep it. My god. I’m just building the list because enough people will care. lastweekinazure.com. Sign up.


Kendall: Oh, wow, interesting. Okay, there you go. I didn’t know. I didn’t know. But I’m not surprised. You should, you should be there because, at some point, there’s going to be meaningful competition to AWS. And it looks like it’s coming from Azure, not DigitalOcean.


Corey: I would agree. But I don’t think that that market needs to be served by me. I think it needs to be someone like me in that space. I am not going to become that person. And that’s okay.


Kendall: It’s a different kind of snark to attach to Microsoft than it is to attach to Amazon, given the—


Corey: It’s a different audience.


Kendall: Yes.


Corey: It’s a different language in many respects, and there are people who could be much more authoritative in those customer relationships than I can.


Kendall: Yeah, I believe that. Interesting. And do you see any third-party or second-tier or third-tier cloud catching up, ever? Is somebody going to enter the space and make waves? It seems like it’s a little bit too late. It doesn’t seem like Oracle is going to catch up, or DigitalOcean is going to take over.


Corey: Well, yes and no. DigitalOcean and Linode are both doing interesting things. I mean, take a look at them. They’re not shrinking. Everyone likes to say, “Oh, they’re just withering on the vine.” No, they’re not. They’re everywhere.


Kendall: But they’re not going to catch up either. They’re never going to be number two to Amazon, are they? Or—I mean, that’s what I’m asking. Will they be?


Corey: Yeah, and isn’t that a sad fate that will only make hundreds of millions instead of many billions in a given quarter. I mean, that’s not a terrible life, from my perspective.


Kendall: It’s true. It is interesting how we measure those things where Google will kill off a product that has more revenue than the vast majority of startups do in their first ten years of business, but it’s such a small number compared to them, they’ll just shut it down. Not to pick on Google, who is infamously shutting things down, but lots of business units that do that in the Apples, in the Googles, in the Amazons. But that’s interesting, the way we measure that.


Corey: There are many paths to success. And I don’t think that it needs to be measured in the context of the GDP of a midsize country.


Kendall: Yeah, yeah. I agree.


Corey: Duckbill won’t get to that kind of revenue for another ten years. That’s okay.


Kendall: Yeah, well, and you’re going to experience an interesting thing, being a bootstrap company who’s trying to make money. And everyone who has venture money around you is going to look down their nose at you, which is a weird thing that—


Corey: And that’s a serious problem if VCs don’t like me. I mean, that—I don’t know what I’m going to do if I wind up in that position. I mean, I need the wisdom that only comes from winning a lottery once and then being able to tell me how I can win a lottery, too, someday.


Kendall: I mean, there’s some nice things about being able to leverage VC money and grow really fast. I get it. I think what’s amusing to me is when a founder backed by VC is looking at a person like you who’s growing a company profitably and thinks to themselves, “Wow, I’m way better at burning money than this guy is at earning money.” And that that somehow gives them an air of superiority. That’s, that’s the thing that amuses me. But our industry is a weird industry and everybody’s all the time trying to size themselves up compared to the next guy. And—


Corey: Oh, I’m an old-fashioned crotchety old man here because I have the kind of business model our grandparents would have understood.


Kendall: [laugh]. It’s true.


Corey: It’s like, “So, you haven’t—where’s your investment all come from?” It’s, yeah, it’s this magical thing called revenue and profitability.


Kendall: Yep, yep, yep.


Corey: Because honestly, I’ve got to be direct here. If I am solving people’s AWS bills and losing money in the process, I don’t think that I would be qualified to do the thing that I do. It’s similar—no joke—back in two years of re:Invent being an in-person thing in Las Vegas, I never would gamble when I was there because I didn’t want the optics of, “Isn’t that the guy that’s supposed to be really good at saving mon—understanding large, complicated money things sitting at a slot machine?” It’s just the optics aren’t terrific.


Kendall: That’s hilarious. I’ve never thought about that. I’ve been at a re:Invent with you, and I don’t play slot machines because they bore me, as does most gambling, but it never occurred to me that you had the—


Corey: Yeah, if I want to look at flashing lights and get endorphin hits by pushing buttons, that’s what I have Twitter for.


Kendall: [laugh]. That’s right. When somebody hits ‘like.’ The thing is that you have to reach a certain amount of inertia before you get the endorphin hit that you need from Twitter. That’s why so many people fizzle out before they get a reasonable following.


Corey: Credit where due, it took me seven years to get my first 1500 followers, which is what I was when I launched this place.


Kendall: Yeah, that’s impressive.


Corey: I finally cracked the secret of Twitter. And guess what? Ready? Here it is: be funny. That’s all it is. The end.


Kendall: I mean, is it even that? Doesn’t it show up all the time, and being funny is like a nice to have?


Corey: Okay, be funny frequently. There we go.


Kendall: [laugh]. Be funny, frequently. Yeah. I buy that. That works.


Corey: So, if people want to learn more about what you’re up to, and actually maybe see if your company can solve a real business problem they have, where can they find you?


Kendall: So, the company is Fairwinds. That’s Fairwinds.com as in, “The winds are fair,” because this is Kubernetes, and everything is nautically themed. See, Corey, there’s more to the name than you thought.


Corey: There is. And if people want to keep up with you personally because they make the same terrible series of choices I do, where can they find you?


Kendall: My Twitter handle is @blatanterror as in a mistake that was very obvious. And I also host a podcast on leadership, primarily highlighting people who come from underrepresented backgrounds in tech. And the podcast is Authority Issues. That’s authorityissu.es if you want to check that out.


Corey: Upon which I have guested, and vastly enjoyed the experience. The host, not so much, but I did.


Kendall: Well. That’s why I have a co-host is so I don’t have to be in your shoes in this situation and come up with all the clever things. I mostly just ask questions, and then when I’m having an off day, she carries the load for me which is delightful.


Corey: Excellent. Well, thank you once again for joining me. I appreciate it, despite what you may think.


Kendall: Thanks for having me, Corey, and I’m a little disappointed because if you didn’t appreciate it, I think I would enjoy the spiting you a little bit more. Spiting the professional spiter.


Corey: Kendall Miller, president of Fairwinds. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you didn’t enjoy this podcast, please leave a five-star review on your podcast platform of choice along with a comment explaining how that despite cutting this episode down to five and a half minutes, somehow Kendall still managed to irritate the living piss out of you.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


This has been a HumblePod production. Stay humble.
Minimum Viable Bureaucracy with Laura Thomson
Screaming in the Cloud
04.01.2021
37 Minutes
About Laura

Laura Thomson is Vice President of Platform Engineering at Fastly. She is also a member of the Board of Trustees of the Internet Society. Previously, she spent more than a decade at Mozilla, leading engineering and operations teams, and was on the board of Let's Encrypt. Laura has spoken at many conferences worldwide over the last 20 years and is the author of best-selling software development books.

Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.


Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Laura Thomson, VP of platform engineering at Fastly. Laura, thank you for joining me.


Laura: Thank you for having me on.


Corey: So, Fastly is generally considered to be one of the definitive names in the CDN world. Is that an accurate description of what Fastly is, or is it perceived internally as, “Well, we are a CDN, but there's also much more to it than that.” I always want to make sure that my understanding of the company isn't based upon things that are no longer completely true.


Laura: So, we like to describe ourselves as an edge cloud platform because we've gone beyond the CDN. We are now shipping these products like Compute@Edge—which is essentially serverless only really, really fast because it runs at the edge—and a suite of security products. So it's more than just a CDN. And I think we're becoming more and more general as cloud platforms go.


Corey: Credit where due, I had a customer a while back who was using Fastly as a CDN and had been using a bunch of these edge compute things, and they said, “Oh, yeah, we can't move that off on to a competitor at this point because of all that logic that's there.” And I said, “Oh, so it's a locked-in story?” And they looked at me as if I were simple and said, “No. Because it's awesome, and nothing else is quite like it.” “Well, yeah, I guess there is a certain lock-in story around building something awesome that other people haven’t.”


Laura: That's actually really interesting. We've been talking about this with the product folks at work because the perception that I have is that people choose a particular cloud provider—whether it's general compute, or whether it's CDN, or whatever it is, edge cloud stuff—for the unique features. For the most part, people aren’t really interested in the lowest common denominator stuff, unless they have a very straightforward use case, and people have less and less straightforward use cases over time. So, they want to use the unique features; they want to know what's special, what can I only do here, and that's the basis of their purchasing decision a lot of the time.


Corey: Absolutely. Any sort of cloud comparison analysis between which vendor we're going to pick that breaks down to, “Well, what size of instance is going to cost how much?” And then trying to equate that, like, that is so far from the relevant part of the story when you're doing vendor selection in a modern era.

Laura: Right, exactly. It's interesting for me because I've been on the other side of the desk a lot. My last job at Mozilla, I was partially responsible for making CDN buying choices, so I have pretty good insight into what goes into that. It's interesting to think about.

Corey: It really is. In fact, to that end, you were—in fact, you are currently a member of the board of trustees of the Internet Society. What is the Internet Society, first off?


Laura: So, the Internet Society is about making sure that people have access, equality of opportunity, and that standards are kept open. So, the Internet Society essentially funds the IETF, which is responsible for most of the standards that run the internet. And that is sort of the primary mission.


Corey: I feel like on some level, the default response to that is, “Wait a second, the internet has standards? Since when?”


Laura: [laugh]. Right. Well, it does. I mean, TLS and HTTP would be the two things that people would think of, I would expect. And if those were not standardized, we wouldn't be here having this conversation.


Corey: Exactly. For those who didn't grow up in the ’80s ’90s, et cetera, watching these things evolve, the things that we take for granted today, like different networks can talk to one another all comes down to interoperability around shared standards. But I remember the dark ages, where if you were on CompuServe, you couldn't talk to the people who were on AOL directly. And over time, those walls started breaking down, and a lot of work of the standards bodies like the Internet Society and the IETF are the reason why. It's one of those boring governance things that happens underneath the hood so no one has to think about it. But the reason no one has to think about it is because people like you are doing the hard work of making it that way.


Laura: That’s right. And standards is a thing I've been passionate about for a long time. It is one of the reasons that I was at Mozilla for so long. It's one of the reasons I'm at Fastly, which has, I think, a similar role on the other side. You know, if Mozilla is thinking about how do we make clients use open standards, Fastly is thinking about how do we keep the internet open, and standard, and useful, and safe for everybody? 


And [ISoc 00:06:00] and IETF are obviously, in the position of figuring out how things talk to each other. And it seems like maybe this is a solved problem, except it's not. One of the things that's happened on the internet—which as you can see over the last few years is that we're actually getting less and less open. So, we have more walled gardens, we have the internet being dominated by a few really large vendors, and they have a lot of power and standards. If you have the vast majority of the market share in a particular vertical, then you get to dictate what the standards are, and you can do that in a fairly anti-competitive way. I'm deliberately not naming any names here. 


Corey: Oh, absolutely. I wouldn't expect you to, but I sure can. There's a reason that I have a podcast here where I control the RSS feed that I can point wherever I need it to live; there's a reason I have an email newsletter; there's a reason I don't have a Facebook page, to be very direct, where it's not about de-platforming in the censorship sense, but more along the lines of if I have built an audience on a platform, and then that platform decides that its business strategy is going to shift—something like Medium, for example—then suddenly, I am beholden to the whims of that provider unless I want to start over from scratch again. That's why it's always been so important for me to build my audience on something that was much more agnostic, not because I'm posting garbage that should be taken off the internet—well, most weeks—but rather because I don't want other people making my business decisions for me.

Laura: Yeah, so my interest in the internet is not about de-platforming, it's about platforming. It's making sure that people have access and the ability to build awesome things as much as they can.


Corey: Absolutely. But you're also a VP of engineering, which I feel like if I had asked you a year and a half ago, “What does it mean to be a VP of engineering?” You would have given me an answer. And now I feel like if I asked that same question in these uncertain times, or these unprecedented times, depending upon framing, you might have a different answer from that. Is that accurate?


Laura: Yeah, I think it's definitely been—I’m really tired of the word unprecedented. It's been a year. [laugh]. It's been a heck of a year. 

Corey: It really ha—it feels like it's been longer than that.


Laura: Yeah. So, I started at Fastly on the ninth of March, 2020, which was a week after Fastly closed the offices for COVID. So, I have never set foot in an office, other than in the interview process. And I haven't met almost any of my coworkers. It's super strange, and it's really hard to build those trust relationships without having met anybody. 

So that's kind of the first thing, and obviously, you would have a different experience if you'd been there and seeing the change, but this is all I have known in this role. So it's actually pretty interesting. 


Corey: I know that there's a lot of zeitgeist awareness around what it's like to be an employee in a pandemic now and being full remote and the rest, but something that doesn't get discussed very much is, how does leadership change when suddenly you aren't able to gather people in a room and have a conversation and hash something out when everyone's remote?


Laura: Right. So, the funny thing is, that part hasn't changed very much. And I will say—for me, I've been a remote employee for about 15 years. And this is not what remote is normally like. I think it's part of the, sort of, important takeaway for people who might be doing it for the first time. 

There's a big difference between choosing it and having it thrust upon you, for one thing. When I give people advice on how to work remotely successfully, it's always about, have good boundaries, have a dedicated space, and make sure that you start work at a particular time, finish work at a particular time so that you can walk away and de-stress. And because a lot of people are kind of trapped in the houses, they can’t have as good boundaries. They might have a one-bedroom apartment in San Francisco, and it's all in one room, and their spouse is there trying to work as well, and maybe they have a dog and a baby. It is not a typical remote working experience. 

Similarly, it's not the typical remote leadership experience because remote is great. It also is important to meet with people once in a while, and not for meetings—not all sitting around a whiteboard talking through a bunch of dot points--but for the part that is really hard to do remote, which is the relationship-building. To me, you are trying to get to know people so that when things go wrong, they say, “Oh, I know Laura. I know what she's like. I trust her to do this right.” 

And when I’ve never met you, it’s a little harder for people to do that; everybody has to work a little harder. And when I say a little harder that's in the face of all of the cognitive load that we're already under because it's been, as you say, a heck of a year.

Corey: Right. And there are a lot of negative examples about how all this stuff looks terrible when people get it wrong. For example, “Oh, you're not spending an hour commuting every day; now you can spend that time working.” Or, “We have a policy of turning your webcam on, which is a fancy way of saying I'm inviting myself into your home so I can critique it. Excuse me, it's not acceptable that your infant is crying.” At some point, you hear these stories, and it's the biggest gap we have is the ability to strangle people over the internet now because that's horrifying.

Laura: I know. We have done a lot, I think, to make that comfortable for people. I think part of it is, Fastly has been at least sort of, half remote for a long time. [unintelligible 00:10:50], like remote-first companies are good to work for I will say, in general. But making sure that people know that it's okay if you have to have your camera off, or if you have to have a baby on your lap. 

In fact, the thing I found is there is nothing quite like a meeting that has a cat, or a baby, or a puppy. And it lightens the mood, it helps people talk to each other. So we're all surrounded by our emotional service babies, and cats, and dogs. It's quite funny, really. I'm going to tell a story, which is—people talk about CEOs, and we have a CEO named Joshua Bixby, who is a really lovely human being. And when the pandemic started, he started a weekly meeting where he would read picture books to people's kids.


Corey: That's an amazing idea and in fact, I’m debating stealing it and claiming I came up with it myself, except that we're doing this on a recording, and suddenly my team will know when this comes out.

Laura: Yeah. I just thought it was an incredibly nice thing to do. And making sure that people knew that we're all in it with our families, we're all stuck at home with kids and whatever, and let's try and make the best of it and get to know each other a little bit better.


Corey: On the one hand, I absolutely agree with the sentiment and the place it comes from. On the other, I've got to say I have an anti-authority streak in a big way. My single biggest stumbling block, along with my personality, back when I was an employee. I mean, my last job was at a regulated finance company. You can imagine how well that worked out for me. 


But authority is a problem for me, so whenever I hear, “Oh, go ahead and bring your family into this social gathering we're doing,” my immediate knee-jerk response is, “Is this required?” And that's not helpful as far as the sentiment goes, but it was there. That was my initial flare-up reaction. And I'm always hyper-aware—now that I manage people myself—of being sure to never present it as anything other than if you want to.


Laura: Yes. Yes, it's very much been that way here. I too have an anti-authority streak, which is a funny thing to say when you end up in leadership. But that's why, by the way. And I think sort of compulsive joining-ness annoys a lot of people. I'm not one of those people that it annoys. I like hanging out with other people. I'm really extroverted, which may surprise you in an engineer and someone who chooses to work remote, but I like people, right? 


I like hanging out with people; happy to hang out with everybody, but I know that not everybody wants to and that is 100 percent okay and you have to be so clear that that's okay. Everybody is kind of walking their own road through this thing. And for some people it’s… [sigh] some people are actually pretty happy being on their own, doing their own thing, so that's pretty important to notice as well and not try and invade people's privacy, and keep it strictly work if that's what you need. And if you need to, sort of, get to know people, then that's okay, too. Part of, I think, being able to lead well is to code switch to what people need. So, one style of management doesn't work for everybody. You have to figure out what works for each individual person and work with them in that way.


Corey: Impedance matching, almost. I periodically reference on this show a boss I once had, who, as I described him, spoke only in metaphor. Where—

Laura: Oh my.


Corey: —that's great. I don't understand what the hell you're talking about. Should I be doing more of this thing or more of that thing? “As the boulder crashes down the mountain through the stream”—it’s like, “Okay. I'm sorry, go write haiku in your own time. I'm trying to figure out exactly what needs to happen. Am I doing well? Am I about to be fired? It'd be really nice to know where I stand with you.” And I never got a clear answer.


Laura: Wow, that's rough. It's funny how those things work out. I once had a boss who had very little in the way of facial expressions. Just the way that their psychology worked; I would venture to guess that they're probably not neurotypical. And at the beginning, I found this, like, super intimidating. 


And then I figured out that it didn't matter if you couldn't read his face because he would tell you exactly what he was thinking. And without emotion; it would just be all very factual. But there was no skirting around the issue, there was no trying to figure out what he meant. He would just tell you. And that was actually very relaxing once I figured that out. It's strange because if someone had said, “Would you like to work for someone like this?” I would not have said, “Yes.” But it was actually kind of refreshing.


Corey: There's something to be said about having a sense of—I know people talk about this a fair bit—of psychological safety in employment. And you need that sense of psychological safety, but you also need a sense of job safety. I know that even now, four years into running my own company with my business partner, whenever one of us sends the other a message of, “Can we talk?” And then goes quiet, it's, “Oh, am I about to be fired?” is the instinctive, immediate reaction every time. Even though neither one of us can be fired, it's still—that's a trauma that leaves scars.

Laura: It’s actually really terrifying, I think. I totally agree with you. And having had to do that, reasonably often. When you just need to ask somebody a question. And I think, I end up putting something in Slack that'll be like, “Hey, I need to ask you something quickly, but this is nothing bad. Don't worry, this is not a scary VP coming to be scary.” I find myself qualifying it like that because it's not my goal to make anybody's heart rate spike. [laugh]. They can watch a horror movie if they want that, but they probably don't. Nobody needs any extra stress right now. 


Corey: Exactly. 


Laura: Making sure I only inflict stress intentionally, and that's very rare.

Corey: So, one of the things that you spoke on, back when conference speaking was a thing—was a periodic focus on ‘Minimum Viable Bureaucracy,’ and as I mentioned, for someone who has a problem with authority, just the very phrase is appealing. Tell me more.


Laura: The reason I started with Minimum Viable Bureaucracy is that I, too, have a problem with authority. I have a problem with process. I'm really goal-oriented, and I don't really mind how we get there. And it's been hard for me to learn as an adult that you actually do need some process. So, figuring out what the balance is of what is the level of process that is the smallest amount required for things to run smoothly, to be predictable, without driving all the people that you work with bonkers. Because there's a lot of engineers who don't like authority, but there are people who also need process, so what's the balance? And there's a few basic principles to that. 

One of them is, first of all, to push decision-making down to the edges so that people who know the most about something can make a decision about that thing. That's really important to me. A second principle is that you should always iterate on your process like you would on your code, or in your infrastructure, or in your products. You don't expect to ship v1 and walk away and never improve it; you don't expect to ship with version 1 of your config for your infrastructure and walk away and never improve it, but we tend to get stuck on process. If you are doing something and it seems terrible, then stop and don't do it; do something else instead. I think the ability to change something on a day-to-day basis, to iterate quickly, essentially, to continuously deploy the way that you work is super important to happiness.


Corey: There's also the risk aversion approach where whenever something breaks in a particular way, “Ah. We're going to add a process in to make sure that never happens that way again.” And keep iterating forward, and eventually, you're at a point where there's six thousand things that need to be verified at every point and it becomes unwieldy. It becomes almost ossified and mired in that process.


Laura: Yeah, that's absolutely true. And knee-jerk process is the worst. I think we actually have a really good team here that does incident management, and part of their job is to figure out, well, what parts of this actually require a change? Other changes we could make, other changes we shouldn't make, and to be really strategic about that. And it's super exciting to work with them on that. 


Having said that, I think you can target things that need process. And the way I always say this is you should make the boring things boring, and that covers everything from the promotions process: everybody should know how it works. “How do I get promoted?” Everyone knows. It's straightforward. 

It's the same every time. Maybe there's some iteration, but in general, it's well understood. It's true for deploying code as well. It should be boring, it shouldn't be exciting. I want the exciting things to be exciting, like the new thing that we're shipping, or the new tech that we're working on, or hiring somebody awesome. And I want the things that should be the same every time to be almost invisible.

Corey: You're the engineering leader I'm not, so you will almost certainly have a way better take on this than I will, but it feels that some organizations approach process and procedure from a position of if we just put enough process around this, we can finally not have the creative expensive types do these jobs, and instead just wind up turning it down to someone who follows a script and that's it. Is that the actual intention? Is that how it just manifests, or am I completely missing something fundamental?


Laura: The thing you've described is not valueless; there's one thing I think it's important to note, so I'll come back to that in a second. But in a place where you don't have any process—everything is terribly artisanal and bespoke and so on, you end up in a situation where you can only have really, really senior people working there, people who have the experience, and judgment, and know how to improvise. And what you have done then is you've made it impossible to have junior engineers, or interns, or people who do process. So, you can go too far the other way. And, to me, it's super important to have up-and-coming people, like people who are relatively new to the industry because they bring new ideas and also, who's going to run it when we all retire? 


It's super important to have junior people coming up. And if you have, sort of, absolutely no standard ways of doing anything, it's going to be really, really hard for them to be successful. So that's actually perhaps a really strange reason for having processes, but that's one of the reasons. It's certainly not unreasonable to say, “Let's have the expensive people do the hard things.” Like that's certainly one way of thinking about it. But it's also so that not everybody has to be stressed out by not knowing how things work all the time.


Corey: And that's very fair. I want to be very clear: here at The Duckbill Group, we fix the horrifying AWS bill, and originally everything was bespoke because it was just me as an independent consultant. And, yeah, I can keep it all in my head. Why not? As we started hiring people, we built out processes and procedures around how these engagements go, how the analysis looks, but it's still nowhere near the point where someone who's not conversant with the relevant technologies, and the relevant terms of art, and the relevant financial strategic requirements of businesses would be able to perform effectively in that.


So it's the, let's get some standards around here and let's make sure that we're not missing things, but it's also never been aimed at driving down what it takes in order to deliver an engagement successfully to a point where we can start having fundamentally unqualified people in those roles.


Laura: Or people that you're trying to train. You want to be able to have an apprentice, or a Padawan learner, or whatever you want to call it.


Corey: Oh, absolutely. Every person we've hired into this role has gone through a training, and onboarding, and upskill approach because no one else does this quite this way. But there's foundational prerequisite knowledge that works. I mean, it’s—the idea of what it takes to operate in large-scale environments is a key example here. You can't teach that without giving someone a high-scale environment to work within. And it turns out, we don't have a lot of those right now.


Laura: Yeah, that's really true. There are some things you can try and there are some things you can't try, and some things are harder to learn than others, I think, too. And not just scale. Because scale, as long as you can get somewhere that has it, you'll pick it up. You'll have to. 


There are some things that are much, much harder to learn. And one of those is, I think, really good troubleshooting. A great way to learn that is by shadowing someone working or working with someone who's really good at it, as long as they can talk through what they're doing. Someone who's really good at troubleshooting and can communicate. Another one is, sort of, responding to ops situations. 


And I think—what I mean there is incidents, and outages, and firefighting. That's actually kind of an interesting thing. I remember I once toured a fire station in Atlanta. It was actually a heavy rescue station. If you don’t know the difference between firefighting and heavy rescue, firefighting is putting out fires, heavy rescue is running into burning buildings to pull people out. 


And so I had asked this fire chief, “What makes someone really good at heavy rescue?” And he said, “Someone who thinks the day we get to run into a burning building is the best day of their life.” And some people in ops are like that. Some people love it. Like, it’s—


Corey: Oh, the adrenaline hit of the firefighting? Oh, yeah. I'm right there with you.


Laura: Yeah, I love that stuff, and some people find that incredibly stressful. That's one of the things I'm not even sure that you can learn it, by the way. I mean, I like to think you can learn anything if you set out to, but it might not be good for you to do it [laugh] honestly. If you find it incredibly stressful, maybe you should take something more—a little further from the fire. Dispatch or something, or R&D. But, you know, some things are hard to learn. And that's one of them.


Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.


Corey: What do you think is changing? Is it even still possible to have that shadow approach in a virtualized environment like we're all working within now? I mean, having someone shadow in the olden days was a tried and true method of getting someone up to speed on various engagements. It's changing now. And I don't want to ever be the kind of company that can't manage to hire junior people: “Oh, everyone here must be super senior.” Great, then where does the next generation come from if that's your approach?

Laura: Right, exactly. I think it is possible. One of the things that makes this interesting is, you know, I have an 11-year-old daughter, and she's doing remote schooling right now. And it is interesting to me how much better people who have grown up doing that are at learning without being in person than people who have grown up doing it in person. One of our friends makes jokes about, “Haha, digital natives.”

And I'm like, “Yeah, that's it.” So, I see this kid sitting on a call explaining to an adult, walking them through how to set up that Discord server. Which I think is awesome, by the way. I'm super happy with that. But they don't see anything weird about it. 

So, I think that is a thing that, the longer you spend doing it, the easier it becomes. It is hard to visualize how that works if you've done it in person your whole life, but the longer you spend doing it, the more you start figuring out the tricks. To me, it is harder because you can't necessarily tell when somebody is struggling. So, there has to be a certain level of trust, and building that is probably the hardest thing. Comes back to trust, again.


Corey: Every company claims that they've nailed this. “Our employees love us. We have high trust among our staff.” And it holds still and it makes sense right up until the point where you talk to their staff directly.

Laura: Yeah. I think that's really true. It's really a hard change for people who haven't done it before and who wouldn't choose it. There's a lot of people who have had it thrust upon them this year.

Corey: So, a recurring theme of this show has been where does the next generation come from? And it's pretty clear that Fastly has done a phenomenal job of finding and recruiting extremely capable senior talent. But what are you doing to bring up the next generation? Because it's clear you don't just hire senior people, which is good. I'm just curious how you wind up developing those folks into the same level of amazing that some of your senior folks are.


Laura: So, we actually have some programs here I'm really excited about. To begin with, we hire people from all different backgrounds: we hire them from boot camps, we have people that are self-taught, we have people who have PhDs in computer science. But internally, we have a couple different programs that I am excited about. One is that we have a system where folks who work in our customer support teams can actually do a rotation through engineering where they can work in engineering one or two days a week for a few months. And if they like it, then they can transfer over and work full time in engineering as a junior engineer. 

And that's been a really successful program, we've had a number of graduates from that, and some of the most awesome people on our teams. I’m really excited about some of those folks. The second thing, which is a new thing that we've just started trying, we have a kind of a weird team here called resilience engineering. That's something that I started that I'm super, super happy with. I’ll talk about that briefly, and then I'll talk about the apprenticeship program. 


So, as you pointed out, we have a lot of senior folks. One of the things that's often the case with senior folks when you hire them because you—you know, they work on a standard, or they're famous for a thing, is that they tend to be specialists. People who know the most in the world about some technology X, whether it's some kind of network thing, or TLS, or whatever it is. And we didn't have a lot of folks looking at the whole system end to end: what are the weakest parts of the system? How can we make it better? 


What can we do to make it more resilient? So, we started this team called resilience engineering. And it’s an interesting team. It has a bunch of really senior folks on it; some of them are pretty well known. But one of the things we thought about was that we weren't really training any new systems engineers that way. 


So we've actually just rotated in our first junior engineer (well, they're not junior; they're an engineer, engineer), and I think this will be a good pathway for them to work their way up to being a principal engineer by looking at every system at Fastly, and understanding how things work, and understanding the complexities that you get from having complex systems with huge scale. They don't necessarily behave in predictable, deterministic ways; they have emergent behavior. And there's that old story about the engineer knowing where to tap: this is all about knowing where to tap. So I'm pretty excited about that program.


Yeah, we're trying new things. We do not currently have an internship program, but that's something that we would like to do in the next couple of years. These are all approaches we're taking to get junior people in.


Corey: Internships are always hard because, on the one hand, it apparently has to be about education. Two, if you're bringing interns in and not paying them, don't do that.


Laura: Oh no. Don’t do that.


Corey: That’s garbage. [laugh].


Laura: Don’t do that. 


Corey: Yeah.

Laura: Yeah. Don’t.


Corey: Oh, that's not for you, that’s for a couple of companies who are feeling shame when they hear this, and they should because that's monstrous. But a lot of companies view intern programs as a backdoor recruiting funnel—which is fine, thrilled to do it. Until I watched them try to talk people into dropping out of school their final year and come into work full time instead, which feels a little weird. I've got to admit.

Laura: Yeah. No, absolutely not. Try really hard not to do that. So, we talked about starting one this year, but we didn't because of COVID. I was pretty involved in the internship program at Mozilla, so let me talk about that and some of the things we did there because I'm pretty proud of the work that we did there.

Corey: Wonderful.

Laura: So, obviously we had college interns, and we made some changes to that program because for a while we had gone after, sort of, Stanford people and MIT people, and you know exactly what I'm talking about, right? And the thing that we noticed was that there's a certain lack of diversity in folks like that, which is probably completely unsurprising to you. So, we started looking for interns from a different set of colleges and universities, and that was incredibly helpful. So, we did some deliberate recruiting to historically black colleges and universities, to colleges that had lots of professors that were interested in open source and things like that. And those things were actually just a way to get some different types of folks.

We did another thing, too, which was a non-traditional internship program, and this was for people from any background. One of the people we had that came into it was a chef, previously. And the criteria for application were separate for each particular internship. So, for some, it might be: to apply for this internship, write us a piece of documentation; or, to apply for this internship, write a test for this. So, it was sort of a very open-source approach to things.


And some of the people we got through that program were incredibly brilliant folks from really unusual backgrounds. There was one particular person that I worked with who is one of the smartest engineers I've ever worked with. Generally, you hire somebody and you are constantly blown away by their insights and how hard they work. 


Corey: I'm very fortunate to be able to say, “Yes.”


Laura: Yes, exactly. And this person had no background in computer anything. They had a background in mechanical engineering, believe it or not. They were also from a non-traditional background; I think the first person from their entire family to go to college. And they had actually been building museum exhibits for science museums, which is a really, really cool thing to do, by the way, but obviously nothing to do with programming. 

The problem with those kinds of jobs is a lot of them are seasonal or contract. And they were looking around saying, “Well, I’d really like something a little bit longer term, and it seems like jobs in tech seem to have those criteria, so I will go and apply for this internship and along the way, hopefully, I will learn how to code,” and ended up being an incredibly brilliant engineer who built some really amazing things. So, yeah, I'm really open to bringing people in. My goal in working with people on the internet is for everyone to have the opportunity to build things they’re passionate about. And not everybody starts from the same point of opportunity, so finding different paths for people to get in is really important to me. I have kind of a non-traditional path at some point in my career, so I’m very empathetic to that.

Corey: I think that most people who have thrived in their career can look back at various points in their career and specific people that they reported to and say, “That person had a profound impact on my career.” Ideally, they'll even say that in a positive way, rather than negative, but I guess my goal has always been to try to be one of those people that they say that about, someday. And it's easier said than done because the payoff is far in the future, you'll never know if you succeed, in some cases ever, and in other cases, not for another 15 years. But it's a good aspirational thing to aim for, at least in my fumbling attempts at management. What tips would you have for folks who are aspiring to either become managers at all, or, in other words, become better managers than the ones that they had inflicted upon them?

Laura: So, I think the hardest part of any kind of leadership is managing yourself. Before you attempt to manage other people, you have to try and get a handle on yourself. And by that I mean let's imagine that you're at work, and you are really stressed. If I am an individual engineer, let's say I'm really stressed about something going on in my personal life: maybe one of my relatives has COVID, and I'm freaking out about that. Oh, I don't know how I'm going to manage childcare this year, there's so much going on, I can come up with a million examples. 

And maybe my work suffers. And maybe my manager comes to talk to me about it. Now, if I'm a manager, or a director, or a VP, I come to work, and I have that leaking out all over the place, it is likely that that gets taken out on the people who work for me. Or at least, if they see that you are freaked out about something or if you're angry about something, people always jump to the worst conclusion. It’s the thing you mentioned earlier, “Can we have a quick chat about something?” 


It's that if I see that my manager is super, super grumpy about something, and maybe writing a grumpy email, or the tone is not there, it has a profound negative trickle-down effect. It's not to say that you can't be human and can't have feelings, but you have to have an emotionally mature way of dealing with things. And that's really hard, by the way. Like that's non-trivial, obviously, but I encourage people, if you want to be a good manager, make sure that you have somebody to talk to about it. And that can be your boss; it can be trusted peers outside work; maybe you hang out in a social Slack. In a normal year, maybe you would go to a conference and have a cup of coffee with somebody. But I think it's really important to have ways to let off steam that do not involve your employees because it's just not fair to them.

Corey: Yeah. One of the hardest lessons for me to learn was that you can never complain to your directs, or in some cases, your peers. And that becomes a very difficult thing, especially when you're working in companies that aren't aligned in all the ways you wish they were, where there are things you aren't allowed to tell your staff, there are things that you think that your staff is right on, but you're not allowed to communicate in various directions because politics always strike you down. It was never a game I was particularly good at, and my approach was ultimately not to play, which really is not winning; that's abdicating. What's the old line about office politics, where, you're not opting out, you're forfeiting?

Laura: Oh, ow. Ow. That's really painful, but yes, you're right. I think the second piece of advice is related to this, which is that I think you should try to be as transparent with people as you can. And when you can't be, it's okay to say, like, “I can't talk to you about this,” but being transparent by admitting that you can't talk about something. 

Like, “There's something going on here; I can't talk about it now.” That's one set of things. And the other thing is to be really direct, if you can. I tell folks that I work with that my goal for communication is to be kind, direct, and prompt. So, be direct, say a thing that needs to be said, even if it's not really a positive thing, but be kind about it; there's no need to be a jerk about it, especially if it's negative feedback. 


And to be prompt. So, if there's something that I need to tell you, like, “You just did a great job with this podcast, Corey,” I should tell you today. On the other hand, if it was, like, “Corey, you were a complete jerk. Why did you speak to me like that?” I should tell you today. I shouldn't wait until three months later, when you've done the thing 20 times.


Corey: Oh, my God. The annual review or whatnot. It’s, “Well, eight months ago, you said something dumb, and we're going to ding you for it.” It’s, “What? What is this?”


Laura: Yeah, exactly. And that's all about managing your own psychology, too because it's really easy for people to want to avoid conflict. But learning to have constructive conflict is an incredibly important skill here, too.


Corey: Oh, it absolutely is. Thank you so much for taking the time to speak with me today about a wide variety of different topics all tied back to engineering leadership in some form or another. If people want to learn more about what you're up to, where can they find you?

Laura: Probably the best way to find me is on Twitter, which sounds terrible, doesn't it? But blogs, and websites, and things all have fallen by the wayside because I have had less and less time to think about anything longer than 140 characters. So, you can find me on Twitter @lxt.

Corey: And I presume you're hiring as well?


Laura: So we've had a very busy year, and as a result, we're hiring a lot of folks. So, please reach out if you're interested in working here.

Corey: Excellent. And we'll of course put links to that in the show notes. Thank you so much for taking the time to speak with me today. I appreciate it.


Laura: Of course. Anytime. It was lovely talking to you, as well.

Corey: Laura Thompson, VP of platform engineering at Fastly. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment saying why a CDN isn't necessary and even if it were, you could build your own in the course of a weekend.


Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.


This has been a HumblePod production. Stay humble.
A Hop, Skip & a Jump to State-of-the-Art Network Analysis with Matt Cauthorn
Screaming in the Cloud
03.30.2021
38 Minutes
About Matt

Matt Cauthorn oversees the ExtraHop Security Sales Engineering, and enjoys studying the intersection of business and technology. Prior to ExtraHop, Matt was a Sales Engineering Manager at F5. He’s a passionate technologist and evangelist. He holds an MBA from Georgia State University and a Bachelor of Science degree from the University of Florida. Matt speaks at industry events, has been featured on podcasts, and quoted in industry coverage.

Links:
Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.


Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.



Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. One of the problems with being me is that it gets kind of lonely because I stand sort of squarely between the worlds of business and technology. You’d think they might be the same world; they’re kind of not. And one way that I tend to make that isolation a little bit more bearable is to talk to other people who are in similar positions. This episode is promoted by ExtraHop, which is a network security vendor that we’re going to dive into because my guest today is Matt Cauthorn, who’s the VP of Security and Cloud at ExtraHop. Matt, thank you for joining me.



Matt: Yeah, thanks for having me, Corey. Good to be here.


Corey: So, ExtraHop was one of those companies that I became aware of as something to pay attention to. And it’s going to sound weird and obnoxious that I don’t even care, but the reason that I started paying attention was because there was an event in the before times here in San Francisco, and I started seeing your name on the side of city buses. The company, not yours personally; when you see a person’s name on a bus, that usually is a different implication.


Matt: Yeah, I have a feeling it was one of several events that we were involved in. Yeah, it’s great. It’s great that you discovered it that way.


Corey: Say what you will about advertising like that: it works. And the problem you run into, in some cases, is that you aren’t able to really convey the depth and intricacy of what a company does. Now, you folks have been a sponsor for a while of my nonsense. And thank you for that; that shows that someone is making excellent decisions on your side. They should be promoted and make more decisions just like that one.


But for those who haven’t been paying attention to the world of security, and all the various nonsense that I do, what is ExtraHop? What do you folks do over there, other than buy advertising on buses?


Matt: So, the technical category that we fall into is network detection and response, which effectively means sophisticated network security analytics for the enterprise in the cloud. And if there’s a network where we can see the packets and process them, we are able to give very, very sophisticated security analytics on that, as well as support for the incident response workflows, and APIs, and much more.


Corey: I’m going to put the shoe on the other foot for a minute here. Whenever I start doing significant sponsorship work with a company, I like to vet them and figure out, okay, is this something that isn’t, you know, complete crap, because I’m not necessarily endorsing every sponsor that comes through, but at some point, when you wind up having a sponsor who is next to things that you’re doing, over a long enough timeline, you start to become associated with them.


And the problem with security vendors in many respects is they almost invariably start speaking to security folks who are steeped in that world where a CISSP almost feels like it’s a prerequisite to understand what’s going on. That’s one of the reasons we launched the Meanwhile in Security companion podcast, to specifically cut through that mess. But for me, the way I learned is by rolling something out and using it myself. And I did deploy ExtraHop to my test environment. And I was pleasantly surprised by what you folks have built.


Matt: Oh, thank you. I’m glad to hear it. And you’re exactly right. So—and I assume your test environment or your environment was up in one of the cloud providers, is that correct?

Corey: Yes. AWS, because that’s the one that I have the most experience with; our day job is helping companies fix the horrifying AWS bill, and we sometimes discover how that breaks by incurring one ourselves from time to time because it’s a problem that stalks all of us throughout the course of life.


Matt: Yeah. I’ve been on the dubious receiving end of that billing as well in another life. So, you’re doing a service; thank you for that. So yeah, one of the big things that not everyone is familiar with, and I think many, many more if not everyone who’s delivering apps and services in the cloud, is that now, in particular, AWS and the other cloud service providers can send network traffic to target interfaces. 

And that means vendors like us can process that, invoke behavioral analysis on these byte streams, and give you transaction analysis and security, forensics investigation, and detection. It’s very, very powerful—and it’s purely out-of-band, native to cloud. So, the way you’ve deployed is using the native facilities up there, and it works really, really well and it’s a wonderful adjunct to a nascent security strategy or a very mature security practice.


Corey: The way that I wound up contextualizing it is… I started off as a grumpy Unix systems administrator, hands-on hardware in the worst ways possible, and I spent a fair bit of time dabbling as a network engineer as a part of that. In fact, during the financial crisis, back in 2008, I was stuck in a job because no one was hiring. I’d been there a year; there was no real advancement opportunity; there was a salary freeze, so as I hit my one year, I wasn’t able to get a raise. And that led me to be even more disgruntled than I normally am. So, my approach to becoming a better systems administrator was to get a CCNA during that timeframe.


And it sounds counterintuitive, but the more I understood what was going on in the network, the more the rest of the system made sense to me, to the point where now when I start trying to diagnose weird issues, I start from a network-based perspective. The problem is so much of that in a cloud environment is obscured away and not easily discoverable.


Matt: Yeah, the beauty and danger of the cloud, simultaneously, are the layers of abstraction that you just described. That’s exactly right. On the winning end of it, you get this radical acceleration of traditional infrastructure, deployment, workflows, deploy, destroy, all of this stuff, but the price that you pay is that these levels of abstraction take you, sort of, further and further away from having your finger on the pulse of the environment.

And the ultimate—I’ll just wear out the metaphor here, Corey, but the ultimate connective tissue is the network itself. And in fact, that’s where the preponderance, at least, of the actual behavioral intelligence lies, it’s on that connective tissue. And so without having real awareness of what’s happening on the network itself from a behavioral analysis perspective, you really are kind of flying blind.


Corey: What I want to talk about, too, is that to just give folks an example of what’s happened. In fact, while I have you on the recording, I just pulled up a view into what’s going on in my environment, and it tells me all kinds of interesting views. And honestly, this is one of those visualizations that I wish more companies would discover because, let’s be very clear here, what you’ve built is actually beautiful and a pleasure to use. It almost feels like it’s conference-ware where it’s designed to look good in demos, rather than actually be usable, except that having played with it a bit, it is in fact usable. And it distills down to the EC2 instances that are in the environment, it tells me what’s talking to what, on what port, any sudden spikes, any anomalies, and then it highlights a bunch of different rules here.


And I’m seeing all this from a purely network perspective. Now, that’s great. You can talk to folks about all kinds of tools that do this stuff. All right, so effectively, you’re implementing Wireshark as a service. Okay, that is certainly a way to think about it, except it’s being captured by VPC mirroring; there was no configuration required on the instance itself; it’s something that can be done account-wide.


It’s something that can be enforced via SCPs within AWS organizations; it’s something that is not, no matter how thoroughly I subvert the EC2 instance that this thing is running on, even if I subvert the entire AWS account itself, as long as I haven’t been able to lateral into the management account for the AWS organization itself, you can’t turn this off and it shows up the truth that lives on the wire.
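As a rough illustration of the SCP point above, here is a minimal sketch of a service control policy that denies changes to traffic mirroring for everyone except one designated role. It is not ExtraHop's or Corey's actual configuration: the role name `NetworkSecurityAdmin`, the statement ID, and the choice of which actions to deny are all assumptions made for the example. The policy is built as a Python dictionary and printed as JSON.

```python
import json

# EC2 actions that could be used to switch off or alter traffic mirroring.
# Which actions to include is a judgment call for this sketch.
MIRRORING_ACTIONS = [
    "ec2:DeleteTrafficMirrorSession",
    "ec2:ModifyTrafficMirrorSession",
    "ec2:DeleteTrafficMirrorTarget",
    "ec2:DeleteTrafficMirrorFilter",
    "ec2:DeleteTrafficMirrorFilterRule",
    "ec2:ModifyTrafficMirrorFilterRule",
]


def build_mirroring_guard_scp(exempt_role_arn_pattern: str) -> dict:
    """Return an SCP document that denies mirroring changes to all
    principals except those matching the exempt ARN pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectTrafficMirroring",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": MIRRORING_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn_pattern}
                },
            }
        ],
    }


# Hypothetical role name; substitute whatever role is allowed to manage mirroring.
scp = build_mirroring_guard_scp("arn:aws:iam::*:role/NetworkSecurityAdmin")
print(json.dumps(scp, indent=2))
```

Because the policy would be attached from the organization's management account, a principal inside a member account could not remove it, which is the property Corey is describing.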


Matt: Yeah. I love the way you said it. And so I’ll add to the Wireshark metaphor here in a moment, but you’re exactly right, Corey. One of the strengths—and I would encourage like all the listeners—and you’ve got a very broad listener base here, so there’s a veritable mix of different skill sets and folks at different parts of the organization, this is all fine. But I would encourage everyone listening to think about the role of network visibility as it relates to your application and service delivery. The network has a couple of unique—several unique properties. One of them is what you just described: it’s very, very difficult to evade; and it’s very difficult to turn off, and it’s very difficult to manipulate.


Corey: And if the network isn’t working, effectively no cloud service is either. “Oh, it’s doing an awful lot of calculation. Good for it. If I can’t talk to it, what’s the point?”

Matt: Exactly right. So, what we’re doing here with the modern era of analytics, and the state-of-the-art changing so rapidly in the last 10 years or so for network analytics, think of millions of concurrent Wireshark sessions happening with the subsequent expert analysis and behavioral intelligence, with behavioral security detections layered on top. And then if you need to investigate one of those detections that you’re seeing right now, Corey, you click through, you see the asset involved, you see the transactions themselves, that surface to the conclusion that the system came to. And so it’s a very, very powerful thing for just the detection and investigative workflows. But there are far broader use cases as well.


Corey: The real value as well—I want to be very clear to help paint the picture here—you have a web server, or an application server, or database server, if you’re still running those yourself—given some of the database services that are offered, I can’t say I fault you for that particular choice, but I digress—if suddenly those things start talking externally to random botnet command-and-control servers, for example, that’s atypical behavior. And it’s the kind of thing that you sort of would like to know, approximately, immediately, it’s the sort of thing that emerges of, “This is an emergent aberrant behavior and it should be investigated.” Now, the other side of that is, I set this up back at the beginning of the year—thank you for the account, it’s appreciated—and I wound up getting it dialed in on my environment, and I haven’t logged into it in a few months. So, now I’ve logged back into it for this discussion, there are zero alerts waiting for me.


And that’s no small thing because what I do on this development EC2 instance in this account is monstrous. There’s no way around it. I install random stuff from Docker Hub, occasionally, due to poor life choices, effectively the entire software security supply chain—oh, [laugh] that’s a funny joke. I don’t know anyone who—involved in any aspect of it runs in my stack. I may as well just open it to the world. I have my IRC connection living persistently on this box through Irssi. It does a whole bunch of things and talks to other stuff because that’s the way the world works. It’s messy. When I set this up, it flagged those things immediately and I said, “Okay, don’t alarm on the fact that it’s connecting to Freenode with IRC.” Great. It hasn’t bothered me since as I continue to do monstrous things. There were no alerts waiting for me because the problem of not getting any alerts when things are going wrong is super bad, but getting alerts constantly when things are normal, is in many ways worse because when something happens, it gets masked.


Matt: A hundred percent. Yeah, so what you experienced is the power of the state-of-the-art of network analysis. And behind your instance is machine learning that runs in the cloud at scale. And what that means is, is that the system that you’re running in your environment, right now, Corey, is able to extract observed transactional features that feed the machine learning. And so initially, the IRC, we’re like, “Wow, we don’t normally see this, dude.”

And you’re like, “No, don’t worry about it, ExtraHop.” So, what we learned is, that is normal behavior in your environment. And there’s just a plethora of different use cases and different machine learning models and implementations. That stuff doesn’t really matter for the purposes of this conversation. Suffice it to say, when you think about the network, just if you’re looking at it through the pure lens of a data source itself, well, what kind of data, what sort of information could I mine from that data source? Then the answer is it’s staggering.


So, then the question becomes, how do I present it—which you’ve mentioned earlier—with our UI? There’s been a ton of R&D, that we’ve got this wonderful R&D team. And the UX team has done a great job at distilling the information down that we surface because we’re just analyzing just insane amounts of raw network data in a given environment and every single day. So then, when you overlay machine learning, it really helps to sort of—you know, there are certain things that machines are really, really good at doing, and extracting features and analyzing those features for real behavioral analysis is one of them.
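The tuning loop described above (flag something unusual, let the operator say it is fine, stop alerting on it) can be sketched in a few lines. This is a deliberately toy stand-in: the real system uses machine learning over observed transactional features, whereas this sketch uses plain set membership. The class names, host names, and ports are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """One observed network conversation (hypothetical, simplified)."""
    source: str
    destination: str
    port: int


@dataclass
class Baseline:
    """Toy behavioral baseline: known flows plus operator-acknowledged flows."""
    known: set = field(default_factory=set)
    acknowledged: set = field(default_factory=set)

    def learn(self, flow: Flow) -> None:
        # Record a flow seen during normal operation.
        self.known.add(flow)

    def acknowledge(self, flow: Flow) -> None:
        # Operator says: this is unusual in general, but normal here.
        self.acknowledged.add(flow)

    def should_alert(self, flow: Flow) -> bool:
        # Alert only on flows that are neither learned nor acknowledged.
        return flow not in self.known and flow not in self.acknowledged


baseline = Baseline()
baseline.learn(Flow("dev-box", "db.internal", 5432))

irc = Flow("dev-box", "irc.example.net", 6697)
print(baseline.should_alert(irc))   # unusual at first sight
baseline.acknowledge(irc)
print(baseline.should_alert(irc))   # suppressed after acknowledgement
```

The point of the design is the one Corey makes: acknowledged noise is silenced, so a genuinely new destination still stands out instead of being masked.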

Corey: I also want to point out as well—because again, I approach the entire world through a lens of AWS billing, and there’s an awful lot of solutions out there that give horrifying impact to the AWS bill by deploying them, to the point where you start doing a cost-benefit analysis and realize, “Huh. I’m reasonably certain an actual data breach would be less expensive.” And you wouldn’t be far from wrong. I just pulled up last month’s bill in the account this is running in, and sure enough, the traffic mirroring, that is what powers your solution is a third of my bill. But I want to say that that third of the bill is $10.08.


And that does not have traffic volumes attached to it; it is strictly a per hour—one and a half cents per hour—that it’s attached. The end. And I’ve got to level with you, if $10 is meaningful to monitor what’s going on on the network in an account, I don’t know what to tell you, other than perhaps you are not the target customer. And I want to get into that a bit with you because I’ve long held the opinion that there are different on-ramps for different companies at different times throughout their growth to start working with vendors. Who should be reaching out to you folks, and more importantly, at what stage of the development process does starting to engage a solution that looks at the network traffic and cares about network visibility make sense in the modern era?
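The two figures Corey quotes, $10.08 for the month and one and a half cents per hour, can be checked against each other. The sketch below does that arithmetic and adds a small helper for projecting the charge across more mirroring sessions. The assumption that the charge scales linearly per session-hour is taken from Corey's description, not from a price list, and the function names are made up for the example.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.015")     # "one and a half cents per hour"
MONTHLY_CHARGE = Decimal("10.08")  # the figure quoted from the bill


def billed_hours(charge: Decimal, rate: Decimal) -> Decimal:
    """Hours of mirroring implied by a charge at a flat hourly rate."""
    return charge / rate


def projected_charge(sessions: int, hours: int, rate: Decimal = HOURLY_RATE) -> Decimal:
    """Charge for a number of sessions running for the given hours,
    assuming a flat per-session-hour rate (an assumption of this sketch)."""
    return rate * sessions * hours


hours = billed_hours(MONTHLY_CHARGE, HOURLY_RATE)
days = hours / 24
print(f"{hours} hours, or {days} days of one mirroring session")
print(f"10 sessions for the same period: ${projected_charge(10, int(hours))}")
```

The quoted numbers work out to 672 hours, which is 28 days, so they are consistent with a single session running continuously through a 28-day month.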


Matt: Very high-level guidance is this: if you have any Infrastructure as a Service running in your environment of consequence, with risk associated, with critical assets, with critical services, generally speaking, Corey, it’s worth reaching out to us about—whether it’s cloud, or enterprise, or hybrid combinations therein, if there’s a network to monitor, we will do that. And we don’t discriminate in that way. So, it’s very, very useful also, for the enterprise cloud journey folks out there, and there’s a lot of them [laugh] at various different stages at this. If it’s early stage, there’s the sort of assessment, the security controls that need to be sort of moved up into cloud.

And a lot of the executives that I talked to, I’ve got—I’m fortunate, I get to talk to CEOs and VPs about this exact scope of concerns, and many of them, their feet really aren’t firmly under them when it comes to cloud. They’ve got their enterprise environment locked in, and they’ve got their security controls well defined, but DevOps is moving and the agility that they’re gaining from the cloud, it’s moving so so fast that the CSOs are kind of caught flat-footed and they’re not exactly sure what this thing should look like in the cloud. And so, for the enterprise folks on the journey into cloud—digital transformation, whatever buzzword you want to throw at it—that’s another wonderful target account for us.

Corey: An observation slash analogy I’ve been making for a little while has been that, imagine tomorrow I go and I file the paperwork to start Twitter for Pets. I already own the dot com, but now it’s a real business. And in the next 10 years, it’s going to become an S&P 500 component where, great, it has gone from ridiculous social network for pets to consequential social network for pets. And as it grows from ridiculous startup to large enterprise, there has to be a reasonable onramp for folks, given the sensibilities of how companies work today.


It can’t be an enterprise transformation story because anything I start tomorrow is going to be born in the cloud anyway. And it’s no guarantee, or honestly not even that likely for a lot of these use cases, that there will ever be a physical data center component. There has to be a point during that company’s growth where there’s a natural on-ramp to use a vendor’s product or service because if there isn’t one, they are fundamentally serving what is, in the very long term, a market that is in decline. And that’s always the sort of thing I look for and am cautious about. Oh, we wouldn’t be having this conversation if I thought you didn’t have an option for folks who are in precisely that position. How do you think about that?

Matt: Well, no, it's a really interesting point, you’ve got a very unique voice in the space. Before I continue, I really like the particular angle you’re approaching these problems from because these are conversations that have to take place. So, the operational concern itself bears a certain cost, and a certain level of risk, and a certain level of opportunity cost. And you’re exactly right, at some point in the story arc of a cloud—or business’s experience as they grow into this, there’s a point of diminishing returns with native tooling or hand-rolled tooling. And beyond a certain point of scale, you need to actually fall back on more broad-based utility, broader coverage of the security requirements, the coverage of your security policy and your controls, and just better alignment. And in many, many cases that will be vendor-led. And that’s okay. But you’re exactly right, there is a point beyond which you’re really going to want to engage with experts in that particular domain because it’s not cost-effective to do so yourself.


Corey: One of the most blatantly wrong things that I hear from the world of cloud marketing comes from AWS itself, which is, “There’s no compression algorithm for experience.” There absolutely is. You don’t have to build all of this stuff yourself from scratch. You can compress that experience into hiring experts who are good at that sort of thing, either as employees or consultants. That’s why advisory consultancy is a thing.


You can buy products and services that compress all of that hard-won, hard-fought experience into something that you can buy off the shelf and it solves the problem far more effectively than you’re ever going to be able to build in-house. And that’s a valuable and powerful thing. The hard part, of course, is in the security space, you can effectively spend infinite money on security, and even then there are no guarantees. So, it’s challenging as companies grow—especially in the early days—to make security a priority because it’s always something we’ll focus on later until suddenly, you really should have been paying attention, and now it’s too late.


Matt: Yeah, this is a big one. And I understand how that comes to pass, Corey, as do you and everyone who’s listening. Like, it’s very easy to rationalize yourself into that place, and it’s very understandable. And in fact, I myself have done it in my past in—as my prior life in operations. And there is a certain point beyond which the risk calculus alone and the impact of that, it just reverses the polarity of that whole discussion.

And then the worst case is something bad happens to you when you’ve been in limbo before you’ve implemented your security. Unfortunately, we’ve seen this happen with several organizations where they’ve decided to just freeze budgets on security, whatever, and then bang, there’s a compromise and they end up on the news. I’ve seen this several different times in the last year alone, as a matter of fact. And so this isn’t fear-mongering, and I want to—Corey, part of your brand is calling out things as you see them, and so I think that one of the unfortunate things about the security industry at large is there’s lots and lots of fear-mongering. And I’m not doing that.

Instead, I’m saying understand your risk and understand that calculus and your appetite for impact. Let that be your north star as to when to really get serious about your security controls. And that might be from inception, by the way. And that’s a great answer. To an earlier point, it might be a risk that you’re willing to make up until some sort of financial threshold, beyond which you’re not willing to appetite—it’s an unappetizing risk beyond that.


Corey: Forget dozens of visualization tools and view your entire system in one place with New Relic Explorer, the latest addition to New Relic One. See your system-wide health at a glance with a dense hex view that has your hosts, services, containers, and everything else. And get an estate-wide view of sudden changes, so you can catch issues before they impact customers. So go to https://newrelic.com, sign up for free, and start exploring your system today.


Corey: It really comes down to risk management. I mean, one of the reasons that I focus on the AWS bill is that that is almost never a company-ending event; “Oh, I spent too much money” is the cost of not focusing on it sooner. And that’s almost always both okay and survivable. In the absolute worst case of, “Wow, we normally have $1,000 a month bill and we just got charged $800,000,” AWS is a company that understands the longer-term view; you can reach out to them and get it fixed in almost every case. Security does not work that way.


And it’s much less tangible, as far as being able to sell something effectively into that market. In fact, one of the problems I have is walking around the RSA expo hall—whenever I was able to do that in the before times; last conference I went to before this whole thing started—and you see what feels—past a certain point—the same product being offered again, and again, and again, with different logos and different company names, but the messaging is the same, and it’s incomprehensible, and it just looks like there is no winning here. I found that ExtraHop was a breath of fresh air comparatively. But I’m not going to lead you that far down the road. Tell me what separates you folks out from the industry at large—not specific vendors because no one’s going to look great smacking in the competition, but there’s something refreshing about your approach and how you talk about your approach. Where did that come from?

Matt: It comes from our pedigree of being network-deployed, but application-fluent. So, here’s a fun fact. So, our co-founders, years ago, invented the modern-day application delivery controller, specifically at F5 Networks. And this was a long time ago. And in so doing, that device is a very, very, it’s a network-deployed device that’s deeply application-fluent, and all of that domain experience and all of that sensibility towards scale, the ability to see inside decrypted packet streams and do analysis, all of that made its way into our product and then fed the beast of network analytics.


And our worldview really is steeped in this idea of just network analysis and the various outcomes that you can glean from said analysis, like behavioral detections for security, like asset inventory, your security controls, the visibility that you cited earlier, Corey. It’s like many environments, they don’t know what’s running. And the network will tell you what’s running in a way that’s deeper than just, like, the management console listing the assets and services you’ve got. And so now, down to even the transactions, what types of services? What’s the consumption model of this?


Who’s consuming it? Where’s the traffic going? And is this normal? Yes or no? So, that’s really what makes us different. Most of the folks in our space focus solely on detections, and we believe that the network as a data source can give you much, much more value. And so we strive to deliver that.

Corey: There’s an awful lot of value in being able to deliver value upfront, and getting customers who have worked with you before to say, “Yes, this thing is amazing.” And I have problems with that in the space that I’m in because it turns out that there is a perception—that I disagree with—that fixing bills or talking to someone about a cloud bill that was high is somehow a ding on the company. And it’s not even about being high; it’s about having a lack of visibility or understanding in many cases, but people don’t want to talk about it. It’s hard enough to get testimonials and logo rights in that context. In a security space, it feels like we are thrilled to wind up buying your product now that we see the value of it. If you ever mention our name in any context ever again, we’re going to drive a wrecking ball through your corporate headquarters, legally speaking. How do you get past that?

Matt: It’s understandable, first of all, and you’re right, Corey. In large part, folks are not super eager to talk about security in a very public way. And that’s okay. I wish that there was more, though, not as a vendor representative where we would be the beneficiaries of it, but just more sharing in general really, really needs to happen. And what we’re seeing instead is the big disclosure and the big tech ta—like last year with SUNBURST.


It’s a monster, and it’s a catastrophic affliction leveled on the industry, and there was a single point of disclosure, which was wonderful, and then the sharing started. And I feel like there’s a lot more opportunity for information sharing, even with the current frameworks that are out there; there are vehicles to do this in a formal way for a given industry. But we need more. And you’re exactly right. It’s discussing the state-of-the-art and threats, and God forbid, attempts at compromise or full-fledged compromises, there needs to be more of that so we can collectively level up.


Corey: I’ll even name names on this because I’m not a security vendor. The Capital One breach a few years back was fascinating for me because it wasn’t just that they had done things badly or irresponsibly, didn’t read the instructions on the tin, it was a series of chained-together exploits. There was an exploit in the web application firewall, I believe—according to court filings—that allowed someone to get a foothold. From there, there was an overbroad instance role that allowed them to get access to an S3 bucket that they should not have had access to from that account. It was tying together different things in different ways.


And that, in turn, is the sort of attack that is not easy to see coming, and there’s a lot of things you can learn from that; I’m sympathetic to it. The problem, of course, is that first, they’re a bank and the lawsuits and the rest means that Capital One at that point, whenever the word ‘cloud’ comes up, felt like for a while they just put their heads down, and there was six more weeks of no talking about cloud whatsoever because they didn’t want to talk about it at all. But that’s the sort of thing where we can all learn so much from what happened. But the instinct is to button up and never say a word about it. Which means that the only people who are able to really go in-depth on this are, in fact, security vendors, with the counter-argument that as soon as you start talking about that in your marketing, you get accused of effectively ambulance chasing or that you’re using fear, uncertainty, and doubt to wind up selling your products. And yeah, a lot of vendors do exactly that and it’s awful. But there are valuable learnings here, and it’s not just a sales opportunity for a product but rather an opportunity to uplift the entire ecosystem.


Matt: Yeah. And to the extent that the security market, in general, is a very vendor-wary market as an audience, and I understand why. I was on the receiving end of vendors as well, back in my prior life, as I mentioned. And I understand that, and to that, I would say, is make us prove it. If there’s a decision to be made and you’ve deemed it necessary to engage with us then, as a good security buyer, make us prove it.

And there’s many, many—especially in the cloud—there’s many vehicles at your disposal to test the claims of any given vendor with any given approach, whether it’s a SIEM with log analysis, or endpoint, or network, or beyond. So, make us prove it, and then you’ll get a line of sight to whatever claims are being made around catching breaches, or understanding behaviors, or beyond.


Corey: So, with all that in mind, and obviously the way that things used to be and how all of this stuff would tie together, it feels like the old answers aren’t right for the new era. So, from that perspective in a more forward-looking sense, what does strategic security tooling look like in this cloud era that we all find ourselves, willingly or not, enmeshed within?


Matt: Okay. That’s a super important—in fact, that’s probably like—you’ve asked a bunch of good questions; this one’s at the top of the list as far as I’m concerned. So—

Corey: When you don’t know a lot, you get very good at asking good questions because that’s how you fix that problem.

Matt: [laugh]. Hey, man, I ask a lot of questions myself, so you’re in good company. So, one of the problems in the traditional terrestrial enterprise is that their tooling strategy looks like a shotgun blast. And that shotgun blast is comprised of point solutions that are loosely federated at best, at best. And the only point of integration is the swivel chair that an analyst would sit in, or the Site Reliability Engineer, or DevOps person.


Corey: Don’t forget the screens upon screens upon screens that show amazing things when someone walks by, but if you think about this for more than half a second, you realize people are going to wind up with repetitive strain injuries from trying to pivot to look at all those things on the screen, and wow, maybe that much to look at all at the same time would be incredibly stressful and unpleasant when you’re getting a suntan from the monitors. That’s a problem.

Matt: No, that’s exactly right. The big board of the past, in the terrestrial Data Center—the Security Operation Center or the Ops IT center, whatever, the ‘fishbowl’ we used to call it back in my old place—that really does point to the legacy era. Now, if you hoist that exact same model up into the cloud, or especially in hybrid environments because most—or many. I don’t know about ‘most,’ but many are in this sort of transitionary state. They’re multi-cloud, A, or they’re at some stage of cloud adoption with traditional enterprise workloads.


Well, now what does tooling look like because we have a management plane that can do really, really intelligent stuff, and the APIs are very, very consistent, they’re very actionable, and they happen pretty quickly. Not as quickly as I would like sometimes, but these events are easy to trap, and they’re easy to act on. And so the modern era of security tooling is comprised of, think about your data along the boundaries of its data source. So, for example, I care about my containers and so I want some sort of runtime container visibility. Or if I’m running EC2 instances, I want endpoint visibility because I want to know what’s running and resident in memory, or if it’s whatever; malware or whatever.


Then I want—I’m going to log because you log a lot in the cloud, it turns out, and so I’m going to need some way to make sense of those logs and wrap that into part of my practice. And then lastly, I want to have visibility into the network because of the three things that I just described, endpoint, say, or agent-based approaches, log-based approaches, those things can be evaded, they can be disabled, they can be turned off—and in fact we saw evidence of that, very active evidence, last year with SUNBURST—and the network is the only one that’s truly covert and difficult to evade, manipulate, or disable. And so as part of this collective strategy, now you’ve got—and we’re very complementary to one another: logs are complementary to us; where we leave off, as well as endpoint, and vice versa. And so we call this the ‘Cyber Triad.’ And this is not just our terminology; it’s analysts and others that are out there.


Corey: Always good when you hear the buzzwords, and they didn’t come directly from the vendor.


Matt: In this case, it’s not a buzzword; it’s actually a genuine strategy because we tended—in the past, we haven’t thought about our security tooling from a strategic, sort of, data source perspective. And in the context of cloud, especially, you can wield these data sources in some really, really powerful ways and do, in this sort of DevOps or SRE sense, you can do this event-driven security model. Now, the tooling itself can emit events into the management plane of the cloud, and the cloud, in turn, can take intelligent action. It’s a beautiful and devastatingly powerful new era for real-time security response. So, now in the past, Corey, I would quarantine a process on a system, or maybe if something was really, really bad in a terrestrial, I would just, like, disable that, block it.

Maybe I would do virtual patching on the firewall where I would disable a given service on the firewall. Well, now in the cloud era, and your audience understands this super well, I just call the management plane and redeploy the container. Done. Golden image; it’s fresh, it’s clean, it’s got attribution and I know that if that other one was compromised, I’m just going to get rid of it, because cloud, and redeploy this thing right in its place. It’s beautiful.
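
The event-driven response described here can be sketched roughly as follows. This is a minimal illustration, not ExtraHop's or any cloud provider's actual API: the `Detection` event shape, the `FakeManagementPlane` class, and the `handle_detection` function are all hypothetical names, and the management plane is an in-memory stand-in so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """Hypothetical event emitted by a security tool."""
    workload_id: str
    severity: str  # "low", "medium", or "high"
    source: str    # e.g. "network", "endpoint", "logs"


@dataclass
class FakeManagementPlane:
    """In-memory stand-in for a cloud control plane (not a real SDK)."""
    golden_image: str
    running: dict = field(default_factory=dict)  # workload_id -> image
    actions: list = field(default_factory=list)  # audit trail of actions taken

    def redeploy(self, workload_id: str) -> None:
        # Discard the suspect workload and start a clean one from the golden image.
        self.running[workload_id] = self.golden_image
        self.actions.append(("redeploy", workload_id))


def handle_detection(event: Detection, plane: FakeManagementPlane) -> str:
    """Redeploy on high-severity detections; otherwise only record the event."""
    if event.severity == "high" and event.workload_id in plane.running:
        plane.redeploy(event.workload_id)
        return "redeployed"
    plane.actions.append(("logged", event.workload_id))
    return "logged"


# Example: a network detection flags a tampered container.
plane = FakeManagementPlane(
    golden_image="web:golden",
    running={"web-1": "web:tampered"},
)
result = handle_detection(Detection("web-1", "high", "network"), plane)
```

In a real deployment the handler would presumably be triggered by the provider's eventing service and would call its orchestration API. The sketch only shows the decision: a detection arrives, and the compromised workload is replaced rather than repaired in place.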

And so in the modern era, the cloud itself unlocks a set of operational models for security that are really difficult to achieve otherwise. It’s not impossible; there’s a whole industry dedicated to it, but in the cloud era, it’s much, much, much easier, and it’s easier to wrangle, and you can hoist it higher up into the dev lifecycle, the CI/CD lifecycle itself. So, it’s a really nice time for security ops.

Corey: It really seems to be. Matt, thank you for taking the time to go through the sometimes befuddling world of InfoSec, especially from a vendor perspective. If people want to learn more about you, what you’re doing, what you’re up to, where can they find you?


Matt: Well, they can find us at extrahop.com. And there we’ve got cloud case studies, use cases. In fact, we’ve even got an eval that’s out there. We’ve got a live demo, running in the cloud, actually, where you can sign up and experience the system before your very eyes and see the type of visibility gains you can get, and see network analysis manifest, really. It’s a real live system up there. So, I would strongly recommend that anyone who’s interested have a look at that because it’s quite a powerful model, in my opinion.

Corey: And if folks have questions, do feel free to direct them my way because, remember, the one thing that is never for sale here is my authenticity, for better or worse, which often gets me into serious trouble. Matt, thanks for taking the time to chat with us. I really appreciate it.


Matt: Yeah, likewise, it’s been a pleasure, Corey. Thanks so much.


Corey: Matt Cauthorn, VP of Security and Cloud at ExtraHop. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment that you will later be able to disavow because no one was tracking what was happening on the network, so it must just be an application bug.


Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.


This has been a HumblePod production. Stay humble.