Screaming in the Cloud
It’s Not a Data Science Problem, It’s a Data Engineering Problem with Laurie Voss

About Laurie

Laurie has been a web developer for 25 years and cares deeply about making the web bigger and better for everyone. He previously co-founded awe.sm and npm, and is currently a Senior Data Analyst at Netlify.

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by our friends at Fairwinds. Whether you’re new to Kubernetes or have some experience under your belt, and then definitely don’t want to deal with Kubernetes, there are some things you should simply never, ever do in Kubernetes. I would say, “run it at all.” They would argue with me, and that’s okay because we’re going to argue about that. Kendall Miller, president of Fairwinds, was one of the first hires at the company and has spent the last six years making the dream of disrupting infrastructure a reality while keeping his finger on the pulse of changing demands in the market, and valuable partnership opportunities. He joins senior site reliability engineer Stevie Caldwell, who supports a growing platform of microservices running on Kubernetes in AWS. I’m joining them as we all discuss what Dev and Ops teams should not do in Kubernetes if they want to get the most out of the leading container orchestrator by volume and complexity. We’re going to speak anecdotally of some Kubernetes failures and how to avoid them, and they’re going to verbally punch me in the face. Sign up now at fairwinds.com/never. That’s fairwinds.com/never.


Corey: The apps on cloud summit is a new, action-packed not-a-conference happening May 11th through 13th online. It’s for everyone who makes applications in the cloud run screaming. From IT leaders to DevOps pros to you folks, whoever you might be. Take a break from screaming into the cloudy void with me to learn from some of the best people who actually know what they’re doing. Like Kelsey Hightower, AWS blogger John Meyer, and also me, because apparently they didn’t listen to me saying I had no idea what I was doing. Register now at turbonomic.com/screaming. There’s a “swag box” ready to ship for the first two thousand registrants, so you don’t want to miss this. Thanks to Turbonomic for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Laurie Voss, who is currently a senior data analyst at a company called Netlify. Laurie, thank you for joining me.


Laurie: Thanks for inviting me.


Corey: So, let’s start at the very beginning. What is Netlify?


Laurie: Netlify is a single cohesive build chain for websites. A lot of people don’t think of it that way. I think a lot of people think of Netlify as a web host, but really where people are getting value from Netlify is you build your website, you upload your website, you deploy your website, you host your website, you test your website, you monitor your website, and that can be five or six different services, like a CI service, and a hosting service, and a Git service and all of those things. And Netlify just joins that entire build chain into a single tool where you just hook up a Git repo, hit commit, and it goes out into the world, and it’s incredibly fast and convenient. And that’s really where people get value out of it.

Corey: Perhaps somewhat uncharitably, I would almost think of that as Heroku for this decade.


Laurie: I mean, I would consider that pretty charitable to us and somewhat uncharitable to Heroku, who are still around and chugging.


Corey: Oh, absolutely. I’m a big fan of things like that, where it’s take this code—whatever it looks like, maybe it’s a repository, maybe it’s some, I don’t know, some files I email over, God forbid—and then go ahead and deploy it into something that at least pretends to be able to scale. I often hear Netlify brought up in the context of Jamstack, which seems to be this whole area of cloud computing that I don’t tend to spend a whole lot of time in, at least not knowingly. What is it?


Laurie: So, Jamstack originally stood for JavaScript, APIs, and markup sometimes also referred to—


Corey: But I hate all of those things. Please continue.


Laurie: [laugh] it’s sometimes also referred to as static websites, which is a term I tend to avoid simply because it’s not really very accurate. A static website is one of the things that you can deploy on the Jamstack, certainly, but it’s certainly not the only thing you can deploy. I would say that it is an architecture that lends itself to pre-rendering as much content as is possible, and then caching all of that stuff at the edge, and then pulling in only the bare minimum of dynamic content to improve both scalability and performance. Those are the things that people like about Jamstack websites, is that they tend to be extremely fast.


Corey: So, that makes intuitive sense to me. And you, of course, became fairly broadly known as one of the people behind npm. But now you’re a senior data analyst, which feels like it’s a departure from the things you were doing to the things you’re doing now. Help me either validate that, or tell me what obvious thing I’m missing, or highlight something clever for me because right now, I feel like there’s a missing link in my chain of events here.

Laurie: No, that’s a totally fair question. So, I started npm as the CTO and hired an excellent engineering team underneath me. In fact, one of our very first hires was a lady called C J Silverio, who is just a staggeringly good engineer. And it became very obvious very early on in the life of the company that we really had two people of CTO caliber, and that we didn’t need to have them, but what we did need was somebody to run the operational side of the business. So, relatively early on in the life of the company, we promoted C J to CTO, and I moved my title to COO, you know, obviously, still with a technical bent, but my job as a COO is to do operational things.


So, I was in charge of running the financials and making sure that marketing and sales weren’t going massively over budget or under quota, those sorts of things. And that’s fundamentally a keep-the-lights-on data analysis job. So, while I was CTO, I was sharing fun stats about npm’s internals; while I was COO, I was doing a lot of analysis of our financials. But the common factor was analysis, and I was doing more and more of it. So, towards the end of my time at npm, I became the Chief Data Officer, where I basically specialized down into doing just data things—some financial, some technical—and doing a lot of outward-facing presentations about that kind of thing.


So, that was where my job ended up being. And literally how I pitched my way into Netlify was like, “What if I did that thing that I was doing for npm for you,” and they were like, “Great. You can’t be a C though because you just got here.” [laugh]. I was like, “Fine.”


Corey: Well, of course. We all have to start somewhere. Humility. And it took me a couple of years to unofficially run AWS marketing. My God. Yeah, have some humility as you step through this process. Was it a big barrier to you once you arrived at Netlify, convincing them to buy you the Excel license you obviously need to do all this data analysis, or alternately, are there better tools for it than the one that we’ve all been using anyway?

Laurie: Honestly, I’ve always been a Google Sheets partisan. I know that the really hardcore financial types will complain about the functions that are missing from Google Sheets versus Excel—


Corey: Oh, will they ever.


Laurie: —but I’m not that person. But we have a pretty great stack that I like quite a lot at Netlify these days. We have a variety of older tools laying around, not all of which we’ve migrated away from, but the core of the new class is this company called Databricks, who are basically Spark clusters as a service. So, you can just throw, essentially, arbitrarily large amounts of log data on to S3 buckets on AWS, and it can query them as if they were databases, which is truly beautiful. And on top of them, we have a system called Mode Analytics, which is a general platform for data analysis, and presentation; draws graphs, that kind of thing; has an SQL interface.


And between those two we’ve got a new open-source project, or relatively new to me anyway, called dbt, which is this very organized, clever way of codifying your best practices around data. So, you’ve probably heard of extract, transform, load jobs; it’s basically a way of quantifying chains of extract, transform, and load jobs such that they’re always tested, and always running, and you know what the dependencies are between them and everything is documented.
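For readers who want to see the idea concretely, the following is a minimal sketch of a chain of transform jobs with declared dependencies and a test attached to each step. It is not dbt and does not use dbt's API; the model names (`raw_hits`, `ok_hits`, `hits_per_path`), the sample records, and the `run_pipeline` helper are all hypothetical, and the ordering is done with Python's standard-library `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def load_raw(_inputs):
    # Hypothetical sample data standing in for an "extract" step.
    return [
        {"path": "/", "status": 200},
        {"path": "/x", "status": 404},
        {"path": "/", "status": 200},
    ]


def only_ok(inputs):
    # Transform: keep only successful requests from the upstream model.
    return [row for row in inputs["raw_hits"] if row["status"] == 200]


def hits_per_path(inputs):
    # Transform: count successful hits per path.
    counts = {}
    for row in inputs["ok_hits"]:
        counts[row["path"]] = counts.get(row["path"], 0) + 1
    return counts


# Each model declares its upstream dependencies, how to build it,
# and a test that its output must pass before anything downstream runs.
MODELS = {
    "raw_hits": {
        "deps": [],
        "build": load_raw,
        "test": lambda out: len(out) > 0,
    },
    "ok_hits": {
        "deps": ["raw_hits"],
        "build": only_ok,
        "test": lambda out: all(row["status"] == 200 for row in out),
    },
    "hits_per_path": {
        "deps": ["ok_hits"],
        "build": hits_per_path,
        "test": lambda out: all(n > 0 for n in out.values()),
    },
}


def run_pipeline(models):
    """Build every model in dependency order, testing each one as it is built."""
    graph = {name: set(spec["deps"]) for name, spec in models.items()}
    results, order = {}, []
    for name in TopologicalSorter(graph).static_order():
        spec = models[name]
        inputs = {dep: results[dep] for dep in spec["deps"]}
        output = spec["build"](inputs)
        if not spec["test"](output):
            raise ValueError(f"test failed for model {name!r}")
        results[name] = output
        order.append(name)
    return order, results


order, results = run_pipeline(MODELS)
print(order)
print(results["hits_per_path"])
```

The point of the sketch is only the shape: dependencies are data rather than tribal knowledge, so the run order can be derived and a failing test stops the chain before bad data flows downstream.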


Corey: Okay. While I’m in the process of getting everyone in trouble on things, what is your take on machine learning for things like this? Because it seems that whenever you talk about data, it’s inevitable that someone, usually with a crap ton of VC backing, will immediately jump in because they’re clearly getting bonused every time they manage to fit the phrase ‘machine learning’ into basically anything.


Laurie: So, I would step back a bit and say that, before I joined Netlify, I interviewed at a couple other companies just to see what the space was like, for basically the same job at other companies. And there was a really interesting pattern that I noticed, which is that it is quite a common pattern for an early-stage startup, to say, “Oh, we have a data problem. We must hire a data scientist.” And they go and find somebody staggeringly qualified, with a PhD in data science, and they hire that person. And that person immediately runs into trouble because that is not actually the problem that they have.


They don’t have a data science problem; they have a data engineering problem. They have, like, mounds of data lying everywhere, and it’s not organized, nobody knows where it is, nobody can query it efficiently. A data scientist is, at earliest, your fifth hire in your data team. The first five people are people who have to do an enormous amount of plumbing and engineering to be able to just get the data from all of the places that it’s lying around, all of the piles that it’s accumulating in, into any kind of a reasonable format that you can query it and figure out what it does.


Corey: You have to forgive my cynicism on some level because I’ve been in the ops space for, I guess, entirely too long where I’ve been dealing, particularly in the context of AWS bills, with making arguments against data science teams who are insisting that the Apache logs from 2012 that are taking petabytes of space are the key to unlocking the mysteries of the business. They’re not sure how yet, but one day they’re going to become super valuable, so I’m never allowed to delete anything. And on some level, it just almost seems like it’s a big make-work conspiracy for data scientists amongst each other which, hey, respect. Counter-argument: what sorts of insights can you glean from these vast quantities of data? Because everyone else I’ve talked to about this generally works for a big-data-oriented company. I got to be honest with you, it feels like they’re selling pickaxes into a gold rush because, “Oh, it’s very important to keep all your data so that we can sell you things to go through it.” You’re on the other side of that; you’re buy-side. So, what is the value that this giant data hoard winds up providing?

Laurie: Well, I will say that my initial inclination is to agree with you. There’s definitely a lot of pickaxes being sold to miners who have no idea what they’re doing. I think about ten years ago, there was a huge industry-wide pile-in to big data. People were like, “You need Hadoop, and you need gigantic data processing clusters, and huge data, and massive amounts of processing, and, like, buy this enterprise contract for $100,000 a year.” And then everybody did those things and was like, “And now what?” And they were like, “Oh, well, we don’t know. Maybe you can count it up. How many hits did you get?”


That’s not useful analysis. Having all of your data queryable is not, per se, a useful thing to be able to do. And I think in the 10 years since then, people have got smarter about that. They realized medium and small data are actually [laugh] often quite useful. It’s more about how you analyze it, and can you present it to people, and can you make sense of it?


But there was a second gold rush into the ML space. There are certainly use cases where you have enough data and a problem that is amenable to being solved by applying ML to it in some way. Those are a minority of cases; maybe five percent of all data problems are big enough that you can use ML in the first place, and are also ones where an answer that ML can give you would be helpful. And the other ninety-five percent, it’s just plumbing and engineering.


Corey: Once upon a time, it felt like the way to address all this data was the… honestly, the result of a prank perpetuated many moons ago by what felt like Google in a white paper, that Yahoo went for hook, line, and sinker for MapReduce, which then led to Hadoop and a bunch of other stuff. I maintain this was a Google April Fool’s prank that everyone took way too seriously and went way too far. These days, it feels like stream processing as that data comes in is sort of the preferred approach. Yes, no, or am I completely misunderstanding most of the point? Or all the above?


Laurie: I would say definitely, the industry has moved away from the batch processing that Hadoop did. I actually worked at Yahoo at the time when they were inventing Hadoop. [laugh].


Corey: Oh, you fell for it, too. Great.


Laurie: [laugh]. I was—we were selling the Kool Aid as opposed to drinking it.


Corey: Oh, if you’re going to be involved in a Kool Aid transaction, that is absolutely the side of it you want to be on. Let’s be very clear here.


Laurie: So yeah, streaming processing, but like semi-real-time processing of things, as opposed to giant batch jobs is certainly where stuff has mostly gone. Although people who are end consumers of data, as an analyst, if I asked you how fresh does this data need to be, they will always say realtime. Like, [laugh] that will be their first answer. And then I’ll be like, “What if it was 24 hours delayed?” And they’re like, “Oh, yeah. Well, obviously, yesterday’s data is fine. I’m not going to care about what happened at noon today when it’s 2 p.m.” And then you’re like, “Well, yes. Well, then it’s a batch job, and it’s, like, an order of magnitude cheaper to provide to you, so let’s do that.” Batch jobs are still very cost efficient and so we do a lot of batch processing, it’s just we don’t make a big song and dance about it anymore because it’s no longer the new shiny thing.


Corey: On some level, it feels like that is the nature of things where something gets announced, and it’s super complicated and hard, and people scale the peaks of complexity, and they make good money doing it. I mean, in the original dotcom boom, ‘firewall engineer’ was a quarter million dollars a year if you could swing it. Now, it’s just assumed that basically, anyone who touches the network should be able to configure firewall rules; things get simpler with time. It feels, on some level, like an awful lot of the data world is undergoing some of that consolidation as well, where we’re starting to find tools and methods and ways to extract meaning from giant piles of data without the part where, you know, you go and drop $5 million here on a data science team.


Laurie: Well, you’ve sort of arrived at my favorite pet topic, which is the stack. The stack is this abstraction that I wrote about at the beginning of last year. It’s the idea that the ever increasing complexity of technical fields means that we are constantly inventing, adopting, and then forgetting about abstractions. As you said, we’re constantly chasing after the new shiny thing; we make a big song and dance about it; it’s very complicated. People make enormous amounts of money doing it in the early days, and then somebody eventually invents some kind of tool or open-source framework, or possibly, like, a SaaS that makes it one-click to do.


And it’s not any less complicated or any less magical than it was before, it’s just you think about it much less, right? Like I mentioned, Databricks. Every time I run a query Databricks is taking my SQL, converting my SQL into giant MapReduces, running it on a huge cluster of machines of arbitrary size alu—I don’t know what size it is because I don’t need to care anymore—and then pointing it at AWS, where it’s pulling in every single piece of data in every bucket that I put in there. And all of that, ten years ago would have been of a complexity that only Google or Yahoo could do it. And now it’s literally we spin them up by clicking a button and we don’t even remember that it’s happening.

Like, all of that complexity is still happening, all of that magic is still happening, but now it’s just a commodity. And we’re doing that across the tech space. So, we’ve certainly done it in data; a bunch of stuff that used to be very complicated, used to be the thing that you would hire me to do is now just the tool that I use and the thing that I do is the analysis, which is a more useful use of someone’s time, really.

Corey: One would like to hope so. But I do feel like there’s a story—and we see it across the board; this is one of the things I really enjoy about Netlify—once upon a time to put a website on the internet, you had to know a whole bunch of different things all at the same time. It was, how to build a web server, how to maintain and patch that web server so it didn’t become an attack spam cannon, how to get files into a format the web server could understand, how to put that out there, how to get DNS to work, how to handle SSL—if that was even a glimmer in your eye at that point—and so on and so forth. Now, it really requires, click a button. And Netlify has made this way easier because I tend to look at this from the exact opposite side in the industry where I come from an ops background; building all the infrastructure to handle these things is relatively straightforward to me, but then I get to the other side.


Cool, now all that’s done, “Build the web app.” And my response, “Ehhh, what?” Yeah, I can write bad HTML by hand, sort of, and that’s as far as I generally tend to go, whereas it feels like the Jamstack story in general, and Netlify in particular, are aimed at folks in many ways, coming from the other side of the world where it’s, “I picked up JavaScript. I picked up a framework or two. I understand frontend, I understand how web applications get built. What’s the deal with this whole infrastructure piece?” And thanks to the miracle of stacks collapsing in upon themselves in many respects, you don’t have to know about that or care, and you live in this blissful world where the term Kubernetes never crosses your desk. Is that a fair summation of the state of the industry? Am I dramatically misunderstanding what Netlify does and for whom?

Laurie: No, I think that’s pretty much how it goes. One of the reasons that I wrote this blog post about the stack—it was almost exactly a year ago—is because about a year ago is when I joined Netlify and I was suddenly immersed in the things that Netlify does. It became more clear to me that I was seeing a fundamental shift happening.


I was like, “Oh. We are obeying some kind of natural law here, right? We are taking things that used to be people’s whole jobs and turning them into things that are so simple that you don’t even think about them happening anymore.” I’ve definitely met and worked with people in my life whose whole job was managing SSL certificates. And now, it’s literally a checkbox. And it’s on by default. It’s like, “Would you like your site to be secured by SSL?” Yes, obviously. I don’t know why I would turn that off.


And it just comes as part of deploying your website. Way in the background, Let’s Encrypt is doing it, and there’s a whole bunch of song and dance about refreshing certs every 90 days, and it all just happens completely automatically without you caring even a little bit. And that’s what Netlify is doing. It’s taking things that used to be five or six companies and squishing them down into a single layer that you call your deploy service. And you’re like, “Great. My deploy service does all of those things and I don’t need those other five companies anymore.”
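To make the "refreshing certs every 90 days" chore concrete, here is a minimal sketch of the date arithmetic an automated renewer has to do. The 90-day lifetime is the figure mentioned above; the 30-day renewal margin and the `needs_renewal` helper are assumptions for illustration, not a description of how Netlify or Let’s Encrypt actually schedule renewals.

```python
from datetime import date, timedelta

CERT_LIFETIME = timedelta(days=90)  # the 90-day figure mentioned above
RENEW_BEFORE = timedelta(days=30)   # assumption: renew 30 days before expiry


def needs_renewal(issued_on: date, today: date) -> bool:
    """Return True once `today` falls inside the renewal window."""
    expires_on = issued_on + CERT_LIFETIME
    return today >= expires_on - RENEW_BEFORE


issued = date(2020, 1, 1)
print(needs_renewal(issued, date(2020, 2, 15)))  # still outside the window
print(needs_renewal(issued, date(2020, 3, 5)))   # inside the window
```

A check like this, run on a schedule, is the whole "job" that used to belong to a person; the deploy service simply runs it for every site without anyone asking.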

Corey: Now, if you’re one of those five companies, that becomes something of a problem. But again, that’s the pace of innovation. That is the world continuing to evolve.


Laurie: Nobody wants to be commoditized, but on the other hand, the company that gets to do the commoditizing tends to run away with it, right? Like that’s kind of the AWS story. It’s like, there used to be lots and lots of companies that would sell you a server in a rack and then take 24 hours to set it up and you’d pay with a credit card. And AWS was like, “What if that was one button?” And everyone was like, “Yes, I would love that to be one button. I never want to care about what rack it’s in anymore, or whether or not it has enough power, or whether or not the cable in the back has got jiggly. Just virtualize it all the way for me, thank you.” And then AWS completely ran away with it.

Corey: Oh, yes. And it’s AWS, so it was, “What if that button was hidden in a console that doesn’t work super well, and then we give that button a terrible name?” People are like, “Ehh, I’ll risk it.”


Laurie: I mean, the observed behavior of the industry is that we love the terrible console.

Corey: Oh, absolutely. Everyone talks about infrastructure as code, which is basically a polite way of saying I use the console, and then lie about it on conference talks.


Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Laurie: [laugh]. Indeed.


Corey: So, since you brought up AWS, terrific, it’s time for me to do my whole conspiracy theory approach here and accuse you of basically war crimes. So, you were big into the npm space for a long time, which is great. I accept the fact that that is a thing that happens—package.json and package-lock.json are basically artifacts of you folks.


Now, AWS has launched their Amazon CodeGuru machine learning—wink, wink, nudge, nudge—powered code review. And of course because it’s AWS, they charge based upon lines of code in a pull request, which tells me that you’re a deep plant for many years now, planning for the day where this one day supports JavaScript—which it doesn’t today—and all someone has to do is check in the package-lock and the package.json files once, and suddenly the entire scheme pays off handsomely. True, false, or I’m not supposed to talk about that in public?

Laurie: It’s true. I’m part of a global cabal whose purpose is to make Node modules infinitely deep until the gravity well sucks in all of programming and we don’t have computers anymore.


Corey: On a slightly more serious note, I do want to talk a little bit about package management—in the context of programming languages as opposed to package management in the context of Linux distributions because, oh, do I have thoughts on that—there are a few different competing tools out there to handle dependencies across different programming languages, in the JavaScript world, in the Python world. And I’m not a JavaScript programmer, except when forced to be, and it’s usually editing something as small-scale as humanly possible and backing away slowly. But my general consensus, looking at it across the board, is that there is no consensus, that there is no clear one right way to do things. Invariably, dependencies always become a challenge. Getting something to a reproducible build while also being secure is a problem.

And no matter what stack you pick, what language you pick, there’s always a—for ‘Hello World’—there’s a step one of setting up your local environment to resemble what the person writing the document’s environment looks like. Is that accurate? Is there some magic tool out there that somehow I’m just unaware of that solves all of this for me?


Laurie: Well, there’s definitely not a single tool that gets it completely right, but I would say that there is a commonality between the things that work that I don’t know that everyone appreciates. So, I’m going to draw a parallel between package.json and Kubernetes right now, so bear with me. Basically the thing that people often don’t like about npm and the thing that people don’t like about package.json is that it says, “All of your dependencies must live here, in your tree. I don’t care how many JavaScript projects are on your computer; I am going to have one copy of every module right here where I can see it, and I’m going to use those and only those.”


It tends to make JavaScript programs a little bit easier to debug because you know that the code that is at fault can’t possibly be anywhere else. It can’t be sitting in /usr/lib unexpectedly, or in some additional libraries folder, or it can’t have been, like, blown away by somebody installing something else. It has to be the one that’s sitting in your tree, and that’s one of the things that made Node so popular in the beginning, and npm so popular at the same time, was that it was very easy to deal with, and in particular, it made it work on Windows, which didn’t have any of those things anyway. And Node’s popularity as a development environment, where you could write code on Windows and it would work perfectly in a Linux environment because all of the dependencies were JavaScript and that ran the same on both of those computers, is understated. And that’s essentially the Kubernetes story.


Kubernetes is saying, “This thing where we have libraries all over the place, where we have dependencies all over the place, like, they lie all over the operating system. It’s too late to fix that. What if we packaged up the entire operating system and said that that’s the package?” And that’s what Kubernetes is. It’s creating a package.json of your entire computer, and then you run that.
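The "all of your dependencies must live here, in your tree" rule described above can be illustrated with a simplified lookup: start in the directory of the code doing the import and walk upward, checking each `node_modules` folder along the way. This is a sketch of the idea only, written in Python with a hypothetical `resolve_module` helper; Node's real resolution algorithm has more steps (core modules, `package.json` entry points, file extensions) that are left out here.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_module(start_dir: Path, name: str) -> Optional[Path]:
    """Walk from start_dir up to the filesystem root and return the first
    node_modules/<name> directory found, or None if there is none."""
    current = start_dir.resolve()
    for directory in [current, *current.parents]:
        candidate = directory / "node_modules" / name
        if candidate.is_dir():
            return candidate
    return None


# Build a throwaway project tree to demonstrate the lookup.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "project"
    (project / "node_modules" / "left-pad").mkdir(parents=True)
    (project / "src" / "lib").mkdir(parents=True)

    found = resolve_module(project / "src" / "lib", "left-pad")
    relative = found.relative_to(project).as_posix()
    missing = resolve_module(project / "src" / "lib", "no-such-module-xyz")

print(relative)
print(missing)
```

Because the search never leaves the project's own ancestry, the copy that gets loaded is the one sitting in the tree, which is the debugging property Laurie describes.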


Corey: It sure beats the old approach of, “Oh, it works on your machine. Great. Well, back up your email, Slappy, because your laptop’s going to production.”


Laurie: Exactly. Right. It’s basically, you’ve packaged up the entire world. And people are like, “Well, this is very wasteful.” And we’re like, “Yes, it’s very wasteful. But it works.” And like the other approach—


Corey: You know what’s less wasteful, then? That’s right, a whole bunch of engineering time spent fixing things. “Well, that’s not the most optimal way of doing it,” say people who seem to consistently mistake their time for being free.


Laurie: Exactly.


Corey: No, and it makes perfect sense. I love the fact that I can use at least some semblance of what other people are using and get it to work. The counter-argument to it is that it’s very—how do I put this—disconcerting when I’m working in a Python project, but I’m using a framework or so that generally installs via npm, and now my Python project has a package.json in there, and I get very confused at first. And, all right, then I run npm install in there and then I’m way more confused. And I mostly just look at this, and I struggle to make sense of it before the penny drops. “Oh, that’s right. It’s because I’m bad at computers.” I wish people would not keep letting me forget that part.

Laurie: [laugh]. Is your objection that you can’t launch a website these days without JavaScript anymore? Because a lot of people are angry about that, and they send me email more often than you would imagine.


Corey: Well, I assume it’s your personal fault, right?


Laurie: I mean, absolutely. Like, again, the secret cabal; we’re trying to inflate all of your applications with as much extraneous code, with as many security vulnerabilities as we can possibly manage because I work for the people who sell storage and virus scanning, obviously.

Corey: Emailing you about the world requiring JavaScript is evocative of an old story where some town manager angrily emailed the CentOS project maintainers because someone installed a web server in his environment and he pulled it up, and this isn’t our town’s website; it’s the default, “Welcome to CentOS. If you’re seeing this page, you’ve successfully installed Apache. Read these docs to configure it…” and accused them of hacking his website. It seems roughly the same level of technical nuance, blaming you for the proliferation of something in society.


Laurie: I don’t know. I mean, I certainly spent five years cheerleading it, so I feel like people who are, like, “You helped make this popular.” I’m like, “Oh, why thank you. I’m so glad you think I made a difference.” But really, it probably would have happened on its own. Like, I was running after a snowball that was already running very quickly downhill and engulfing villages as it went.

Corey: Absolutely. And I do want to talk to you about that in particular because as people on this podcast often hear, I talk about this podcast, I talk about the AWS Morning Brief, my other podcast, and I talk about lastweekinaws.com, where my newsletter lives; I don’t urge people to follow me on Twitter, I don’t talk about the Facebook page I don’t have. And the reason behind all of those things is that I have built an audience on open standards and open platforms so that no one company can change business models and suddenly I have a serious problem.

It’s why I blog on my own website, not on Medium. Their business model changes aren’t going to directly impact what I do and how I do it. Do you think this is naive? Do you think that the open web was a nice idea and now we’re just going to see increasingly walled gardens as time goes on?


Laurie: I think the openness of your website is—or your web app, or your, sort of, technical strategy in general—is always going to be a hybrid; like AWS is… it’s not rolling your own, you’re using a service. If AWS decides that they don’t support your service anymore—which they never do as far as I can tell, but theoretically, they could—you would have to stop doing that; you are to some extent locked into AWS. But I don’t think that a website hosted on AWS is, like, not part of the open web.


Corey: I would agree wholeheartedly on that point, absolutely.

Laurie: Right. I think at that point, you’ve adopted a tool that works for you, and you can move elsewhere. So, there are people who say using JavaScript frameworks, that’s not the open web, you should have been writing your own; you’re dependent on Facebook continuing to maintain React. And I’m like, “Well, kind of, but not really. You don’t have to be. You could write your own website, if you wanted to. This way, it’s just faster, in the same way that hosting it on AWS is faster than spinning up your own machines.”


Corey: Oh, I take it a step further beyond that. I pay WP Engine, which manages WordPress for me so I don’t have to, and the reason for that is I’ve managed WordPress in the past, and I will not go down that path again for love or money.


Laurie: [laugh]. Right.


Corey: But then, as a fun artifact of that, lastweekinaws.com does in fact live on GCP.


Laurie: [laugh]. Nice.


Corey: But it’s WordPress. Worst case, WP Engine shuts down, or charges me at times more, or decides that now, nope, everything has to move to a new framework, I can migrate it elsewhere. And the fact that I have that strategic exodus means that I don’t need to sit here on everything I build and agonize over, do I go all-in on my current hosting provider or not? It’s something that I can migrate with me. And I try and maintain at least that theoretical exodus path.


I can repoint domains to other places; I own the domains myself, and that has been enough for the way that I view the world. But increasingly, I’m starting to feel like a relic. It’s, “Oh, follow me on Instagram; follow me on TikTok,” and if these platforms pull a MySpace and vanish, then you’ve got to rebuild your audience from scratch, whereas email’s been with us longer than I’ve been alive, and it’ll be here long after I’m dead. I can carry that audience with me regardless of what any particular provider has. I just wish I didn’t feel like such a Captain Edgecase, or someone stuck in the past whenever I articulate that to some folks.

Laurie: Well, I’ve been in the industry a long time, so I think if you’re going to, sort of, say, “I’ve got this old opinion,” I’m going to be like, “Me, too. I’m also extremely old.”


Corey: And then we’ll talk about the Great War. “Wasn’t it amazing?” And, yeah, there we are.


Laurie: The browser wars of ’97, and I was 16.


Corey: Yes, we’ll make Eternal September references all week.


Laurie: Oh, my God. See, we’re literally doing that thing that I was just joking we were going to do.


Corey: We absolutely are.


Laurie: Yeah, I think you have to pick your battles. I think the one that I personally struggle most with is databases. I spent a good chunk of my career as a DBA; I definitely know how to install and configure databases. I don’t want to. [laugh]. You know, like, using one of the fancy databases as services, where you’re just like, it has an SQL interface and it’s got apparently infinite storage and infinite processor, and I don’t need to worry about it anymore.

Corey: Exactly, and it has those things because what it also has is someone else’s credit card. Done.


Laurie: Right. It’s great. But to some extent, I’m definitely locking myself into that database service, right? To some extent, I have to find an equally capable service if I ever wanted to migrate away. So, am I still open, or am I locked in then?


I don’t think anybody can call themselves truly independent, anybody can call themselves truly open. So, from your perspective of, like, what platform am I on, as long as you’re not only on that platform, as long as it’s not your only bet, I think—sure, pile into the Facebook page. Why not?

Corey: Yeah. I have separate problems with that that we need not get into here.


Laurie: [laugh].


Corey: That’ll be a whole separate episode there. So, as you look across the past, I don’t know, let’s call it eight decades that you and I have been in tech together, what are the themes you’ve seen continue to emerge that people should be paying attention to moving forward?

Laurie: I think one of the most common mistakes that I see in technologists who’ve been in the industry a long time, is—I can tell that they’re doing it because they start ranting about ‘the fundamentals.’ And it is my firmly held conviction—and no one will sway me from it—there is no such thing as the fundamentals. Everybody comes into the industry at a certain time, when a certain set of tools were considered commodities that you don’t need to think about, a certain set of tools were considered, like, the complicated thing that you need to learn, and a certain set of tools were considered, like, fluff on top that are bonus, but those things are always drifting downwards, right? Yesterday’s fluff is today’s bedrock, and the new fluff is stuff that wasn’t invented before. And then they start going, “Well, you should be able to understand HTTP, and roll your own JavaScript framework because those are the fundamentals.”

And I’m like, “Only to you because you came into the industry when that was the complicated thing. The fundamentals to somebody who started 20 years before you did are like, ‘you need to know about power management and how to configure a firewall,’” like you were saying, in the beginning of this thing. Everybody’s fundamentals are somebody else’s fluff.

Corey: Oh, you want to learn how Linux works? Step one—I see this in classes all the time—learn how Vim works.

Laurie: Right, exactly.


Corey: How about not doing that?


Laurie: Oh, my God—


Corey: —and focusing on the differentiated part. My God.

Laurie: The bizarre cargo-culting of Vim. I’m like, “You know why the people who are good at Vim are good at Vim? It’s because they’ve been doing Vim for 30 years. If you do any tool for 30 years, you’re going to be really good at it.”


Corey: So, you say that, but then you look at me with databases, and I don’t know, I might be able to fool you on that one.


Laurie: [laugh]. Use any tool for 30 years, and you’ll be so good at it that the switching cost is too high to go to anything else. But if you’re just starting in the industry, you could start with any editor that you wanted and it would be fine. And by the time you’ve been using it for 30 years, you’ll be like a goddamn wizard at it.


Corey: Mm-hm. Absolutely.

Laurie: So, that’s what I tell people is, like, the things that you learn now, you’re going to have to expect that they get commoditized. The stack that you live on today will get crushed down to nothing and you have to be constantly climbing the stack to what the new thing is.


Corey: [laugh]. I want to thank you for taking so much time to speak with me today. If people want to hear more about what you have to say and how you wish to say it, where can they find you?


Laurie: I’m most active and responsive on Twitter. My username is @seldo and I also own seldo.com where I blog much less frequently than I would like to.


Corey: And we will, of course, put links to both of those into the show notes. Thank you so much for taking the time to speak with me. I really appreciate it.


Laurie: Thanks for the invitation. It’s been a lot of fun.


Corey: Really has. Laurie Voss, senior data analyst at Netlify. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and an entirely insulting, rambling comment complaining about how I talked about all these different package management systems for different languages and never once mentioned Rust.

Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.

This has been a HumblePod production. Stay humble.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Laurie Voss, who is currently a senior data analyst at a company called Netlify. Laurie, thank you for joining me.


Laurie: Thanks for inviting me.


Corey: So, let’s start at the very beginning. What is Netlify?


Laurie: Netlify is a single cohesive build chain for websites. A lot of people don’t think of it that way. I think a lot of people think of Netlify as a web host, but really where people are getting value from Netlify is you build your website, you upload your website, you deploy your website, you host your website, you test your website, you monitor your website, and that can be five or six different services, like a CI service, and a hosting service, and a Git service and all of those things. And Netlify just joins that entire build chain into a single tool where you just hook up a Git repo, hit commit, and it goes out into the world, and it’s incredibly fast and convenient. And that’s really where people get value out of it.

Corey: Perhaps somewhat uncharitably, I would almost think of that as Heroku for this decade.


Laurie: I mean, I would consider that pretty charitable to us and somewhat uncharitable to Heroku, who are still around and chugging.


Corey: Oh, absolutely. I’m a big fan of things like that, where it’s take this code—whatever it looks like, maybe it’s a repository, maybe it’s some, I don’t know, some files I email over, God forbid—and then go ahead and deploy it into something that at least pretends to be able to scale. I often hear Netlify brought up in the context of Jamstack, which seems to be this whole area of cloud computing that I don’t tend to spend a whole lot of time in, at least not knowingly. What is it?


Laurie: So, Jamstack originally stood for JavaScript, APIs, and markup sometimes also referred to—


Corey: But I hate all of those things. Please continue.


Laurie: [laugh] it’s sometimes also referred to as static websites, which is a term I tend to avoid simply because it’s not really very accurate. A static website is one of the things that you can deploy on the Jamstack, certainly, but it’s certainly not the only thing you can deploy. I would say that it is an architecture that lends itself to pre-rendering as much content as is possible, and then caching all of that stuff at the edge, and then pulling in only the bare minimum of dynamic content to improve both scalability and performance. Those are the things that people like about Jamstack websites, is that they tend to be extremely fast.
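The pre-rendering idea Laurie describes can be illustrated with a minimal sketch. It assumes a made-up page template and two made-up pages; none of these names come from the episode or from any Netlify tooling. The point is only that every page is turned into finished HTML once, at build time, so it can be cached at the edge.

```python
from string import Template

# Hypothetical page template and content, invented for illustration.
PAGE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")

POSTS = {
    "index.html": {"title": "Home", "body": "Welcome."},
    "about.html": {"title": "About", "body": "Who we are."},
}


def prerender(posts):
    """Render every page once, at build time, into plain HTML strings."""
    return {path: PAGE.substitute(fields) for path, fields in posts.items()}


# Each value is a finished HTML document that a CDN could cache as-is.
# Only genuinely dynamic data would be fetched later, from an API, in the browser.
site = prerender(POSTS)
```

A real build would write these strings to files and hand them to the deploy step; the sketch stops at the in-memory result to stay self-contained.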


Corey: So, that makes intuitive sense to me. And you, of course, became fairly broadly known as one of the people behind npm. But now you’re a senior data analyst, which feels like it’s a departure from the things you were doing to the things you’re doing now. Help me either validate that, or tell me what obvious thing I’m missing, or highlight something clever for me because right now, I feel like there’s a missing link in my chain of events here.

Laurie: No, that’s a totally fair question. So, I started npm as the CTO and hired an excellent engineering team underneath me. In fact, one of our very first hires was a lady called C J Silverio, who is just a staggeringly good engineer. And it became very obvious very early on in the life of the company that we really had two people of CTO caliber, and that we didn’t need to have them, but what we did need was somebody to run the operational side of the business. So, relatively early on in the life of the company, we promoted C J to CTO, and I moved my title to COO, you know, obviously, still with a technical bent, but my job as a COO is to do operational things.


So, I was in charge of running the financials and making sure that marketing and sales weren’t going massively over budget or under quota, those sorts of things. And that’s fundamentally a keep-the-lights-on data analysis job. So, while I was CTO, I was sharing fun stats about npm’s internals; while I was COO, I was doing a lot of analysis of our financials. But the common factor was analysis, and I was doing more and more of it. So, towards the end of my time at npm, I became the Chief Data Officer, where I basically specialized down into doing just data things—some financial, some technical—and doing a lot of outward-facing presentations about that kind of thing.


So, that was where my job ended up being. And literally how I pitched my way into Netlify was like, “What if I did that thing that I was doing for npm for you,” and they were like, “Great. You can’t be a C though because you just got here.” [laugh]. I was like, “Fine.”


Corey: Well, of course. We all have to start somewhere. Humility. And it took me a couple of years to unofficially run AWS marketing. My God. Yeah, have some humility as you step through this process. Was it a big barrier to you once you arrived at Netlify, convincing them to buy you the Excel license you obviously need to do all this data analysis, or alternately, are there better tools for it than the one that we’ve all been using anyway?

Laurie: Honestly, I’ve always been a Google Sheets partisan. I know that the really hardcore financial types will complain about the functions that are missing from Google Sheets versus Excel—


Corey: Oh, will they ever.


Laurie: —but I’m not that person. But we have a pretty great stack that I like quite a lot at Netlify these days. We have a variety of older tools laying around, not all of which we’ve migrated away from, but the core of the new class is this company called Databricks, who are basically Spark clusters as a service. So, you can just throw, essentially, arbitrarily large amounts of log data on to S3 buckets on AWS, and it can query them as if they were databases, which is truly beautiful. And on top of them, we have a system called Mode Analytics, which is a general platform for data analysis, and presentation; draws graphs, that kind of thing; has an SQL interface.


And between those two we’ve got a new open-source project, or relatively new to me anyway, called dbt, which is this very organized, clever way of codifying your best practices around data. So, you’ve probably heard of extract, transform, load jobs; it’s basically a way of quantifying chains of extract, transform, and load jobs such that they’re always tested, and always running, and you know what the dependencies are between them and everything is documented.
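The idea of chained jobs with known dependencies can be sketched in a few lines. This is not dbt's API or configuration format; it is a generic illustration using Python's standard `graphlib` module, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical transform steps, each mapped to the steps it depends on.
DEPENDS_ON = {
    "raw_logs": set(),
    "clean_logs": {"raw_logs"},
    "daily_visits": {"clean_logs"},
    "weekly_report": {"daily_visits"},
}


def run_order(depends_on):
    """Return an order in which every step runs after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
```

Declaring the dependencies as data is the design choice worth noticing: once the graph is explicit, a tool can work out the run order, detect cycles, and document the chain without anyone maintaining that by hand.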


Corey: Okay. While I’m in the process of getting everyone in trouble on things, what is your take on machine learning for things like this? Because it seems that whenever you talk about data, it’s inevitable that someone, usually with a crap ton of VC backing, will immediately jump in because they’re clearly getting bonused every time they manage to fit the phrase ‘machine learning’ into basically anything.


Laurie: So, I would step back a bit and say that, before I joined Netlify, I interviewed at a couple other companies just to see what the space was like, for basically the same job at other companies. And there was a really interesting pattern that I noticed, which is that it is quite a common pattern for an early-stage startup, to say, “Oh, we have a data problem. We must hire a data scientist.” And they go and find somebody staggeringly qualified, with a PhD in data science, and they hire that person. And that person immediately runs into trouble because that is not actually the problem that they have.


They don’t have a data science problem; they have a data engineering problem. They have, like, mounds of data lying everywhere, and it’s not organized, nobody knows where it is, nobody can query it efficiently. A data scientist is, at earliest, your fifth hire in your data team. The first five people are people who have to do an enormous amount of plumbing and engineering to be able to just get the data from all of the places that it’s lying around, all of the piles that it’s accumulating in, into any kind of a reasonable format that you can query it and figure out what it does.


Corey: You have to forgive my cynicism on some level because I’ve been in the ops space for, I guess, entirely too long, where I’ve been dealing, particularly in the context of AWS bills, with making arguments against data science teams who are insisting that the Apache logs from 2012 that are taking petabytes of space are the key to unlocking the mysteries of the business. They’re not sure how yet, but one day they’re going to become super valuable, so I’m never allowed to delete anything. And on some level, it just almost seems like it’s a big make-work conspiracy for data scientists amongst each other which, hey, respect. Counter-argument: what sorts of insights can you glean from these vast quantities of data? Because everyone else I’ve talked to about this generally works for a big-data-oriented company. I’ve got to be honest with you, it feels like they’re selling pickaxes into a gold rush because, “Oh, it’s very important to keep all your data so that we can sell you things to go through it.” You’re on the other side of that; you’re buy-side. So, what is the value that this giant data hoard winds up providing?

Laurie: Well, I will say that my initial inclination is to agree with you. There’s definitely a lot of pickaxes being sold to miners who have no idea what they’re doing. I think about ten years ago, there was a huge industry-wide pile-in to big data. People were like, “You need Hadoop, and you need gigantic data processing clusters, and huge data, and massive amounts of processing, and, like, buy this enterprise contract for $100,000 a year.” And then everybody did those things and was like, “And now what?” And they were like, “Oh, well, we don’t know. Maybe you can count it up. How many hits did you get?”


That’s not useful analysis. Having all of your data queryable is not, per se, a useful thing to be able to do. And I think in the 10 years since then, people have got smarter about that. They realized medium and small data are actually [laugh] often quite useful. It’s more about how you analyze it, and can you present it to people, and can you make sense of it?


But there was a second gold rush into the ML space. There are certainly use cases where you have enough data and a problem that is amenable to being solved by applying ML to it in some way. Those are a minority of cases; maybe five percent of all data problems are big enough that you can use ML in the first place, and are also the kind where an answer that ML can give you would be helpful. And the other ninety-five percent, it’s just plumbing and engineering.


Corey: Once upon a time, it felt like the way to address all this data was the… honestly, the result of a prank perpetrated many moons ago by what felt like Google in a white paper, that Yahoo went for hook, line, and sinker for MapReduce, which then led to Hadoop and a bunch of other stuff. I maintain this was a Google April Fool’s prank that everyone took way too seriously and went way too far. These days, it feels like stream processing as that data comes in is sort of the preferred approach. Yes, no, or am I completely misunderstanding most of the point? Or all the above?


Laurie: I would say definitely, the industry has moved away from the batch processing that Hadoop did. I actually worked at Yahoo at the time when they were inventing Hadoop. [laugh].


Corey: Oh, you fell for it, too. Great.


Laurie: [laugh]. I was—we were selling the Kool Aid as opposed to drinking it.


Corey: Oh, if you’re going to be involved in a Kool Aid transaction, that is absolutely the side of it you want to be on. Let’s be very clear here.


Laurie: So yeah, streaming processing, but like semi-real-time processing of things, as opposed to giant batch jobs is certainly where stuff has mostly gone. Although people who are end consumers of data, as an analyst, if I asked you how fresh does this data need to be, they will always say realtime. Like, [laugh] that will be their first answer. And then I’ll be like, “What if it was 24 hours delayed?” And they’re like, “Oh, yeah. Well, obviously, yesterday’s data is fine. I’m not going to care about what happened at noon today when it’s 2 p.m.” And then you’re like, “Well, yes. Well, then it’s a batch job, and it’s, like, an order of magnitude cheaper to provide to you, so let’s do that.” Batch jobs are still very cost efficient and so we do a lot of batch processing, it’s just we don’t make a big song and dance about it anymore because it’s no longer the new shiny thing.


Corey: On some level, it feels like that is the nature of things where something gets announced, and it’s super complicated and hard, and people scale the peaks of complexity, and they make good money doing it. I mean, in the original dotcom boom, ‘firewall engineer’ was a quarter million dollars a year if you could swing it. Now, it’s just assumed that basically, anyone who touches the network should be able to configure firewall rules; things get simpler with time. It feels, on some level, like an awful lot of the data world is undergoing some of that consolidation as well, where we’re starting to find tools and methods and ways to extract meaning from giant piles of data without the part where, you know, you go and drop $5 million here on a data science team.


Laurie: Well, you’ve sort of arrived at my favorite pet topic, which is the stack. The stack is this abstraction that I wrote about at the beginning of last year. It’s the idea that the ever increasing complexity of technical fields means that we are constantly inventing, adopting, and then forgetting about abstractions. As you said, we’re constantly chasing after the new shiny thing; we make a big song and dance about it; it’s very complicated. People make enormous amounts of money doing it in the early days, and then somebody eventually invents some kind of tool or open-source framework, or possibly, like, a SaaS that makes it one-click to do.


And it’s not any less complicated or any less magical than it was before, it’s just you think about it much less, right? Like I mentioned, Databricks. Every time I run a query Databricks is taking my SQL, converting my SQL into giant MapReduces, running it on a huge cluster of machines of arbitrary size alu—I don’t know what size it is because I don’t need to care anymore—and then pointing it at AWS, where it’s pulling in every single piece of data in every bucket that I put in there. And all of that, ten years ago would have been of a complexity that only Google or Yahoo could do it. And now it’s literally we spin them up by clicking a button and we don’t even remember that it’s happening.

Like, all of that complexity is still happening, all of that magic is still happening, but now it’s just a commodity. And we’re doing that across the tech space. So, we’ve certainly done it in data; a bunch of stuff that used to be very complicated, used to be the thing that you would hire me to do is now just the tool that I use and the thing that I do is the analysis, which is a more useful use of someone’s time, really.

Corey: One would like to hope so. But I do feel like there’s a story, and we see it across the board; this is one of the things I really enjoy about Netlify. Once upon a time, to put a website on the internet, you had to know a whole bunch of different things all at the same time. It was how to build a web server, how to maintain and patch that web server so it didn’t become an attack spam cannon, how to get files into a format the web server could understand, how to put that out there, how to get DNS to work, how to handle SSL (if that was even a glimmer in your eye at that point), and so on and so forth. Now, it really requires clicking a button. And Netlify has made this way easier, because I tend to look at this from the exact opposite side in the industry, where I come from an ops background; building all the infrastructure to handle these things is relatively straightforward to me, but then I get to the other side.


Cool, now all that’s done, “Build the web app.” And my response, “Ehhh, what?” Yeah, I can write bad HTML by hand, sort of, and that’s as far as I generally tend to go, whereas it feels like the Jamstack story in general, and Netlify in particular, are aimed at folks in many ways, coming from the other side of the world where it’s, “I picked up JavaScript. I picked up a framework or two. I understand frontend, I understand how web applications get built. What’s the deal with this whole infrastructure piece?” And thanks to the miracle of stacks collapsing in upon themselves in many respects, you don’t have to know about that or care, and you live in this blissful world where the term Kubernetes never crosses your desk. Is that a fair summation of the state of the industry? Am I dramatically misunderstanding what Netlify does and for whom?

Laurie: No, I think that’s pretty much how it goes. One of the reasons that I wrote this blog post about the stack—it was almost exactly a year ago—is because about a year ago is when I joined Netlify and I was suddenly immersed in the things that Netlify does. It became more clear to me that I was seeing a fundamental shift happening.


I was like, “Oh. We are obeying some kind of natural law here, right? We are taking things that used to be people’s whole jobs and turning them into things that are so simple that you don’t even think about them happening anymore.” I’ve definitely met and worked with people in my life whose whole job was managing SSL certificates. And now, it’s literally a checkbox. And it’s on by default. It’s like, “Would you like your site to be secured by SSL?” Yes, obviously. I don’t know why I would turn that off.


And it just comes as part of deploying your website. Way in the background, Let’s Encrypt is doing it, and there’s a whole bunch of song and dance about refreshing certs every 90 days, and it all just happens completely automatically without you caring even a little bit. And that’s what Netlify is doing. It’s taking things that used to be five or six companies and squishing them down into a single layer that you call your deploy service. And you’re like, “Great. My deploy service does all of those things and I don’t need those other five companies anymore.”
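The cert-refresh song and dance boils down to a date check that something has to run on a schedule. A minimal sketch follows, assuming the 90-day lifetime mentioned in the conversation and a renewal window of 30 days before expiry; the window is an assumption made for illustration, not a statement about how Netlify or any particular client schedules renewals.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)  # lifetime mentioned in the conversation
RENEW_BEFORE = timedelta(days=30)   # assumption: renew 30 days before expiry


def needs_renewal(issued_at, now):
    """Return True once the certificate is inside its renewal window."""
    expires_at = issued_at + CERT_LIFETIME
    return now >= expires_at - RENEW_BEFORE


# Hypothetical issue date, used only to exercise the function.
issued = datetime(2021, 1, 1, tzinfo=timezone.utc)
fresh = needs_renewal(issued, issued + timedelta(days=10))
due = needs_renewal(issued, issued + timedelta(days=60))
```

The job that used to be a person with a calendar reminder is now this check plus an automated request for a new certificate, which is the commoditization Laurie is describing.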

Corey: Now, if you’re one of those five companies, that becomes something of a problem. But again, that’s the pace of innovation. That is the world continuing to evolve.


Laurie: Nobody wants to be commoditized, but on the other hand, the company that gets to do the commoditizing tends to run away with it, right? Like that’s kind of the AWS story. It’s like, there used to be lots and lots of companies that would sell you a server in a rack and then take 24 hours to set it up and you’d pay with a credit card. And AWS was like, “What if that was one button?” And everyone was like, “Yes, I would love that to be one button. I never want to care about what rack it’s in anymore, or whether or not it has enough power, or whether or not the cable in the back has got jiggly. Just virtualize it all the way for me, thank you.” And then AWS completely ran away with it.

Corey: Oh, yes. And it’s AWS, so it was, “What if that button was hidden in a console that doesn’t work super well, and then we give that button a terrible name?” People are like, “Ehh, I’ll risk it.”


Laurie: I mean, the observed behavior of the industry is that we love the terrible console.

Corey: Oh, absolutely. Everyone talks about infrastructure as code, which is basically a polite way of saying I use the console, and then lie about it on conference talks.


Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Laurie: [laugh]. Indeed.


Corey: So, since you brought up AWS, terrific, it’s time for me to do my whole conspiracy theory approach here and accuse you of basically war crimes. So, you were big into the npm space for a long time, which is great. I accept the fact that that is a thing that happens—package.json and package-lock.json are basically artifacts of you folks.


Now, AWS has launched their Amazon CodeGuru machine learning—wink, wink, nudge, nudge—powered code review. And of course because it’s AWS, they charge based upon lines of code in a pull request, which tells me that you’re a deep plant for many years now, planning for the day where this one day supports JavaScript—which it doesn’t today—and all someone has to do is check in the package-lock and the package.json files once, and suddenly the entire scheme pays off handsomely. True, false, or I’m not supposed to talk about that in public?

Laurie: It’s true. I’m part of a global cabal whose purpose is to make Node modules infinitely deep until the gravity well sucks in all of programming and we don’t have computers anymore.


Corey: On a slightly more serious note, I do want to talk a little bit about package management—in the context of programming languages as opposed to package management in the context of Linux distributions because, oh, do I have thoughts on that—there are a few different competing tools out there to handle dependencies across different programming languages, in the JavaScript world, in the Python world. And I’m not a JavaScript programmer, except when forced to be, and it’s usually editing something as small-scale as humanly possible and backing away slowly. But my general consensus, looking at it across the board, is that there is no consensus, that there is no clear one right way to do things. Invariably, dependencies always become a challenge. Getting something to a reproducible build while also being secure is a problem.

And no matter what stack you pick, what language you pick, there’s always a—for ‘Hello World’—there’s a step one of setting up your local environment to resemble what the person writing the document’s environment looks like. Is that accurate? Is there some magic tool out there that somehow I’m just unaware of that solves all of this for me?


Laurie: Well, there’s definitely not a single tool that gets it completely right, but I would say that there is a commonality between the things that work that I don’t know that everyone appreciates. So, I’m going to draw a parallel between package.json and Kubernetes right now, so bear with me. Basically the thing that people often don’t like about npm and the thing that people don’t like about package.json is that it says, “All of your dependencies must live here, in your tree. I don’t care how many JavaScript projects are on your computer; I am going to have one copy of every module right here where I can see it, and I’m going to use those and only those.”


It tends to make JavaScript programs a little bit easier to debug because you know that the code that is at fault can’t possibly be anywhere else. It can’t be sitting in /usr/lib unexpectedly, or in some additional libraries folder, or it can’t have been, like, blown away by somebody installing something else. It has to be the one that’s sitting in your tree, and that’s one of the things that made Node so popular in the beginning, and npm so popular at the same time, was that it was very easy to deal with, and in particular, it made it work on Windows, which didn’t have any of those things anyway. And Node’s popularity as a development environment, where you could write code on Windows and it would work perfectly in a Linux environment because all of the dependencies were JavaScript and that ran the same on both of those computers, is understated. And that’s essentially the Kubernetes story.


Kubernetes is saying, “This thing where we have libraries all over the place, where we have dependencies all over the place, like, they lie all over the operating system. It’s too late to fix that. What if we packaged up the entire operating system and said that that’s the package?” And that’s what Kubernetes is. It’s creating a package.json of your entire computer, and then you run that.
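[Editor’s sketch: the “it has to be the one that’s sitting in your tree” lookup Laurie describes can be illustrated with a few lines of Python. This is a simplified approximation of how Node searches `node_modules` folders, not Node’s actual implementation: it ignores built-in modules, the `main` and `exports` fields in package.json, and file-extension handling. The function name `resolve_in_tree` and the example package name are hypothetical.]

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_in_tree(package: str, start_dir: Path) -> Optional[Path]:
    """Return the nearest node_modules/<package> directory at or above start_dir.

    Simplified assumption: a package counts as "found" if the directory exists.
    Nothing outside the chain of parent directories is ever consulted, which is
    why the code at fault cannot be hiding somewhere else on the machine.
    """
    current = start_dir.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in [current, *current.parents]:
        candidate = directory / "node_modules" / package
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Build a throwaway project tree and resolve a dependency from deep inside it.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp).resolve()
        (project / "node_modules" / "left-pad").mkdir(parents=True)
        (project / "src" / "lib").mkdir(parents=True)
        print(resolve_in_tree("left-pad", project / "src" / "lib"))
```

[The nearest copy wins: a `node_modules` folder closer to the importing file shadows one further up, so each project carries its own dependencies instead of sharing a system-wide library directory.]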


Corey: It sure beats the old approach of, “Oh, it works on your machine. Great. Well, back up your email, Slappy, because your laptop’s going to production.”


Laurie: Exactly. Right. It’s basically, you’ve packaged up the entire world. And people are like, “Well, this is very wasteful.” And we’re like, “Yes, it’s very wasteful. But it works.” And like the other approach—


Corey: You know what’s less wasteful, then? That’s right, a whole bunch of engineering time spent fixing things. “Well, that’s not the most optimal way of doing it,” say people who seem to consistently mistake their time for being free.


Laurie: Exactly.


Corey: No, and it makes perfect sense. I love the fact that I can use at least some semblance of what other people are using and get it to work. The counter-argument to it is that it’s very—how do I put this—disconcerting when I’m working in a Python project, but I’m using a framework or so that generally installs via npm, and now my Python project has a package.json in there, and I get very confused at first. And, all right, then I run npm install in there and then I’m way more confused. And I mostly just look at this, and I struggle to make sense of it before the penny drops. “Oh, that’s right. It’s because I’m bad at computers.” I wish people would not keep letting me forget that part.

Laurie: [laugh]. Is your objection that you can’t launch a website these days without JavaScript anymore? Because a lot of people are angry about that, and they send me email more often than you would imagine.


Corey: Well, I assume it’s your personal fault, right?


Laurie: I mean, absolutely. Like, again, the secret cabal; we’re trying to inflate all of your applications with as much extraneous code, with as many security vulnerabilities as we can possibly manage because I work for the people who sell storage and virus scanning, obviously.

Corey: Emailing you about the world requiring JavaScript is evocative of an old story where some town manager angrily emailed the CentOS project maintainers because someone installed a web server in his environment and he pulled it up, and this isn’t our town’s website; it’s the default, “Welcome to CentOS. If you’re seeing this page, you’ve successfully installed Apache. Read these docs to configure it…” and accused them of hacking his website. It seems roughly the same level of technical nuance, blaming you for the proliferation of something in society.


Laurie: I don’t know. I mean, I certainly spent five years cheerleading it, so I feel like people who are, like, “You helped make this popular.” I’m like, “Oh, why thank you. I’m so glad you think I made a difference.” But really, it probably would have happened on its own. Like, I was running after a snowball that was already running very quickly downhill and engulfing villages as it went.

Corey: Absolutely. And I do want to talk to you about that in particular because as people on this podcast often hear, I talk about this podcast, I talk about the AWS Morning Brief, my other podcast, and I talk about lastweekinaws.com where my newsletter lives; I don’t urge people to follow me on Twitter, I don’t talk about the Facebook page I don’t have. And the reason behind all of those things is that I have built an audience on open standards and open platforms so that no one company can change business models and suddenly I have a serious problem.

It’s why I blog on my own website, not on Medium. Their business model changes aren’t going to directly impact what I do and how I do it. Do you think this is naive? Do you think that the open web was a nice idea and now we’re just going to see increasingly walled gardens as time goes on?


Laurie: I think the openness of your website is—or your web app, or your, sort of, technical strategy in general—is always going to be a hybrid; like AWS is… it’s not rolling your own, you’re using a service. If AWS decides that they don’t support your service anymore—which they never do as far as I can tell, but theoretically, they could—you would have to stop doing that; you are to some extent locked into AWS. But I don’t think that a website hosted on AWS is, like, not part of the open web.


Corey: I would agree wholeheartedly on that point, absolutely.

Laurie: Right. I think at that point, you’ve adopted a tool that works for you, and you can move elsewhere. So, there are people who say using JavaScript frameworks, that’s not the open web, you should have been writing your own; you’re dependent on Facebook continuing to maintain React. And I’m like, “Well, kind of, but not really. You don’t have to be. You could write your own website, if you wanted to. This way, it’s just faster, in the same way that hosting it on AWS is faster than spinning up your own machines.”


Corey: Oh, I take it a step further beyond that: I pay WP Engine, which manages WordPress for me so I don’t have to, and the reason for that is I’ve managed WordPress in the past, and I will not go down that path again for love or money.


Laurie: [laugh]. Right.


Corey: But then, as a fun artifact of that, lastweekinaws.com does in fact live on GCP.


Laurie: [laugh]. Nice.


Corey: But it’s WordPress. Worst case, WP Engine shuts down, or charges me at times more, or decides that now, nope, everything has to move to a new framework, I can migrate it elsewhere. And the fact that I have that strategic exodus means that I don’t need to sit here on everything I build and agonize over, do I go all-in on my current hosting provider or not? It’s something that I can migrate with me. And I try and maintain at least that theoretical exodus path.


I can repoint domains to other places; I own the domains myself, and that has been enough for the way that I view the world. But increasingly, I’m starting to feel like a relic. It’s, “Oh, follow me on Instagram; follow me on TikTok,” and if these platforms pull a MySpace and vanish, then you’ve got to rebuild your audience from scratch, whereas email’s been with us longer than I’ve been alive, and it’ll be here long after I’m dead. I can carry that audience with me regardless of what any particular provider has. I just wish I didn’t feel like such a Captain Edgecase, or someone stuck in the past whenever I articulate that to some folks.

Laurie: Well, I’ve been in the industry a long time, so I think if you’re going to, sort of, say, “I’ve got this old opinion,” I’m going to be like, “Me, too. I’m also extremely old.”


Corey: And then we’ll talk about the Great War. “Wasn’t it amazing?” And, yeah, there we are.


Laurie: The browser wars of ’97, and I was 16.


Corey: Yes, we’ll make Eternal September references all week.


Laurie: Oh, my God. See, we’re literally doing that thing that I was just joking we were going to do.


Corey: We absolutely are.


Laurie: Yeah, I think you have to pick your battles. I think the one that I personally struggle most with is databases. I spent a good chunk of my career as a DBA; I definitely know how to install and configure databases. I don’t want to. [laugh]. You know, like, using one of the fancy databases as services, where you’re just like, it has an SQL interface and it’s got apparently infinite storage and infinite processor, and I don’t need to worry about it anymore.

Corey: Exactly, and it has those things because what it also has is someone else’s credit card. Done.


Laurie: Right. It’s great. But to some extent, I’m definitely locking myself into that database service, right? To some extent, I have to find an equally capable service if I ever wanted to migrate away. So, am I still open, or am I locked in then?


I don’t think anybody can call themselves truly independent, anybody can call themselves truly open. So, from your perspective of, like, what platform am I on, as long as you’re not only on that platform, as long as it’s not your only bet, I think—sure, pile into the Facebook page. Why not?

Corey: Yeah. I have separate problems with that that we need not get into here.


Laurie: [laugh].


Corey: That’ll be a whole separate episode there. So, as you look across the past—I don’t know, let’s call it eight decades that you and I have been in tech together, what are the themes you’ve seen continue to emerge that people should be paying attention to moving forward?

Laurie: I think one of the most common mistakes that I see in technologists who’ve been in the industry a long time, is—I can tell that they’re doing it because they start ranting about ‘the fundamentals.’ And it is my firmly held conviction—and no one will sway me from it—there is no such thing as the fundamentals. Everybody comes into the industry at a certain time, when a certain set of tools were considered commodities that you don’t need to think about, a certain set of tools were considered, like, the complicated thing that you need to learn, and a certain set of tools were considered, like, fluff on top that are bonus, but those things are always drifting downwards, right? Yesterday’s fluff is today’s bedrock, and the new fluff is stuff that wasn’t invented before. And then they start going, “Well, you should be able to understand HTTP, and roll your own JavaScript framework because those are the fundamentals.”

And I’m like, “Only to you because you came into the industry when that was the complicated thing. The fundamentals to somebody who started 20 years before you did are like, ‘you need to know about power management and how to configure a firewall,’” like you were saying, in the beginning of this thing. Everybody’s fundamentals are somebody else’s fluff.

Corey: Oh, you want to learn how Linux works? Step one—I see this in classes all the time—learn how Vim works.

Laurie: Right, exactly.


Corey: How about not doing that?


Laurie: Oh, my God—


Corey: —and focusing on the differentiated part. My God.

Laurie: The bizarre cargo-culting of Vim. I’m like, “You know why the people who are good at Vim are good at Vim? It’s because they’ve been doing Vim for 30 years. If you do any tool for 30 years, you’re going to be really good at it.”


Corey: So, you say that, but then you look at me with databases, and I don’t know, I might be able to fool you on that one.


Laurie: [laugh]. Use any tool for 30 years, and you’ll be so good at it that the switching cost is too high to go to anything else. But if you’re just starting in the industry, you could start with any editor that you wanted and it would be fine. And by the time you’ve been using it for 30 years, you’ll be like a goddamn wizard at it.


Corey: Mm-hm. Absolutely.

Laurie: So, that’s what I tell people is, like, the things that you learn now, you’re going to have to expect that they get commoditized. The stack that you live on today will get crushed down to nothing and you have to be constantly climbing the stack to what the new thing is.


Corey: [laugh]. I want to thank you for taking so much time to speak with me today. If people want to hear more about what you have to say and how you wish to say it, where can they find you?


Laurie: I’m most active and responsive on Twitter. My username is @seldo and I also own seldo.com where I blog much less frequently than I would like to.


Corey: And we will, of course, put links to both of those into the show notes. Thank you so much for taking the time to speak with me. I really appreciate it.


Laurie: Thanks for the invitation. It’s been a lot of fun.


Corey: Really has. Laurie Voss, senior data analyst at Netlify. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and an entirely insulting, rambling comment complaining about how I talked about all these different package management systems for different languages and never once mentioned Rust.

Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.

This has been a HumblePod production. Stay humble.