Making Machine Learning Invisible with Randall Hunt

Episode Summary

Randall Hunt is a developer advocate at Facebook AI. Prior to this position, he worked as a solutions architect, software engineer, developer advocate, and developer evangelist at AWS, a software engineer at SpaceX, and a developer evangelist and software engineer at MongoDB, among other positions.

Join Corey and Randall as they discuss the differences between TensorFlow and PyTorch, the breadth of contributors to the PyTorch project, what it’s like to listen to a conference talk by Randall, how Randall got started live coding on stage, why Randall believes audience participation is the key component of a successful talk, using machine learning to optimize the office coffee shop, how well-executed machine learning is invisible, how Randall will always be a huge AWS fan even though he no longer works there, the energy at re:Invent, and more.

Episode Show Notes & Transcript

About Randall

Randall is a Software Engineer and Open Source Developer Advocate at Facebook. Previously of AWS, SpaceX, MongoDB, and NASA.

Links:

Totes Not Amazon: totes-not-amazon.com
Twitter: https://twitter.com/jrhunt

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I've been waiting for this one for a while. I'm joined this week by Randall Hunt, who is currently a developer advocate at Facebook, but for years, was also a Developer Advocate slash evangelist slash gadfly slash I can't believe he hasn't been thrown from the building yet at AWS. Randall, welcome to the show.

Randall: Hi, Corey, thanks for having me.

Corey: So I've been trying to get you onto the podcast for a long time, and to their credit, AWS PR did everything in their power to never respond when I explicitly requested you by name, which makes perfect sense because you put the two of us together, we feed off of one another. And suddenly, now it's an incident.

Randall: It's a parasitic relationship. Symbiotic? I don't know, one of those two.

Corey: It's a negative feedback loop, I suspect. But now you don't work there anymore and I never have, so the gloves get to come off. But first, before we dive into what was, let's talk a little bit about what you're doing now. You're a developer advocate at Facebook, as mentioned, with an emphasis on AI and more specifically, something called PyTorch. To my understanding, PyTorch is like TensorFlow except Google didn't create it. I'm betting there's more nuance there. Tell me more.

Randall: Yeah. Well, interestingly, I would say TensorFlow is more like PyTorch now. So, PyTorch is the culmination of years and years of open source projects across a couple of different platforms and ideas. And it's really—if you've ever heard of a framework called NumPy that does a lot of kind of statistical analysis and other kinds of data manipulation techniques, and what PyTorch does is it's an open-source framework that does everything NumPy does and more, but with GPU based acceleration, and a lot of kinds of runtime improvements that you couldn't really think about doing. And the really amazing part is TensorFlow, kind of, realized the flexibility of PyTorch back, I would say, 2018 or even earlier, and TensorFlow 2, it uses the same kind of API that PyTorch does.

And I think almost all of these frameworks have started to move towards the execution methodology that PyTorch enables. And it's a giant open-source project, really huge community going around it, and I'm pretty proud of all the work that everyone has done on it, from professors at Cornell to kids sitting in their garage, like, blasting techno music. It's just a pretty huge breadth of people.

Corey: One of the things that I always appreciated about your talks that you gave when you were at AWS—and also back in that era when people went places and sat face-to-face in rooms with no masks on. What a concept.

Randall: I don't remember this at all. [laugh].

Corey: Right. You would get on stage and you would give talks and you were basically everything I swore I would never be on stage. You would write code live and see how it went. Yeah, you were someone who went deep into technical weeds and did live coding rather than a contrived demo, which is the way that I prefer to do it because the demo gods are indeed spiteful. And sometimes it worked.

Sometimes it didn’t. You always turn it into a story. But more to the point, you were one of the shining lights at AWS when it came to not just telling me how to use a given service, but why. The value it got from this because you would start out by writing some silly bot to do something, and your stated purpose at the beginning was, “Let's build a bot to do X.” And now suddenly, I am coming on board with you. I'm going down that road. Even if it's patently ridiculous, you painted the picture of what was possible. And that's something that I don't think that a lot of AWS storytelling focuses on to its own detriment.

Randall: Well, thank you for saying that. Yeah, I always really enjoyed doing live coding. To be honest, I got my start—we won this TechCrunch Disrupt nonsense with a team back in… I think it was 2011 or something. Maybe even 2012. And I had never live coded on stage before, but our demo broke while we were speaking.

So, I started live coding and the audience just went crazy and laughed, and I was kind of joking as I did it. And I got addicted to the stand up comedy-esque thrill of trying to code live on stage. And I've done it ever since I guess.

Corey: I feel very similar to how you do, but the thing that really was revelatory for me and it sounds like it is for you, too, it's the wrong framing to view it as stand up comedy. Because I thought that's what I did. Turns out, absolutely not. Stand up comics rehearse, and repeat, and prepare, and write constantly, whereas what I do is show up unprepared, which is properly known as improv. So it's stand-up style, but it's a lot less work slash planning slash caring about the outcome, apparently.

Randall: Well, I think there's a rehearsal component to it. So, I found the more successful talks—

Corey: Oh, here's where we diverge. You actually prepare for things. Oops.

Randall: Yes. I had to learn that the hard way. [laugh]. So, I didn't start off doing that. In my early 20s, that was not the case. But I found that the best talks are the ones that are a mix of extemporaneous improvisation, even audience prompted so the audience feels like they have a stake in the game.

So, if you take a poll of the audience, and you say, “Hey, what should we build today?” And you can kind of take suggestions and craft them into something you already have mentally prepared in your head so it seems more extemporaneous than it really is. So I'm kind of revealing my trick here. If you ever see me go do a live coding demo, almost all of them involve some form of audience participation. Voting: so going and saying, oh, let's vote on Vim, or Emacs or something like that, these are things that the audience can get very involved in. And I definitely have every possible way of building a voting app memorized at this point, so it's cheating a little bit. I'm not coding completely from scratch; it's in my head, I've done it before. Most of the time.

Corey: So, help me recapture that old magic, I guess. My position on AI slash machine learning is it's a cleverly executed scam across the board because all the cloud providers are saying up and down that you need to do machine learning to remain competitive. And okay, great. I would expect that when you scratch beneath the surface and you look at what machine learning requires, which is basically a whole bunch of compute and a whole bunch of storage. So yeah, I can get why the big cloud providers who sell both of those things would be interested in advocating for this.

Randall: Dolla dolla bills.

Corey: Exactly. But when you look at the stories of people using machine learning or AI in the real world, they always have the ring of either something that is extraordinarily use-case-specific to one company or is patently ridiculous. I mean, the classic example was WeWork used advanced machine learning algorithms to learn that there was a bottleneck in several of their facilities at certain times in the morning, and they alleviated that by hiring a second barista. Not kidding. The response is, “Wait, you spent how much on data science and machine learning to figure out that people like to drink coffee in the morning?” It just seemed insane from that perspective. So, change my mind. You've been a big advocate of it for a long time; what is the business value other than it makes you sound more expensive and thus can command a higher salary? Which, respect.

Randall: Okay, you mentioned WeWork and I just want to say if anybody from WeWork is listening to this, the WeWork right across the street from my house has had a light shining into my face for literally a year. Please turn it off. Okay. Now let's focus on machine learning. So, when I think about machine learning, I think about it as on the x-axis, you will have say compute, and on the y-axis, you will have the amount of data that's available to you.

And there's a great graphic—[maybe I can send it to you later 00:09:52]—that the [sci-fi 00:09:54] team produced which says which machine learning techniques or deep learning techniques should be used for which sets of data and things like that. Now, traditional machine learning, it's everywhere in the world around us. So, the voice activation filter on the microphone that you're using, that is likely using machine learning now, instead of using some hardware pop filter or something. The noise subtraction that you hear in audio engineering will be using deep learning now, instead of trying to do it with frequency analysis tools. The ad bidding all of this other stuff, the typical specific use cases, those are all still using machine learning as well.

But it's a little bit of a misconception that you need to build these bigger and bigger and grander and grander models. There's definitely a part of the industry that's moving that way. So there's a part of the industry that says, “Hey, let's have these terabyte models that have billions and billions of parameters.” There's a great talk by Geoffrey Hinton back in, I don't know, I can't remember the date. But there's a great talk by Geoffrey Hinton that goes over the number of fixations that a human brain makes.

So, a human brain is what a lot of artificial intelligence machine learning is based on. Like, my entire Twitter account, for instance. And if you take the number of fixations that a human brain makes, it’s, like, 10 billion fixations over the course of its lifetime, where we can outperform human brains with way fewer parameters than that. So, the ways machine learning are being applied and the things that are being done these days are ubiquitous. I mean, it really is everywhere, but the things that are working and the things that are easy are the ones you don't see.

So, machine learning, when it's done correctly, you don't even realize it's happening in the background; you don't see what algorithms are going on and doing these predictions. And so you might not see it, but it's pretty much everywhere you're looking these days.

Corey: Yeah, my experience with it is a bit more toward the ridiculous. One of my employees, a while back, when we were just friends and we didn't work together, we ended up writing totes-not-amazon.com—that's totes dash not dash amazon dot com—and it runs a Markov generator every time someone visits the page. It's trained on the corpus of all—I think what is it—12,000 AWS service announcements dating back to 2004. And some are hilarious, some make no sense, and some are hilarious and make no sense. And the funniest ones, of course, are where you don't actually know if it was lifted wholesale from an actual release announcement or not.
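
The kind of Markov generator Corey describes can be sketched in a few lines of Python. This is a minimal illustration, not the actual totes-not-amazon.com code: the function names (`build_chain`, `generate`) and the three sample headlines are made up for the example, and a real corpus would be the thousands of announcements mentioned above.

```python
import random
from collections import defaultdict


def build_chain(headlines):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    starts = []
    for headline in headlines:
        words = headline.split()
        if not words:
            continue
        starts.append(words[0])
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain), starts


def generate(chain, starts, rng, max_words=20):
    """Walk the chain from a random first word until a dead end or max_words."""
    word = rng.choice(starts)
    output = [word]
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Made-up stand-ins for real announcements, for illustration only.
SAMPLE_HEADLINES = [
    "Service A adds support for feature X in region one",
    "Service B adds support for feature Y in region two",
    "Service C now supports feature X for Service A",
]

if __name__ == "__main__":
    chain, starts = build_chain(SAMPLE_HEADLINES)
    print(generate(chain, starts, random.Random()))
```

Because each next word depends only on the current one, the output stitches together fragments of real headlines, which is why some results read as plausible and others as nonsense.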

Randall: Yeah, I think that's a funny thing. These models now, like GPT-3, and BERT, and other things where you're starting to wonder if a human wrote this as satire or [laugh] if it's actually AI.

Corey: Well, to be fair, that’s some of the release announcements that humans apparently do write. Or I assume they're humans; it could just be machine learning experiments. My personal favorite is whenever they come out with an announcement that I wind up reading, and at the end of it, it was, “Well, those are certainly a bunch of words, none of which I understood in the context they were used.”

Randall: [laugh]. I just went to the site and the headline that I got was, “Amazon Polly adds bilingual Indian English/Hindi language support for AWS CodePipeline, AWS Config adds support for non-RFC 1918 address ranges.”

Corey: That's amazing. That's the sort of thing we're talking about. It's a blast. I did some experimentation with GPT-2, and that turned into some interesting stuff. But for that I had a bit more fun and turned it loose on, effectively, not the release announcements, but all of the blog posts that Jeff Barr wrote, dating back to the launch of AWS.

And it's fun because Jeff has a personality, and it would capture entire sentences that I thought were hilarious, and that's ridiculous. What a human-sounding bot this is. And I would check the corpus and that exact entire sentence was used. And it was a lot more sensible in the context where it was originally found, it just becomes ridiculous when you surround it with other unrelated things. So honestly, having talked to people who are good at this, it turns out, I don't fully understand how training works and how to tune it to get the best results possible. I'm mostly punting until GPT-3 becomes more available.

Randall: Yeah. Well, there are even things on GPT-3 already as well. You know, I don't know if you remember back in 2018, I did something similar: I scraped all of the blog posts—I used to write for the AWS blog. I think I wrote 50 or so posts over the years—and I scraped not only my posts but all the posts from all the authors on all AWS blogs. And I created a model that would generate—sort of like this Totes Not Amazon site, it would generate fake blog posts.

And I did this all live on Twitch. So, that kind of goes back to the live coding is I think there's this general idea in the industry that machine learning is some secret. And I think the machine learning engineers are kind of incentivized to keep it that way because they can earn the big bucks as long as it's a specialty. But the reality is, most of the machine learning techniques and training techniques are all, sort of, automated. So you're saying do a softmax here.

Okay, do an activation function, do this layer, do this fully-connected layer, whatever. All of that stuff is completely automated at this point. You don't really need to think about it. You can tune it, you can play with it if you want, but the gross majority of machine learning these days is really just data preparation. If you look at the Twitch video that I did a couple years ago, we spent more time writing a scraper to download all of the blog posts than we did training and deploying the model onto SageMaker.

So it's interesting to see all this focus from all the different industries and blog posts talking about how ML is this huge, big thing. But it's data science, again. That's where everything's at is kind of doing feature engineering and data science. And I wish I would see more focus in the industry on that side of things.
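
For readers who have not met the building blocks Randall lists (a fully-connected layer, an activation function, a softmax), here is a rough sketch of what each one computes, written in plain Python with no framework. The weights are hypothetical hand-picked numbers chosen only for illustration; in practice a framework such as PyTorch supplies these operations and learns the weights during training, which is the automation he is pointing at.

```python
import math


def dense(inputs, weights, biases):
    """Fully-connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Activation function: clamp negative values to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical hand-picked weights; a real model would learn these.
HIDDEN_W = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
HIDDEN_B = [0.0, 0.1, -0.1]
OUTPUT_W = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]]
OUTPUT_B = [0.0, 0.0]


def forward(features):
    """Two inputs -> three hidden units -> two class probabilities."""
    hidden = relu(dense(features, HIDDEN_W, HIDDEN_B))
    return softmax(dense(hidden, OUTPUT_W, OUTPUT_B))


if __name__ == "__main__":
    print(forward([1.0, 2.0]))
```

The model itself fits in a couple dozen lines, which is consistent with the point that most of the effort goes into collecting and preparing the data that feeds it.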

Corey: That's the challenge, though is the entire industry is expanding, and it's getting bigger all the time. Which reminds me a lot of things from yesteryear, and really, the reasons years ago, I wanted to have you on to have some of these discussions. So let's do it now. Pivoting away from the machine learning morass for the moment. Let's talk about big cloud providers and their growth and what we're seeing in the market. So let's begin with a provocative thing that will start a fight. If you could change one thing in AWS, what would it be?

Randall: I tweeted about this. I would love for there to be within AWS an S-team level goal, an Andy Jassy level goal that says, all of our APIs that we release and all of our developer experience that we release will be gatekept until this number of internal developers agree that it's a pleasant experience. I can rant more about this, but if I had my druthers, if I could change anything, I think the single most important change that could happen would be to have developer experience as a core goal for every single service team. Except WorkDocs. [laugh].

Corey: Well, I love the idea in principle, but it sounds like it could turn into a few things that are terrible. One being that then you wind up with the perfect becoming the enemy of the good and never shipping anything. And the other is that it's become very apparent to me over the years, watching AWS releases and having a launch-day product that is—how to put it—basically crap. And it learns from its customers, and two years later, you look at that product again and it is worlds better. I don't know that you'd be able to get it better if you had, A) launched later in the process without that customer perspective, or if you had not built things that were reflected by the use cases customers put them to.

Randall: I think that's a fair call-out. And I'll just kind of say, yeah, but AWS is pretty well established at this point. They can afford to take an extra second to get things done correctly or to do a finesse and polish check. I don't know if you remember the Elastic Kubernetes Service launch back in say… I can’t remember. Was it 2017?

That was essentially at the time, not a real service. It was, “Here. Run this AMI.” And that was the entirety of it. That's not a pleasant developer experience. So, everyone who had access to the service in preview was sitting there complaining about the service and the complete, really lack of a coherent presentation layer. They weren't asking for new features because they were too busy complaining about how frustrating the API and that side of things were.

So, I think there's a balance, obviously, and businesses want to move fast. But here's the thing, the kind of thinking that got AWS to where they are right now is not the kind of thinking that's going to take them into the future because the problems that they've created with this rush of onslaught of new services and new information are impossible for an individual developer to overcome with the current developer experience. And I can go on at length about this, but—you know, the particular services that I would call out. But if I harken back to the early days of AWS, when EC2 came out, for instance, one of the things that made me go to work for AWS in the first place, was they never had a run instance API. So, you think about launching an EC2 instance, right?

Corey: Mm-hm.

Randall: It's like, “Okay, cool. Run instance.” It wasn't ‘run instance.’ It was always ‘run instances.’ It was plural from the beginning. And that seems obvious to us in 2021. In 2011 or 2010, people weren't thinking about launching multiple instances simultaneously. I mean, some people were obviously but it wasn't a common theme back then.

So, they were thinking ahead about the developer experience and they weren't painting themselves into a corner that's impossible to escape from. This is why you have different versions of different APIs that are expressed across different services now, and you'll see different things, sort of reinventing themselves and relaunching as new services. Because that lets them internally get more buy-in and get more effort kind of devoted to this new shiny thing. And the real value, the things that would drive tremendous value for customers, even if they're not necessarily asking for it, would be to focus on the developer experience, the hands-on-keyboard experience of using AWS.
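
The plural design Randall mentions is still visible in the API today: a RunInstances request always carries a minimum and maximum count, so launching one instance is just the case where both are 1. The sketch below builds such a request and, optionally, sends it through boto3's `run_instances` call. The helper names (`build_run_instances_request`, `launch`), the default region, and the `ami-EXAMPLE` image ID are placeholders for illustration, and actually launching requires boto3 plus valid AWS credentials.

```python
def build_run_instances_request(image_id, instance_type, count=1):
    """Build parameters for EC2 RunInstances; the count is always explicit."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
    }


def launch(params, region_name="us-east-1"):
    """Send the request with boto3 (needs boto3 installed and credentials)."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.run_instances(**params)
    return [instance["InstanceId"] for instance in response["Instances"]]


if __name__ == "__main__":
    # "ami-EXAMPLE" is a placeholder, not a real image ID.
    one = build_run_instances_request("ami-EXAMPLE", "t3.micro")
    fleet = build_run_instances_request("ami-EXAMPLE", "t3.micro", count=5)
    print(one)
    print(fleet)
```

Going from one instance to a fleet changes two numbers rather than the shape of the call, which is the sort of forward-looking choice being praised here.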

Corey: The challenge that I think we're seeing, too, is if you go back to its inception, AWS was very clearly aimed at engineers slash builders slash surly sysadmins, folks who for one reason or another, were working with the computers an awful lot. And it’s—at the time of this recording—turned them into what is basically a $46 billion a year business. And it's gotten them super far. And it's carried them super well. The challenge, though, is that their next $46 billion a year is not going to come from the same places that the first one did.

I mean, we talked about a $2 trillion IT industry and growing. A lot of the folks that are still running on-premises are now not building net new on top of Cloud, but they're migrating in. They have a different philosophy, they have a different approach. And when you say ‘developer experience,’ their response, quite reasonably is, yeah, we don't hire developers; we're just trying to run our corporate IT somewhere. And there's a lot wrong with that, but that is their position.

Randall: And I understand that, but I think it's a little myopic to have that view. I want to look at Microsoft for a second. So, Microsoft has made a pretty ingenious acquisition of GitHub. Microsoft also runs VS Code. The gross majority of new developers are learning and getting started with VS Code as the platform.

That is their IDE, that is what they spend all their time in. From high school and on, that is what they know. So, if you hire someone who's 21 today, it's very likely the only IDE they've ever used is VS Code. So, Microsoft now owns the code—where it lives—they own, with GitHub Actions, the CI and CD of the code, they own the IDE, which is the developer experience. And admittedly, thankfully, this is all open source.

But where is AWS in that developer experience story? Because the acquisition of developers is the funnel that drives your long term business success. I agree that you have these great efforts that can go on in parallel for enterprise sales, and inside sales, and all this other go-to-market activities with these much larger customers. However, think about Twilio. Twilio, started by an AWS—or an Amazon product manager, Twilio became what they are today by focusing and obsessing over the developer experience. They have by far one of the best developer experiences. They set the industry standard there.

Corey: As long as we're not talking about their SendGrid division, I would wholeheartedly agree with you.

Randall: Yes. No comment on SendGrid. I haven't used it in probably a decade.

Corey: I used to. I love what it does, as far as an email [campaign 00:23:15] goes, but the API is kind of sad. If you're listening to this at SendGrid, please reach out.

Randall: [laugh]. So, I don't see these two efforts as mutually exclusive. Improving developer experience will improve your sales across the board because companies have to hire new people. Even if they're running on-prem, they're still hiring new people if they're a growing business. And if they're not a growing business, does it really matter if you're getting their business?

I mean, the economy is going to move on. You don't necessarily need to capture these dying companies; you want to capture the companies that are going to grow along with your business. And the companies that are growing are hiring new developers. Those new developers have a very Microsoft oriented worldview with VS Code, and GitHub, and all these other things. Where is AWS in that story? And that's what I would love to see change at AWS is this huge focus on developer experience. I think it would transform the company.

Corey: Incidents happen fast, but they don't come out of nowhere like AWS bills do. If they're watching, your team can catch sudden shifts in performance, but who has time to check thousands of hosts, services, and containers? That's where New Relic Lookout comes in. Part of full stack observability, it compares current performance to past performance, just like you're not supposed to in the stock market, then displays it in an estate-wide view of your entire system. Sign up for free at NewRelic.com and start moving faster than ever.

Corey: I would agree with you, and part of the challenge is that AWS is fantastic at building the plumbing, and they have trouble with the porcelain. Sure, you can view that through whatever toilet analogy lens you want, but they're great at building the blocks you can use to construct something. But I think that they abdicate almost entirely—and we talked about this with the approach you take to telling stories—is that they don't tell the story about what you can do with those things that they're giving you. Sure, you give me a bunch of bricks and talk to me about building a house, but you haven't demonstrated to me what that house might look like. Help me out. And sure, your customers can get on stage, but I kind of want something that stands somewhere in between the two continuum endpoints of ‘Hello World’ and Netflix.

Randall: Yes. I agree. And it's not like all of AWS suffers from this. It's just a majority. I would say there are two projects, maybe three projects that I'm a huge fan of right now. One is AWS Amplify. So, Amplify is an open source project, it's a service, and it's a console—

Corey: And a breakfast cereal, too. I kid.

Randall: [laugh]. Oh, I could use a breakfast cereal. So, Amplify has really just been completely focused on the developer experience and you can tell. You can see it in their documentation, you can see it in the way that their advocates go out and are speaking to developers and telling stories, and they're hitting a whole new generation of developers. So, a lot of developers these days, I don't know if you see all this Twitter controversy that goes on. Somebody tweeted about how they would never hire a front end developer, or front end developers are only junior. Did you see that?

Corey: Yes, I did. And in fact that Twitter rando was the CEO of Shopify, which is—

Randall: Oh, my God.

Corey: —a bit of a challenge. I understand aspects of the sentiment, which is, for example, you will become a better engineer by understanding other aspects than just the area that you're focusing on. But, “Front end is just for junior crappy engineers,” is absolutely not helpful sentiment. I'm also not entirely convinced that that was the point he was trying to make. But there is the nuance, of course that I'm not here to do communications, PR spin, marketing messaging, et cetera for the CEO of Shopify. He can clarify his own statement. The way he put it was tone deaf and bad.

Randall: Right. And I see where that stigma or whatever is coming from, but the reality of the situation is front-end engineers today are, by necessity, some of the best engineers out there. And it requires a complete understanding of a complex set of services. If I go right now, and I want to build a back-end that can scale to millions and millions of users—billions of messages per second. Whatever—with AWS, as long as I'm blindly swiping my credit card, that is not a problem.

I can make that work, right? On the front end, you have so much more nuance, you have so much more complications. And one of the things that AWS Amplify is doing is they're making that front end more accessible to more developers, and they're taking that front end skillset and gently introducing the cloud skill set in addition to it. And I don't think people fully grok how good of an onboarding tool that is because while developers might start with AWS Amplify, a couple of years later, they're not just using Amplify, they're using a slew of AWS services. And when they go out and work in other places, and they want to rapidly prototype something, they're not just going to pick Amplify, they're going to pick all of AWS.

And Elastic Beanstalk did something similar to this back in I would say 2014-ish. It never really lived up to the vision, I would say. I think Elastic Beanstalk is awesome. It's my baby; I love it. But if I'm being perfectly blunt, of course, no, it didn't do what we wanted. And there was this other service, I don't know if you remember, do you remember CodeStar?

Corey: Of course I do. You have to understand, I'm a walking encyclopedia of everything that AWS has ever done. It was sort of their unifying approach to take an opinionated build of, I want to set up a new project. Well, it's going to spin up a CodeCommit repository, it's going to set up a CodeDeploy pipeline, it's going to use CodeBuild to wind up building these things. Effectively, it was a… I almost want to use the term Potemkin village of all the build tools that AWS offers, which you only use if you don't have the option of using things that are way better at each of the individual things that they do.

Randall: Yeah, pretty much. And CodeStar was supposed to be this onboarding tool, but what CodeStar focused on was kind of a console experience. They didn't focus on either the command line tool, or the developer experience, or the APIs. So, the only real way to access CodeStar in the beginning, was through the console. So, what Amplify has done is they have you using the AWS Amplify SDK before you ever even create an AWS account.

That's a huge, huge difference in developer experience, especially in onboarding. But even going beyond the beginner side of things, Amplify is able to help experienced AWS developers take pain points with things like Cognito and make them much simpler to deal with. And there's also this concept of evolving. So there's this thing called Lightsail that AWS launched a couple years ago to make it easier for developers to go and launch stuff without worrying about specific costs and overruns because they would just charge you a flat monthly fee. Lightsail is great, but again, it's not Heroku.

It's not a command line that you're doing to deploy your app. It was more complicated than that. So, Amplify is one project. I'm a huge fan of it, I can talk at length about it and what I think they're doing right, but they're really focused on developer experience. The other thing, and I bet this is one you'll disagree with me on, is CDK.

Corey: Oh, I'm thrilled to have that particular debate with you. In fact, as of the time of this recording, roughly a week ago, I did an article on building a toy app to build out a bot that counts my Twitter followers and writes it to Dynamo as an excuse to play with the CDK. My experience is after an hour, I gave up and went back to SAM CLI.

Randall: [laugh]. So, first of all, I love SAM. Chris Munns and I have spent years working together and presenting together at various conferences. Chris Munns is an amazing speaker and the whole serverless AWS Developer Advocate team is a group of really amazing, and talented, and very, very eloquent storytellers. I think that SAM is really playing catch up to a lot of other frameworks.

So, even [BLANK 00:31:23] is doing more faster than SAM is. And part of that is because SAM is based on CloudFormation. And CloudFormation transforms and all of these other different CloudFormation techniques are being developed in parallel to the things that SAM should be doing. For a long time just doing something as simple as S3 event notifications in SAM was phenomenally difficult. And it only really got addressed, I would say, in 2019. So, it took three or four years for it to really become a solved problem. What I like about CDK is—well, let's take a step back. When you define infrastructure, what do you think about?

Corey: Usually I think of—first I have a workload, usually, that I want to put on an infrastructure that's already built or defined somehow, or some form of code I want out there. I think of infrastructure deployment usually being more of a one and then done approach, as opposed to continuing to iterate on the application that lives in that infrastructure. Now, that's a bit of an outmoded way of thinking in some respects, but every time I push code, I don't necessarily want to reprovision the database that holds the data that code talks to, for instance.

Randall: Gotcha. And if you were to walk back that thought to say, the mid-2000s, how do you think about provisioning hardware?

Corey: In the mid-2000s, my entire approach—because I was provisioning hardware then—was you had to do a lot more capacity planning, there was a six-week lead time, instead of a six-second lead time, there was a lot of building excess capacity in, and you were just getting around to this idea of virtualizing things because it was way faster to spin a new VM—even at slow speeds—than it was to provision, wipe, and reinstall something on bare metal.

Randall: Right.

Corey: So, even then it was starting to get in that direction. Now, I love that everything's an API call away, and you can have more compute power than, like, all of 20th-century humanity.

Randall: Yes. It's amazing to me that a t2.nano or a t3.nano has something like one and a half billion times the compute capacity we took to go to the moon. That's pretty insane. So, thinking about CDK and defining your infrastructure as code: I think defining your infrastructure in a markup language, like YAML or JSON, was a little bit of a foreign concept to a lot of people. But I think code is, one, more expressive, and, two, better suited to cloud deployments.

Now, I have lots of reasons for this, but I do want to issue one point of caution, because a lot of times (this is something I've even seen very recently in some folks that I've been working with) there are different kinds of engineers who approach problems in different ways. So, a person who is primarily a software engineer will most likely follow a principle of DRY: don't repeat yourself. And that becomes something that they want to take into their infrastructure deployment and into their continuous integration deployments and things like that as well. My… strong suggestion (and of course, this is not always true) is that ‘don't repeat yourself’ is somewhat the enemy of a lot of infrastructure deployment, because when you try to get too clever with CDK, when you try to make everything work like, “Oh, let me add one aspect to this one stack and have everything deploy magically with one line of code,” you've probably over-engineered the problem.

It is perfectly okay to repeat yourself a few times when defining CDK code. I have a cool story about a CDK project, which is what brought me onto it. But originally, I was kind of like you, I was like, super, super skeptical, I didn't really buy into the value of it and then I used it in a real world production project.

Corey: I love the concept, at a high level, of what the CDK offers. And it's better than it was when I first played with it a while back. But there are problems with it. There is on some level, in many cases, a distance between the developer and the infrastructure. That's a different philosophy, and it takes time for companies to get used to that and cultures to shift.

Let's skip past that because eventually, you're right; it's going to be unified. The idea of having all of my infrastructure defined in my codebase for the application means that suddenly I have to integrate CI/CD in a much more meaningful, thoughtful way for all of the application workloads, it means that my code—and the structure of my code projects in this repository—is dictated by how the infrastructure looks. It opens up a number of cans of worms. It does absolutely speed iteration and make this more accessible to developers. That's no small thing. But there is a challenge to that: it requires a different way of thinking, and it's very challenging in my experience to wind up mapping that to something that isn't Greenfield.

Randall: I would agree with that, actually. So, I had tried porting existing pieces of infrastructure using CDK. So, I had a production service that I ran at AWS that I wanted to move into CDK, and I found it challenging just because I had too many random things going on. And that side of the infrastructure, surprisingly, was the most reliable, but the least iterable, if that makes any sense. You know, it was very reliable as long as you didn't touch it. [laugh].

Whereas with the CDK project that I built, so this is—I don't know if you saw Amazon Connect chat. They released a couple of things around re:Invent timeframe, and one of the things that they released is built on top of CDK. So, that project was kind of my baby. It was something that I worked on pretty aggressively and it was a Greenfield project. And I started with CDK because there was a little bit of a push internally to explore it for new use cases and stuff. And I was like you: I was kind of hesitant, I was like, “I don't like this project structure. I don't like any of this.” But over time—that's the same way I felt about DynamoDB in the beginning, to be honest. Do you remember single table design?

Corey: I do, indeed. I remember a lot of use cases where it solves problems, and twice as many use cases where it gets even worse.

Randall: Yeah. And it's a different way of thinking about problems that once you grok it, once you kind of have that mental model in place, you can see the value of it, you can see where it can be applied. So I'm not saying CDK is perfect for everything. What I'm saying is, if you do have this Greenfield project, and you can work around some of the new folder structures and where you're keeping things and how you're thinking about the construction of your stack and your CI/CD, it can move very quickly: I was able to take a project from literally nothing to deployed in multiple regions, running production workloads, over the course of about three hours or so. In three hours, that one stack was working.

And then as I wanted to implement more things, let's say I wanted to put IAM permission boundaries in there. Traditionally, with CloudFormation, or Terraform, or something like that, I'd have to write a whole section that would go and apply either a transform or something else to all of these different resources that were being provisioned. On the CDK side, I applied one aspect to the stack and those IAM permission boundaries were applied across the board. Let's say I wanted to tag things. This is easier now in CloudFormation than it was, but again, I can have programmatically generated tags as the stack is being deployed.

It can look up what region it's in, it can look up what account it’s in, it can roll up, it can do all kinds of good things with organizations and cost reporting. I've found the speed of iteration with CDK and the reliability of that iteration to be drastically improved over say, traditional CloudFormation, or Terraform, or anything like that. And that was the huge one. And that's why I think CDK is such a valuable project, for Greenfield especially.
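The "one aspect applied across the board" idea Randall describes can be sketched as a visitor that walks every resource in a stack. What follows is a toy model in plain Python, not the CDK API: the `Node` class, `apply_aspect`, `permission_boundary`, `tag_all`, the ARN, the region, and the account number are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for one resource in a stack's resource tree."""
    name: str
    kind: str  # e.g. "stack", "role", "bucket"
    props: Dict[str, str] = field(default_factory=dict)
    tags: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# An aspect is just a function that inspects and possibly modifies one node.
Aspect = Callable[[Node], None]


def apply_aspect(root: Node, aspect: Aspect) -> None:
    """Visit the root and every descendant once, applying the aspect to each."""
    aspect(root)
    for child in root.children:
        apply_aspect(child, aspect)


def permission_boundary(boundary_arn: str) -> Aspect:
    """Build an aspect that sets a permission boundary on role nodes only."""
    def visit(node: Node) -> None:
        if node.kind == "role":
            node.props["permissions_boundary"] = boundary_arn
    return visit


def tag_all(region: str, account: str) -> Aspect:
    """Build an aspect that stamps region and account tags on every node."""
    def visit(node: Node) -> None:
        node.tags.update({"region": region, "account": account})
    return visit


# A small made-up stack: two roles at different depths and one bucket.
stack = Node("app", "stack", children=[
    Node("api-role", "role"),
    Node("assets", "bucket"),
    Node("workers", "group", children=[Node("worker-role", "role")]),
])

# One call per concern covers the whole tree, however deep the roles sit.
apply_aspect(stack, permission_boundary(
    "arn:aws:iam::123456789012:policy/example-boundary"))
apply_aspect(stack, tag_all(region="us-east-1", account="123456789012"))
```

The point of the sketch is the shape of the design: each cross-cutting rule is written once and applied to the tree, rather than repeated on every resource definition.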

Corey: I will meet you in the middle and agree to suspend judgment pending further explorations with it, then.

Randall: Yeah, we should chat about this sometime. I can give you a guided introduction; a no nonsense tour. I think you'd like it a lot.

Corey: You'd be the third person to have done so if we were to go down that path. But I will keep my mind open, and maybe we'll even livestream it for fun.

Randall: That would be fun.

Corey: I want to thank you for taking the time to speak with me now that AWS couldn't keep the fire and the gunpowder keg separate anymore.

Randall: Yeah. And hey, I'll say this. I feel like there's a little bit of a sentiment that I dislike AWS or that I have bad feelings towards them, and that's not the case at all. I'm actually a pretty huge AWS fan. I joined AWS because I was a customer of AWS first.

And I was obsessed with the product. I loved it. I met Jeff Barr in my interview, and he's like, “Okay, well, let's fix this blog post. Here's how you log in to Emacs and stuff.” And that was probably my first month of work there.

And I got to spend the next several years working on a series of really, really exciting projects. But the thing that I loved most about my time in AWS, if I can just take everything in its entirety, the customers that I met and the amount of—just the community. And just going to re:Invent every year. Holy smokes, I actually left AWS for a year and came back, and I made a decision to come back my first day at re:Invent as a customer again. [laugh]. It's a palpable energy, and I was sad to miss that this year.

Corey: I really hope that there's a better story at re:Invent this year than there was last year in terms of getting people together. But I also want to keep that strong online component because it's so much more accessible to folks who can't take a week to fly to Las Vegas, and not do work for that entire time frame, and pay $2,000 for a ticket.

Randall: Yep. [laugh].

Corey: So, combine the best of both, I think.

Randall: I desperately need to get out of my home office and meet people again. And there's another aspect to that is that every demo and every talk I've ever made was built on a train, or a plane, or in the back of a van driving through the floods of Bangkok on my way to my next meetup or something. I need that kind of forcing function of the travel to help me think creatively and build new compelling stories to tell developers.

Corey: Yeah. I think we could absolutely come up with something terrifying in the somewhat near future. More to come on that as it unfolds. Randall, thank you so much for taking the time to speak with me today. If people want to hear more about what you're up to, what ridiculous ideas you have, or mostly just want to see you kick people in the shins, where can they find you?

Randall: I am pretty much just on Twitter. So it's twitter.com/jrhunt is my handle. And I post a lot about AWS, and a lot about machine learning, and occasionally about sci-fi books.

Corey: Excellent. We will of course put links to that in the show notes. Thanks so much. I appreciate your time.

Randall: Thanks for having me.

Corey: Randall Hunt, developer advocate at Facebook. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts, whereas if you hated this podcast, please leave a five-star review on Apple Podcasts or your podcast platform of choice along with a comment that is entirely generated by machine learning.

Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.

This has been a HumblePod production. Stay humble.
