Understanding CDK and the Well-Architected Framework with Matt Coulter

Episode Summary

Corey sits down with Matt Coulter, a senior architect at Liberty Mutual. Matt defines CDK and explains CDK’s supported languages. They discuss Corey’s experience using CDK with his Twitter client lasttweetinaws.com and his issues with CloudFormation. Matt talks about the 6 pillars of the “Well-Architected Framework,” the updated serverless portion of the Well-Architected Tool, and Corey and Matt discuss the results of the community CDK quarterly survey.

Episode Show Notes & Transcript

About Matt

Matt is a Sr. Architect in Belfast, an AWS DevTools Hero, Serverless Architect, Author and conference speaker.

He is focused on creating the right environment for empowered teams to rapidly deliver business value in a well-architected, sustainable and serverless-first way.

You can usually find him sharing reusable, well architected, serverless patterns over at cdkpatterns.com or behind the scenes bringing CDK Day to life.

Links Referenced:

Previous guest appearance: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/slinging-cdk-knowledge-with-matt-coulter/
The CDK Book: https://thecdkbook.com/
Twitter: https://twitter.com/NIDeveloper

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. One of the best parts about, well I guess being me, is that I can hold opinions that are… well, I’m going to be polite and call them incendiary, and that’s great because I usually like to back them in data. But what happens when things change? What happens when I learn new things?

Well, do I hold on to that original opinion with two hands at a death grip or do I admit that I was wrong in my initial opinion about something? Let’s find out. My guest today returns from earlier this year. Matt Coulter is a senior architect since he has been promoted at Liberty Mutual. Welcome back, and thanks for joining me.

Matt: Yeah, thanks for inviting me back, especially to talk about this topic.

Corey: Well, we spoke about it a fair bit at the beginning of the year. And if you’re listening to this, and you haven’t heard that show, it’s not that necessary to go into; mostly it was me spouting uninformed opinions about the CDK—the Cloud Development Kit, for those who are unfamiliar—I think of it more or less as what if you could just structure your cloud resources using a programming language you claim to already know, but in practice, copy and paste from Stack Overflow like the rest of us? Matt, you probably have a better description of what the CDK is in practice.

Matt: Yeah, so we like to say it’s imperative code written in a declarative way, or declarative code written in an imperative way. Either way, it lets you write code that produces CloudFormation. So, it doesn’t really matter what you write in your script; the point is, at the end of the day, you still have the CloudFormation template that comes out of it. So, the whole piece of it is that it’s a developer experience, developer speed play, that if you’re from a background that you’re more used to writing a programming language than a YAML, you might actually enjoy using the CDK over writing straight CloudFormation or SAM.
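
As a rough illustration of "code that produces CloudFormation," here is a minimal sketch in plain TypeScript. It deliberately does not use the real CDK library: the `synthesize` helper, the bucket names, and the property values are all made up for this example. It only shows the general shape of the idea, which is that ordinary program logic runs and a template document comes out the other end.

```typescript
// Toy illustration only: this is NOT the aws-cdk-lib API.
// It shows ordinary code emitting a CloudFormation-shaped template.

interface Resource {
  Type: string;
  Properties: Record<string, unknown>;
}

interface Template {
  AWSTemplateFormatVersion: string;
  Resources: Record<string, Resource>;
}

// Hypothetical helper: one versioned S3 bucket per logical name.
function synthesize(bucketNames: string[]): Template {
  const resources: Record<string, Resource> = {};
  for (const logicalId of bucketNames) {
    resources[logicalId] = {
      Type: "AWS::S3::Bucket",
      Properties: { VersioningConfiguration: { Status: "Enabled" } },
    };
  }
  return { AWSTemplateFormatVersion: "2010-09-09", Resources: resources };
}

// The loop above is imperative; the output below is declarative.
const template = synthesize(["LogsBucket", "AssetsBucket"]);
console.log(JSON.stringify(template, null, 2));
```

Whatever logic runs inside the function, the thing that gets deployed is still the template it returns.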

Corey: When I first kicked the tires on the CDK, my first initial obstacle—which I’ve struggled with in this industry for a bit—is that I’m just good enough of a programmer to get myself in trouble. Whenever I wind up having a problem that StackOverflow doesn’t immediately shine a light on, my default solution is to resort to my weapon of choice, which is brute force. That sometimes works out, sometimes doesn’t. And as I went through the CDK, a couple of times in service to a project that I’ll explain shortly, I made a bunch of missteps with it. The first and most obvious one is that AWS claims publicly that it has support in a bunch of languages: .NET, Python, there’s obviously TypeScript, there’s Go support for it—I believe that went generally available—and I’m sure I’m missing one or two, I think? Aren’t I?

Matt: Yeah, it’s: TypeScript, JavaScript, Python, Java, .NET, and Go. I think those are the currently supported languages.

Corey: Java. That’s the one that I keep forgetting. It’s the block printing to the script that is basically Java cursive. The problem I run into, and this is true of most things in my experience, when a company says that we have deployed an SDK for all of the following languages, there is very clearly a first-class citizen language and then the rest that more or less drift along behind with varying degrees of fidelity. In my experience, when I tried it for the first time in Python, it was not a great experience for me.

When I learned just enough JavaScript, and by extension TypeScript, to be dangerous, it worked a lot better. Or at least I could blame all the problems I ran into on my complete novice status when it comes to JavaScript and TypeScript at the time. Is that directionally aligned with what you’ve experienced, given that you work in a large company that uses this, and presumably, once you have more than, I don’t know, two developers, you start to take on aspects of a polyglot shop no matter where you are, on some level?

Matt: Yeah. So personally, I jump between Java, Python, and TypeScript whenever I’m writing projects. So, when it comes to the CDK, you’d assume I’d be using all three. I typically stick to TypeScript and that’s just because personally, I’ve had the best experience using it. For anybody who doesn’t know the way CDK works for all the languages, it’s not that they have written a custom, like, SDK for each of these languages; it’s a case of it uses a Node process underneath them and the language actually interacts with—it’s like the compiled JavaScript version is basically what they all interact with.

So, it means there are some limitations on what you can do in that language. I can’t remember the full list, but it just means that it is native in all those languages, but there are certain features that you might be like, “Ah,” whereas, in TypeScript, you can just use all of TypeScript. And my first inclination was actually, I was using the Python one and I was having issues with some compiler errors and things that are just caused by that process. And it’s something that talking in the cdk.dev Slack community—there is actually a very active—

Corey: Which is wonderful, I will point out.

Matt: [laugh]. Thank you. There is actually, like, an awesome Python community in there, but if you ask them, they would all ask for improvements to the language. So, personally if someone’s new, I always recommend they start with TypeScript and then branch out as they learn the CDK so they can understand is this a me problem, or is this a problem caused by the implementation?

Corey: From my perspective, I didn’t do anything approaching that level of deep dive. I took a shortcut that I find has served me reasonably well in the course of my career, when I’m trying to do something in Python, and you pull up a tutorial—which I’m a big fan of reading experience reports, and blog posts, and here’s how to get started—and they all have the same problem, which is step one, “Run npm install.” And that’s “Hmm, you know, I don’t recall that being a standard part of the Python tooling.” It’s clearly designed and interpreted and contextualized through a lens of JavaScript. Let’s remove that translation layer, let’s remove any weird issues I’m going to have in that transpilation process, and just talk in the language it was written in. Will this solve my problems? Oh, absolutely not, but it will remove a subset of them that I am certain to go blundering into like a small lost child trying to cross an eight-lane freeway.

Matt: Yeah. I’ve heard a lot of people say the same thing. Because the CDK CLI is a Node process, you need it no matter what language you use. So, if they were distributing some kind of universal binary that just integrated with the languages, it would definitely solve a lot of people’s issues with trying to combine languages at deploy time.

Corey: One of the challenges that I’ve had as I go through the process of iterating on the project—but I guess I should probably describe it for those who have not been following along with my misadventures; I write blog posts about it from time to time because I need a toy problem to kick around sometimes because my consulting work is all advisory and I don’t want to be a talking head-I have a Twitter client called lasttweetinaws.com. It’s free; go and use it. It does all kinds of interesting things for authoring Twitter threads.

And I wanted to deploy that to a bunch of different AWS regions, as it turns out, 20 or so at the moment. And that led to a lot of interesting projects and having to learn how to think about these things differently because no one sensible deploys an application simultaneously to what amounts to every AWS region, without canary testing, and having a phased rollout in the rest. But I’m reckless, and honestly, as said earlier, a bad programmer. So, that works out. And trying to find ways to make this all work and fit together led iteratively towards me discovering that the CDK was really kind of awesome for a lot of this.

That said, there were definitely some fairly gnarly things I learned as I went through it, due in no small part to help I received from generous randos in the cdk.dev Slack team. And it’s gotten to a point where it’s working, and as an added bonus, I even mostly understand what it’s doing, which is just kind of wild to me.

Matt: It’s one of those interesting things where because it’s a programming language, you can use it out of the box the way it’s designed to be used where you can just write your simple logic which generates your CloudFormation, or you can do whatever crazy logic you want to do on top of that to make your app work the way you want it to work. And providing you’re not in a company like Liberty, where I’m going to do a code review, if no one’s stopping you, you can do your crazy experiments. And if you understand that, it’s good. But I do think something like the multi-region deploy, I mean, with CDK, if you have a construct, it takes in a variable that you can just say what the region is, so you can actually just write a for loop and pass it in, which does make things a lot easier than, I don’t know, trying to do it with YAML, where you can pass in parameters, but you’re going to get a lot more complicated a lot quicker.
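
The "for loop over regions" idea can be sketched in a few lines. This is a toy model rather than real CDK code: `StackDefinition`, `defineStack`, the app name, and the three-region list are assumptions made for illustration (the episode only says "20 or so" regions).

```typescript
// Toy model, not the real CDK API: a "stack" here is just a plain object.
interface StackDefinition {
  stackName: string;
  region: string;
}

// Hypothetical region-agnostic factory: the region is the only input that varies.
function defineStack(appName: string, region: string): StackDefinition {
  return { stackName: `${appName}-${region}`, region };
}

// Example region list (an assumption for this sketch).
const regions = ["us-east-1", "us-west-1", "eu-west-1"];

const stacks: StackDefinition[] = [];
for (const region of regions) {
  stacks.push(defineStack("last-tweet", region));
}

console.log(stacks.map((stack) => stack.stackName));
```

Adding a region then means adding one string to the list instead of copying a block of template.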

Corey: The approach that I took philosophically was I wrote everything in a region-agnostic way. And it would be instantiated and be told what region to run it in as an environment variable when CDK deploy was called. And then I just deploy 20 simultaneous stacks through GitHub Actions, which invoke custom runners that run inside of a Lambda function. And that’s just a relatively basic YAML file, thanks to the magic of GitHub Actions matrix jobs. So, it fires off 20 simultaneous processes on every commit to the main branch, and then after about two-and-a-half minutes, it has been deployed globally everywhere and I get notified on anything that fails, which is always fun and exciting to learn those things.

That has been, overall, just a really useful experiment and an experience because you’re right, you could theoretically run this as a single CDK deploy and then wind up having it iterate through a list of regions. The challenge I have there is that unless I start getting into really convoluted asynchronous concurrency stuff, it feels like it’ll just take forever. At two-and-a-half minutes a region times 20 regions, that’s the better part of an hour on every deploy and no one’s got that kind of patience. So, I wound up just parallelizing it a bit further up the stack. That said, I bet there are relatively straightforward ways, given that async is a big part of JavaScript, to do this simultaneously.
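
The sequential-versus-simultaneous trade-off can be sketched with plain promises. Nothing here talks to AWS: `fakeDeploy` is a hypothetical stand-in that only waits, and the timing constants simply restate the numbers from the conversation (2.5 minutes per region, 20 regions).

```typescript
// Hypothetical stand-in for a real deploy: it only waits, it does not call AWS.
function fakeDeploy(region: string, durationMs: number): Promise<string> {
  return new Promise((resolve) => {
    setTimeout(() => resolve(region), durationMs);
  });
}

// Sequential: total time is roughly the sum of all deploys.
async function deploySequentially(regions: string[], durationMs: number): Promise<string[]> {
  const finished: string[] = [];
  for (const region of regions) {
    finished.push(await fakeDeploy(region, durationMs));
  }
  return finished;
}

// Concurrent: total time is roughly the slowest single deploy.
async function deployConcurrently(regions: string[], durationMs: number): Promise<string[]> {
  return Promise.all(regions.map((region) => fakeDeploy(region, durationMs)));
}

// Back-of-the-envelope numbers from the conversation.
const sequentialMinutes = 2.5 * 20; // 50 minutes, one region after another
const concurrentMinutes = 2.5; // about one deploy's worth when run together
console.log({ sequentialMinutes, concurrentMinutes });
```

`Promise.all` rejects as soon as any one deploy fails; a real pipeline that wants a per-region failure report would more likely reach for `Promise.allSettled`.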

Matt: One of the pieces of feedback I’ve seen about CDK is if you have multiple stacks in the same project, it’ll deploy them one at a time. And that’s just because it tries to understand the dependencies between the stacks and then it works out which one should go first. But a lot of people have said, “Well, I don’t want that. If I have 20 stacks, I want all 20 to go at once the way you’re saying.” And I have seen that people have been writing plugins to enable concurrent deploys with CDK out of the box. So, it may be something that it’s not an out-of-the-box feature, but it might be something that you can pull in a community plug-in to actually make work.
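
One way to picture "work out the dependencies, then decide what goes first" is to group stacks into waves, where every stack in a wave depends only on stacks from earlier waves. The sketch below is an illustration of that idea, not a description of how the CDK CLI or any community plugin is implemented; the stack names and the `planWaves` helper are hypothetical.

```typescript
// Hypothetical stack graph: each key lists the stacks it depends on.
type DependencyGraph = Record<string, string[]>;

// Group stacks into "waves": every stack in a wave has all of its
// dependencies satisfied by earlier waves, so a wave could deploy together.
function planWaves(graph: DependencyGraph): string[][] {
  const remaining = new Set(Object.keys(graph));
  const deployed = new Set<string>();
  const waves: string[][] = [];

  while (remaining.size > 0) {
    const wave = Array.from(remaining)
      .filter((stack) => (graph[stack] ?? []).every((dep) => deployed.has(dep)))
      .sort();
    if (wave.length === 0) {
      throw new Error("Dependency cycle or unknown dependency detected");
    }
    for (const stack of wave) {
      remaining.delete(stack);
      deployed.add(stack);
    }
    waves.push(wave);
  }
  return waves;
}

const exampleGraph: DependencyGraph = {
  network: [],
  database: ["network"],
  api: ["network", "database"],
  dashboard: [],
};

console.log(planWaves(exampleGraph));
// → [["dashboard", "network"], ["database"], ["api"]]
```

With 20 independent regional stacks, everything lands in the first wave, which is the "all 20 at once" case.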

Corey: Most of my problems with it at this point are really problems with CloudFormation. CloudFormation does not support well, if at all, secure string parameters from the AWS Systems Manager Parameter Store, which is my default go-to for secret storage, and Secrets Manager is supported, but that also costs 40 cents a month per secret. And not for nothing, I don’t really want to have all five secrets deployed to Secrets Manager in every region this thing is in. I don’t really want to pay $20 a month for this basically free application, just to hold some secrets. So, I wound up talking to some folks in the Slack channel and what we came up with was, I have a centralized S3 bucket that has a JSON object that lives in there.

It’s only accessible from the deployment role, and it grabs that at deploy time and stuffs it into environment variables when it pushes these things out. That’s the only stateful part of all of this. And it felt like that is, on some level, a pattern that a lot of people would benefit from if it had better native support. But the counterargument is that if you’re only deploying to one or two regions, then Secrets Manager is the right answer for a lot of this and it’s not that big of a deal.
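
A minimal sketch of the "JSON object becomes environment variables" step might look like the following. The S3 fetch itself is left out so the example runs offline; the function name, the key names, and the values are placeholders, not the real ones used by the application.

```typescript
// Hypothetical: the JSON text would be fetched from the central S3 bucket at
// deploy time; here it is passed in as a string so the sketch runs offline.
function secretsToEnvironment(jsonText: string): Record<string, string> {
  const parsed: unknown = JSON.parse(jsonText);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Expected a JSON object of secret names to values");
  }
  const env: Record<string, string> = {};
  for (const key of Object.keys(parsed)) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value !== "string") {
      throw new Error(`Secret ${key} must be a string`);
    }
    env[key] = value;
  }
  return env;
}

// Placeholder names and values, not real credentials.
const sampleEnv = secretsToEnvironment(
  '{"TWITTER_API_KEY": "example-key", "TWITTER_API_SECRET": "example-secret"}'
);
console.log(Object.keys(sampleEnv));
```

The resulting map is what would be handed to each function's environment configuration at deploy time, which is also why the values end up unencrypted at runtime.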

Matt: Yeah. And it’s another one of those things, if you’re deploying in Liberty, we’ll say, “Well, your secret is unencrypted at runtime, so you probably need a KMS key involved in that,” which as you know, the costs of KMS, it depends on if it’s a personal solution or if it’s something for, like, a Fortune 100 company. And if it’s a personal solution, I mean, what you’re saying sounds great that it’s IAM restricted in S3, and then that way only at deploy time can it be read; it actually could be a custom construct that someone can build and publish out there to the construct library—or the construct hub, I should say.

Corey: To be clear, the reason I’m okay with this, from a security perspective is one, this is in a dedicated AWS account. This is the only thing that lives in that account. And two, the only API credentials we’re talking about are the application-specific credentials for this Twitter client when it winds up talking to the Twitter API. Basically, if you get access to these and are able to steal them and deploy somewhere else, you get no access to customer data, you get—or user data because this does not charge for anything—you get no access to things that have been sent out; all you get to do is submit tweets to Twitter and it’ll have the string ‘Last Tweet in AWS’ as your client, rather than whatever normal client you would use. It’s not exactly what we’d call a high-value target because all the sensitive user data lives in local storage in their browser. It is fully stateless.

Matt: Yeah, so this is what I mean. Like, it’s the difference in what you’re using your app for. Perfect case of, you can just go into the Twitter app and just withdraw those credentials and do it again if something happens, whereas as I say, if you’re building it for Liberty, it will not pass a lot of our Well-Architected reviews, just for that reason.

Corey: If I were going to go and deploy this at a more, I guess, locked down environment, I would be tempted to find alternative approaches such as having it stored encrypted at rest via KMS in S3 is one option. So is having global DynamoDB tables that wind up grabbing those things, even grabbing it at runtime if necessary. There are ways to make that credential more secure at rest. It’s just, I look at this from a real-world perspective of what is the actual attack surface on this, and I have a really hard time just identifying anything that is going to be meaningful with regard to an exploit. If you’re listening to this and have a lot of thoughts on that matter, please reach out; I’m willing to learn and change my opinion on things.

Matt: One thing I will say about the Dynamo approach you mentioned, I’m not sure everybody knows this, but inside the same Dynamo table, you can scope down a row. You can be, like, “This row and this field in this row can only be accessed from this one Lambda function.” So, there’s a lot of really awesome security features inside DynamoDB that I don’t think most people take advantage of, but they open up a lot of options for simplicity.
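
For readers who have not seen row- and field-level scoping in DynamoDB, here is a sketch of what such an IAM policy can look like, written as a TypeScript object. It assumes the fine-grained access control condition keys `dynamodb:LeadingKeys`, `dynamodb:Attributes`, and `dynamodb:Select`; the account ID, table name, partition key value, and attribute names are placeholders. Tying it to "this one Lambda function" is assumed here to mean attaching the policy to that function's execution role.

```typescript
// Sketch of an IAM policy statement using DynamoDB fine-grained access control.
// The account ID, table name, key value, and attribute names are placeholders.
const rowScopedStatement = {
  Effect: "Allow",
  Action: ["dynamodb:GetItem", "dynamodb:Query"],
  Resource: "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
  Condition: {
    "ForAllValues:StringEquals": {
      // Only items whose partition key equals this value may be read.
      "dynamodb:LeadingKeys": ["twitter-credentials"],
      // Only these attributes may be requested.
      "dynamodb:Attributes": ["pk", "apiKey"],
    },
    StringEqualsIfExists: {
      // Require callers to name the attributes they want.
      "dynamodb:Select": "SPECIFIC_ATTRIBUTES",
    },
  },
};

const rowScopedPolicy = {
  Version: "2012-10-17",
  Statement: [rowScopedStatement],
};

console.log(JSON.stringify(rowScopedPolicy, null, 2));
```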

Corey: Is that tied to the very recent announcement about Lambda getting SourceArn as a condition key? In other words, you can say, “This specific Lambda function,” as opposed to, “A Lambda in this account?” Like that was a relatively recent advent that I haven’t fully explored the nuances of.

Matt: Yeah, like, that has opened a lot of doors. I mean, the Dynamo being able to be locked down to a row has been around for a while, but the new Lambda SourceArn is awesome because, yeah, as you say, you can literally say this thing, as opposed to, you have to start going into tags, or you have to start going into something else to find it.
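
A sketch of a statement that names one specific function could look like this. It assumes the condition key being discussed is `lambda:SourceFunctionArn`; the helper name and both ARNs are placeholders for illustration.

```typescript
// Assumption: the condition key discussed is `lambda:SourceFunctionArn`.
// The function ARN and table ARN used below are placeholders.
function allowOnlyFromFunction(functionArn: string, resourceArn: string) {
  return {
    Effect: "Allow",
    Action: ["dynamodb:GetItem"],
    Resource: resourceArn,
    Condition: {
      // Requests must originate from this exact Lambda function.
      ArnEquals: { "lambda:SourceFunctionArn": functionArn },
    },
  };
}

const functionScopedStatement = allowOnlyFromFunction(
  "arn:aws:lambda:us-east-1:123456789012:function:example-function",
  "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable"
);

console.log(JSON.stringify(functionScopedStatement, null, 2));
```

Compared with matching on tags, the condition names the function directly, which is the simplification Matt is describing.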

Corey: So, I want to talk about something you just alluded to, which is the Well-Architected Framework. And initially, when it launched, it was a whole framework, and AWS made a lot of noise about it on keynote stages, as they are wont to do. And then later, they created a quote-unquote, “Well-Architected Tool,” which let’s be very direct, it’s a checkbox survey form, at least the last time I looked at it. And they now have the six pillars of the Well-Architected Framework where they talk about things like security, cost, sustainability is the new pillar, I don’t know, absorbency, or whatever the remainders are. I can’t think of them off the top of my head. How does that map to your experience with the CDK?

Matt: Yeah, so out of the box, the CDK from day one was designed to have sensible defaults. And that’s why a lot of the things you deploy have opinions. I talked to a couple of the Heroes and they were like, “I wish it had less opinions.” But that’s why whenever you deploy something, it’s got a bunch of configuration already in there. For me, in the CDK, whenever I use constructs, or stacks, or deploying anything in the CDK, I always build it in a well-architected way.

And that’s such a loaded sentence whenever you say the word ‘well-architected,’ that people go, “What do you mean?” And that’s where I go through the six pillars. And in Liberty, we have a process, it used to be called SCORP because it was five pillars, but now SCORPS [laugh] because they added sustainability. But that’s where for every stack, we’ll go through it and we’ll be like, “Okay, let’s have the discussion.” And we will use the tool that you mentioned, I mean, the tool, as you say, it’s a bunch of tick boxes with a text box, but the idea is we’ll get in a room and as we build the starter patterns or these pieces of infrastructure that people are going to reuse, we’ll run the well-architected review against the framework before anybody gets to generate it.

And then we can say, out of the box, if you generate this thing, these are the pros and cons against the Well-Architected Framework of what you’re getting. Because we can’t make it a hundred percent bulletproof for your use case because we don’t know it, but we can tell you out of the box, what it does. And then that way, you can keep building so they start off with something that is well documented how well architected it is, and then you can start having—it makes it a lot easier to have those conversations as they go forward. Because you just have to talk about the delta as they start adding their own code. Then you can go, “Okay, you’ve added these 20 lines. Let’s talk about what they do.” And that’s why I always think you can draw a strong connection between infrastructure-as-code and well architected.

Corey: As I look through the actual six pillars of the Well-Architected Framework: sustainability, cost optimization, performance efficiency, reliability, security, and operational excellence, as I think through the nature of what this shitpost thread Twitter client is, I am reasonably confident across all of those pillars. I mean, first off, when it comes to the cost optimization pillar, please, don’t come to my house and tell me how that works. Yeah, obnoxiously the security pillar is sort of the thing that winds up causing a problem for this because this is an account deployed by Control Tower. And when I was getting this all set up, my monthly cost for this thing was something like a dollar in charges and then another sixteen dollars for the AWS Config rule evaluations on all of the deploys, which is… it just feels like a tax on going about your business, but fine, whatever. Cost and sustainability, from my perspective, also tend to be hand-in-glove when it comes to this stuff.

When no one is using the client, it is not taking up any compute resources, it has no carbon footprint of which to speak, by my understanding, it’s very hard to optimize this down further from a sustainability perspective without barging my way into the middle of an AWS negotiation with one of its power companies.

Matt: So, for everyone listening, watch as we do a live well-architected review because—

Corey: Oh yeah, I expect—

Matt: —this is what they are. [laugh].

Corey: You joke; we should do this on Twitter one of these days. I think it would be a fantastic conversation. Or Twitch, or whatever the kids are using these days. Yeah.

Matt: Yeah.

Corey: And again, so much of it, too, is thinking about the context. Security, you work for one of the world’s largest insurance companies. I shitpost for a living. The relative access and consequences of screwing up the security on this are nowhere near equivalent. And I think that’s something that often gets lost, letting the perfect be the enemy of the good.

Matt: Yeah that’s why, unfortunately, the Well-Architected Tool is quite loose. So, that’s why they have the Well-Architected Framework, which is, there’s a white paper that just covers anything which is quite big, and then they wrote specific lenses for, like, serverless or other use cases that are shorter. And then when you do a well-architected review, it’s like loose on, sort of like, how are you applying the principles of well-architected. And the conversation that we just had about security, so you would write that down in the box and be, like, “Okay, so I understand if anybody gets this credential, it means they can post this Last Tweet in AWS, and that’s okay.”

Corey: The client, not the Twitter account, to be clear.

Matt: Yeah. So, that’s okay. That’s what you just mark down in the well-architected review. And then if we go to day one on the future, you can compare it and we can go, “Oh. Okay, so last time, you said this,” and you can go, “Well, actually, I decided to—” or you just keep it as a note.

Corey: “We pivoted. We’re a bank now.” Yeah.

Matt: [laugh]. So, that’s where—we do more than tweets now. We decided to do microtransactions through cryptocurrency over Twitter. I don’t know but if you—

Corey: And that ends this conversation. No no. [laugh].

Matt: [laugh]. But yeah, so if something changes, that’s what the well-architected review’s for. It’s about facilitating the conversation between the architect and the engineer. That’s all it is.

Corey: This episode is sponsored in part by our friend EnterpriseDB. EnterpriseDB has been powering enterprise applications with PostgreSQL for 15 years. And now EnterpriseDB has you covered wherever you deploy PostgreSQL on-premises, private cloud, and they just announced a fully-managed service on AWS and Azure called BigAnimal, all one word. Don’t leave managing your database to your cloud vendor because they’re too busy launching another half-dozen managed databases to focus on any one of them that they didn’t build themselves. Instead, work with the experts over at EnterpriseDB. They can save you time and money, they can even help you migrate legacy applications—including Oracle—to the cloud. To learn more, try BigAnimal for free. Go to biganimal.com/snark, and tell them Corey sent you.

Corey: And the lens is also helpful in that this is a serverless application. So, we’re going to view it through that lens, which is great because the original version of the Well-Architected Tool is, “Oh, you built this thing entirely in Lambda? Have you bought some reserved instances for it?” And it’s, yeah, why do I feel like I have to explain to AWS how their own systems work? This makes it a lot more streamlined and talks about this, though, it still does struggle with the concept of—in my case—a stateless app. That is still something that I think is not the common path. Imagine that: my code is also non-traditional. Who knew?

Matt: Who knew? The one thing that’s good about it, if anybody doesn’t know, they just updated the serverless lens about, I don’t know, a week or two ago. So, they added in a bunch of more use cases. So, if you’ve read it six months ago, or even three months ago, go back and reread it because they spent a good year updating it.

Corey: Thank you for telling me that. That will of course wind up in next week’s issue of Last Week in AWS. You can go back and look at the archives and figure out what week we recorded this, then. Good work. One thing that I have learned as well as of yesterday, as it turns out, before we wound up having this recording—obviously because yesterday generally tends to come before today, that is a universal truism—is that I had to do a bit of refactoring.

Because what I learned when I was in New York live-tweeting the AWS Summit, is that the Route 53 latency record works based upon where your DNS server is. Yeah, that makes sense. I use Tailscale and wind up using my Pi-hole, which lives back in my house in San Francisco. Yeah, I was always getting us-west-1 from across the country. Cool.

For those weird edge cases like me—because this is not the common case—how do I force a local region? Ah, I’ll give it its own individual region prepend as a subdomain. Getting that to work with both the global lasttweetinaws.com domain as well as the subdomain on API Gateway through the CDK was not obvious on how to do it.
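
The region-prefixed subdomain idea amounts to a small naming rule. The sketch below only builds the hostnames; it does not configure Route 53 or API Gateway, and the `<region>.<domain>` format, the helper names, and the two-region list are assumptions for illustration.

```typescript
// Hypothetical naming scheme: "<region>.<apex domain>".
function regionalHostname(region: string, apexDomain: string): string {
  return `${region}.${apexDomain}`;
}

// The apex stays on latency-based routing; each regional name pins one region.
function allHostnames(regions: string[], apexDomain: string): string[] {
  return [apexDomain, ...regions.map((region) => regionalHostname(region, apexDomain))];
}

const hostnames = allHostnames(["us-west-1", "us-east-1"], "lasttweetinaws.com");
console.log(hostnames);
```

Someone whose DNS resolver sits in another part of the world could then use the regional name to reach the deployment nearest to where they physically are.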

Randall Hunt over at Caylent was awfully generous and came up with a proof-of-concept in about three minutes because he’s Randall, and that was extraordinarily helpful. But a challenge I ran into was that the CDK deploy would fail because of the way that CloudFormation was rendered and the way it was trying to do stuff, “Oh, that already has that domain affiliated in a different way.” I had to do a CDK destroy then a CDK deploy for each one. Now, not the end of the world, but it got me thinking, everything that I see around the CDK more or less distills down to either greenfield or a day one experience. That’s great, but throw it all away and start over is often not what you get to do.

And even though Amazon says it’s always day one, those of us in, you know, real companies don’t get to just treat everything as brand new and throw away everything older than 18 months. What is the day two experience looking like for you? Because you clearly have a legacy business. By legacy, I of course, use it in the condescending engineering term that means it makes actual money, rather than just telling really good stories to venture capitalists for 20 years.

Matt: Yeah. We still have mainframes running that make a lot of money. So, I don’t mock legacy at all.

Corey: “What’s that piece of crap do?” “Well, about $4 billion a year in revenue. Perhaps show some respect.” It’s a common refrain.

Matt: Yeah, exactly. So yeah, anyone listening, don’t mock legacy because as Corey says, it is running the business. But for us when it comes to day two, it’s something that I’m actually really passionate about in general because it is really easy. Like I did it with CDK patterns, it’s really easy to come out and be like, “Okay, we’re going to create a bunch of starter patterns, or quickstarts”—or whatever flavor that you came up with—“And then you’re going to deploy this thing, and we’re going to have you in production in 30 seconds.” But even day one later that day—not even necessarily day two—it depends on who it was that deployed it and how long they’ve been using AWS.

So, you hear these stories of people who deployed something to experiment, and they either forget to delete it and it costs them a lot of money, or they tried to change it and it breaks because they didn’t understand what was in it. And this is where the community starts to diverge in their opinions on what AWS CDK should be. There’s a lot of people who think that at the minute CDK, even if you create an abstraction in a construct, even if I create a construct and put it in the construct library that you get to use, it still unravels and deploys as part of your deploy. So, everything that’s associated with it, you don’t own and you technically need to understand that at some point because it might, in theory, break. Whereas there’s a lot of people who think, “Okay, the CDK needs to go server side and an abstraction needs to stay an abstraction in the cloud. And then that way, if somebody is looking at a 20-line CDK construct or stack, then it stays 20 lines. It never unravels to something crazy underneath.”

I mean, that’s one pro tip thing. It’d be awesome if that could work. I’m not sure how the support for that would work from a—if you’ve got something running on the cloud, I’m pretty sure AWS [laugh] aren’t going to jump on a call to support some construct that I deployed, so I’m not sure how that will work in the open-source sense. But what we’re doing at Liberty is the other way. So, I mean, we famously have things like the software accelerator that lets you pick a pattern or create your pipelines and you’re deployed, but now what we’re doing is we’re building a lot of telemetry and automated information around what you deployed so that way—and it’s all based on Well-Architected, common theme. So, that way, what you can do is you can go into [crosstalk 00:26:07]—

Corey: It’s partially [unintelligible 00:26:07], and partially at a glance, figure out okay, are there some things that can be easily remediated as we basically shift that whole thing left?

Matt: Yeah, so if you deploy something, and it should be good the second you deploy it, but then you start making changes. Because you’re Corey, you just start adding some stuff and you deploy it. And if it’s really bad, it won’t deploy. Like, that’s the Liberty setup. There’s a bunch of rules that all go, “Okay, that’s really bad. That’ll cause damage to customers.”

But there’s a large gap between bad and good that people don’t really understand the difference that can cost a lot of money or can cause a lot of grief for developers because they go down the wrong path. So, that’s why what we’re now building is, after you deploy, there’s a dashboard that’ll just come up and be like, “Hey, we’ve noticed that your Lambda function has too little memory. It’s going to be slow. You’re going to have bad cold starts.” Or you know, things like that.
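
The split between "really bad, won’t deploy" and "not ideal, show it on a dashboard" can be sketched as two severities of finding. Everything below is hypothetical: the `FunctionConfig` fields, the 512 MB advisory threshold, and the blocking rule are invented for illustration and are not Liberty’s actual rules.

```typescript
type Severity = "block" | "advise";

// Hypothetical shape of what was deployed.
interface FunctionConfig {
  name: string;
  memoryMb: number;
  logsCustomerData: boolean;
}

interface Finding {
  functionName: string;
  severity: Severity;
  message: string;
}

// Illustrative threshold only, not an official recommendation.
const ADVISED_MIN_MEMORY_MB = 512;

function reviewFunction(config: FunctionConfig): Finding[] {
  const findings: Finding[] = [];
  if (config.logsCustomerData) {
    findings.push({
      functionName: config.name,
      severity: "block",
      message: "Customer data would be written to logs.",
    });
  }
  if (config.memoryMb < ADVISED_MIN_MEMORY_MB) {
    findings.push({
      functionName: config.name,
      severity: "advise",
      message: "Low memory setting; expect slow runs and bad cold starts.",
    });
  }
  return findings;
}

// A deploy goes ahead only when nothing is severe enough to block it.
function canDeploy(findings: Finding[]): boolean {
  return findings.every((finding) => finding.severity !== "block");
}

const sampleFindings = reviewFunction({
  name: "tweet-sender",
  memoryMb: 128,
  logsCustomerData: false,
});
console.log(sampleFindings, canDeploy(sampleFindings));
```

Advisory findings still let the deploy through; they are the part that would surface on the dashboard afterwards.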

The knowledge that I have had to gain through hard fighting over the past couple of years, putting it into automation, and that way, combined with the well-architected reviews, you actually get me sitting in a call going, “Okay, let’s talk about what you’re building,” that hopefully guides people the right way. But I still think there’s so much more we can do for day two because even if you deploy the best solution today, six months from now, AWS are releasing ten new services that make it easier to do what you just did. So, someone also needs to build something that shows you the delta to get to the best. And that would involve AWS or somebody thinking cohesively, like, these are how we use our products. And I don’t think there’s a market for it as a third-party company, unfortunately, but I do think that’s where we need to get to, that at day two somebody can give—the way we’re trying to do for Liberty—advice, automated, that says, “I see what you’re doing, but it would be better if you did this instead.”

Corey: Yeah, I definitely want to spend more time thinking about these things and analyzing how we wind up addressing them and how we think about them going forward. I learned a lot of these lessons over a decade ago. I was fairly deep into using Puppet, and came to the fair and balanced conclusion that Puppet was a steaming piece of crap. So, the solution was that I was one of the very early developers behind SaltStack, which was going to do everything right. And it was and it was awesome and it was glorious, right up until I saw an environment deployed by someone else who was not as familiar with the tool as I was, at which point I realized hell is other people’s use cases.

And the way that they contextualize these things, you craft a finely balanced torque wrench, it’s a thing of beauty, and people complain about the crappy hammer. “You’re holding it wrong. No, don’t do it that way.” So, I have an awful lot of sympathy for people building platform-level tooling like this, where it works super well for the use case that they’re in, but not necessarily… they’re not necessarily aligned in other ways. It’s a very hard nut to crack.

Matt: Yeah. And like, even as you mentioned earlier, if you take one piece of AWS, for example, API Gateway—and I love the API Gateway team; if you’re listening, don’t hate on me—but there’s, like, 47,000 different ways you can deploy an API Gateway. And the CDK has to cover all of those. It would be a lot easier if there was less ways that you could deploy the thing and then you can start crafting user experiences on a platform. But whenever you start thinking that every AWS component is kind of the same, like think of the amount of ways you can deploy a Lambda function now, or think of the, like, containers. I’ll not even go into [laugh] the different ways to run containers.

If you’re building a platform, either you support it all and then it sort of gets quite generic-y, or you’re going to do, like, what serverless cloud are doing though, like Jeremy Daly is building this unique experience that’s like, “Okay, the code is going to build the infrastructure, so just build a website, and we’ll do it all behind it.” And I think they’re really interesting because they’re sort of opposites, in that one doesn’t want to support everything, but should theoretically, for their slice of customers, be awesome, and then the other ones, like, “Well, let’s see what you’re going to do. Let’s have a go at it and I should hopefully support it.”

Corey: I think that there’s so much that can be done on this. But before we wind up calling it an episode, I had one further question that I wanted to explore around the recent results of the community CDK survey that I believe is a quarterly event. And I read the analysis on this, and I talked about it briefly in the newsletter, but it talks about adoption and a few other aspects of it. And one of the big things it looks at is the number of people who are contributing to the CDK in an open-source context. Am I just thinking about this the wrong way when I think that, well, this is a tool that helps me build out cloud infrastructure; me having to contribute code to this thing at all is something of a bug, whereas yeah, I want this thing to work out super well—Docker is open-source, but you’ll never see me contributing things to Docker ever, as a pull request, because it does, as it says on the tin; I don’t have any problems that I’m aware of that, ooh, it should do this instead. I mean, I have opinions on that, but those aren’t pull requests; those are complete, you know, shifts in product strategy, which it turns out is not quite done on GitHub.

Matt: So, it’s funny I, a while ago, was talking to a lad who was the person who came up with the idea for the CDK. And CDK is pretty much the open-source project for AWS if you look at what they have. And the thought behind it, it’s meant to evolve into what people want and need. So yes, there is a product manager in AWS, and there’s a team fully dedicated to building it, but the ultimate aspiration was always it should be bigger than AWS and it should be community-driven. Now personally, I’m not sure—like you just said it—what the incentive is, given that right now CDK only works with CloudFormation, which means that you are directly helping with an AWS tool, but it does give me hope for, like, their CDK for Terraform, and their CDK for Kubernetes, and there’s other flavors based on the same technology as AWS CDK that potentially could have a thriving open-source community because they work across all the clouds. So, it might make more sense for people to jump in there.

Corey: Yeah, I don’t necessarily think that there’s a strong value proposition as it stands today for the idea of the CDK becoming something that works across other cloud providers. I know it technically has the capability, but if I think that Python isn’t quite a first-class experience, I don’t even want to imagine what other providers are going to look like from that particular context.

Matt: Yeah, and that’s from what I understand, I haven’t personally jumped into the CDK for Terraform and we didn’t talk about it here, but in CDK, you get your different levels of construct. And L1 is, like, a CloudFormation-level construct, so everything that’s in there directly maps to a property in CloudFormation, and then L2 is AWS’s opinion on safe defaults, and then L3 is when someone like me comes along and turns it into something that you may find useful. So, it’s a pattern. As far as I know, CDK for Terraform is still on L1. They haven’t got the rich collection—

Corey: And L4 is just hiring you as a consultant—

Matt: [laugh].

Corey: —to come in and fix my nonsense for me?

Matt: [laugh]. That’s it. L4 could be Pulumi recently announced that you can use AWS CDK constructs inside it. But I think it’s one of those things where the constructs, if they can move across these different tools the way AWS CDK constructs now work inside Pulumi, and there’s a beta version that works inside CDK for Terraform, then it may or may not make sense for people to contribute to this stuff because we’re now building at a higher level. It’s just the vision is hard for most people to get clear in their head because it needs articulated and told as a clear strategy.

And then, you know, as you said, it is an AWS product strategy, so I’m not sure what you get back by contributing to the project, other than, like, Thorsten—I should say, so Thorsten who wrote the book with me, he is the number three contributor, I think, to the CDK. And that’s just because he is such a big user of it that if he sees something that annoys him, he just comes in and tries to fix it. So, the benefit is, he gets to use the tool. But he is a super user, so I’m not sure, outside of super users, what the use case is.
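The L1, L2, and L3 construct levels Matt outlines above can be pictured with a small sketch. It deliberately does not use the real CDK library: `l1_bucket`, `l2_bucket`, and `l3_logged_bucket` are hypothetical helpers that build plain dictionaries shaped like CloudFormation resources, and the "safe defaults" shown are assumptions for illustration, not the defaults the actual CDK applies.

```python
from typing import Any, Dict

Resource = Dict[str, Any]


def l1_bucket(**properties: Any) -> Resource:
    """L1: every argument maps straight onto a CloudFormation property."""
    return {"Type": "AWS::S3::Bucket", "Properties": properties}


def l2_bucket(**overrides: Any) -> Resource:
    """L2: the same resource with opinionated defaults the caller can override.

    These defaults are assumed for the example only.
    """
    defaults = {
        "PublicAccessBlockConfiguration": {
            "BlockPublicAcls": True,
            "BlockPublicPolicy": True,
            "IgnorePublicAcls": True,
            "RestrictPublicBuckets": True,
        },
        "VersioningConfiguration": {"Status": "Enabled"},
    }
    return l1_bucket(**{**defaults, **overrides})


def l3_logged_bucket(name: str) -> Dict[str, Resource]:
    """L3: a pattern that composes several L2 resources behind one call."""
    log_id = f"{name}Logs"
    return {
        log_id: l2_bucket(),
        name: l2_bucket(
            LoggingConfiguration={"DestinationBucketName": {"Ref": log_id}}
        ),
    }


if __name__ == "__main__":
    # One short call expands into two fully specified resources.
    template = {"Resources": l3_logged_bucket("Uploads")}
    for logical_id, resource in template["Resources"].items():
        print(logical_id, sorted(resource["Properties"]))
```

The single `l3_logged_bucket("Uploads")` call expanding into two resources is a small version of the "unraveling" discussed earlier: the caller writes one line but ends up owning everything it generates. A real template would need more than this, such as permissions for log delivery.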

Corey: I really want to thank you for, I want to say spending as much time talking to me about this stuff as you have, but that doesn’t really go far enough. Because so much of how I think about this invariably winds up linking back to things that you have done and have been advocating for in that community for such a long time. If it’s not you personally, just, like, your fingerprints are all over this thing. So, it’s one of those areas where the entire software developer ecosystem is really built on the shoulders of others who have done a lot of work that came before. Often you don’t get any visibility of who those people are, so it’s interesting whenever I get to talk to someone whose work I have directly built upon that I get to say thank you. So, thank you for this. I really do appreciate how much more straightforward a lot of this is than my previous approach of clicking in the console and then lying about it to provision infrastructure.

Matt: Oh, no worries. Thank you for the thank you. I mean, at the end of the day, all of this stuff is just—it helps me as much as it helps everybody else, and we’re all trying to make everything quicker for ourselves, at the end of the day.

Corey: If people want to learn more about what you’re up to, where’s the best place to find you these days? They can always take a job at Liberty; I hear good things about it.

Matt: Yeah, we’re always looking for people at Liberty, so come look up our careers. But Twitter is always the best place. So, I’m @NIDeveloper on Twitter. You should find me pretty quickly, or just type Matt Coulter into Google, you’ll get me.

Corey: I like it. It’s always good when it’s like, “Oh, I’m the top Google result for my own name.” On some level, that becomes an interesting thing. Some folks into it super well, John Smith has some challenges, but you know, most people are somewhere in the middle of that.

Matt: I didn’t used to be number one, but there’s a guy called the Kangaroo Kid in Australia, who is, like, a stunt driver, who was number one, and [laugh] I always thought it was funny if people googled and got him and thought it was me. So, it’s not anymore.

Corey: Thank you again for, I guess, all that you do. And of course, taking the time to suffer my slings and arrows as I continue to revise my opinion of the CDK upward.

Matt: No worries. Thank you for having me.

Corey: Matt Coulter, senior architect at Liberty Mutual. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and leave an angry comment as well that will not actually work because it has to be transpiled through a JavaScript engine first.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
