Kubernetes and OpenGitOps with Chris Short

Episode Summary

Today Corey sits down with Chris Short, a senior Developer Advocate at AWS. They begin by commiserating on the process of writing and releasing their respective newsletters, and then they discuss EKS, billing, and some of AWS’s open source projects. Chris goes into detail about the new project he has co-chaired, OpenGitOps. Corey and Chris talk about GitOps and configuration management, and conclude their time with a discussion about connectivity and Tailscale.

Episode Show Notes & Transcript

About Chris

Chris Short has been a proponent of open source solutions throughout his over two decades in various IT disciplines, including systems, security, networks, DevOps management, and cloud native advocacy across the public and private sectors. He currently works on the Kubernetes team at Amazon Web Services and is an active Kubernetes contributor and Co-chair of OpenGitOps. Chris is a disabled US Air Force veteran living with his wife and son in Greater Metro Detroit. Chris writes about Cloud Native, DevOps, and other topics at ChrisShort.net. He also runs the Cloud Native, DevOps, GitOps, Open Source, industry news, and culture focused newsletter DevOps’ish.



Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Coming back to us since episode two—it’s always nice to go back and see the where are they now type of approach—I am joined by Senior Developer Advocate at AWS Chris Short. Chris, been a few years. How has it been?

Chris: Ha. Corey, we have talked outside of the podcast. But it’s been good. For those that have been listening, I think when we recorded I wasn’t even—like, when was season two, what year was that? [laugh].

Corey: Episode two was first pre-pandemic and the rest. I believe—

Chris: Oh. So, yeah. I was at Red Hat, maybe, when I—yeah.

Corey: Yeah. You were doing Red Hat stuff, back when you got to work on open-source stuff, as opposed to now, where you’re not within 1000 miles of that stuff, right?

Chris: Actually well, no. So, to be clear, I’m on the EKS team, the Kubernetes team here at AWS. So, when I joined AWS in October, they were like, “Hey, you do open-source stuff. We like that. Do more.” And I was like, “Oh, wait, do more?” And they were like, “Yes, do more.” “Okay.”

So, since joining AWS, I’ve probably done more open-source work than the three years at Red Hat that I did. So, that’s kind of—you know, like, it’s an interesting point when I talk to people about it because the first couple months are, like—you know, my friends are like, “So, are you liking it? Are you enjoying it? What’s going on?” And—

Corey: Do they beat you with reeds? Like, all the questions people have about companies? Because—

Chris: Right. Like, I get a lot of random questions about Amazon and AWS that I don’t know the answer to.

Corey: Oh, when I started telling people, I fixed Amazon bills, I had to quickly pivot that to AWS bills because people started asking me, “Well, can you save me money on underpants?” It’s I—

Chris: Yeah.

Corey: How do you—fine. Get the Prime credit card. It knocks 5% off the bill, so there you go. But other than that, no, I can’t.

Chris: No.

Corey: It’s—

Chris: Like, I had to call my bank this morning about a transaction that I didn’t recognize, and it was from Amazon. And I was like, that’s weird. Why would that—

Corey: Money just flows one direction, and that’s the wrong direction from my employer.

Chris: Yeah. Like, what is going on here? It shouldn’t have been on that card kind of thing. And I had to explain to the person on the phone that I do work at Amazon but under the Web Services team. And he was like, “Oh, so you’re in IT?”

And I’m like, “No.” [laugh]. “It’s actually this big company. That—it’s a cloud company.” And they’re like, “Oh, okay, okay. Yeah. The cloud. Got it.” [laugh]. So, it’s interesting talking to people about, “I work at Amazon.” “Oh, my son works at Amazon distribution center,” blah, blah, blah. It’s like, cool. “I know about that, but very little. I do this.”

Corey: “Your son works in an Amazon distribution center? Is he a robot?” is normally my next question on that. Yeah, that’s neither here nor there.

So, you and I started talking a while back. We both write newsletters that go to a somewhat similar audience. You write DevOps’ish. I write Last Week in AWS. And recently, you also have started EKS News because, yeah, the one thing I look at when I’m doing these newsletters every week is, you know what I want to do? That’s right. Write more newsletters.

Chris: [laugh].

Corey: So, you are just a glutton for punishment? And, yeah, welcome to the addiction, I suppose. How’s it been going for you?

Chris: It’s actually been pretty interesting, right? Like, we haven’t pushed it very hard. We’re now starting to include it in things. Like, we did Container Day; we made sure that EKS News was on the landing page for Container Day at KubeCon EU. And you know, it’s kind of just grown organically since then.

But it was one of those things where it’s like, internally—this happened at Red Hat, right—when I started live streaming at Red Hat, the ultimate goal was to do our product management—like, here’s what’s new in the next version thing—do those live so anybody can see that at any point in time anywhere on Earth, the second it’s available. Similar situation to here. This newsletter actually is generated as part of a report my boss puts together to brief our other DAs—or developer advocates—you know, our solutions architects, the whole nine yards about new EKS features. So, I was like, why can’t we just flip that into a weekly newsletter, you know? Like, I can pull from the same sources you can.

And what’s interesting is, he only does the meeting bi-weekly. So, there’s some weeks where it’s just all me doing it and he ends up just kind of copying and pasting the newsletter into his document, [laugh] and then adds on for the week. But that report meeting for that team is now getting disseminated to essentially anyone that subscribes to eks.news. Just go to the site, there’s a subscribe thing right there. And we’ve gotten 20 issues in and it’s gotten rave reviews, right?

Corey: I have been a subscriber for a while. I will say that it has less Chris Short personality—

Chris: Mm-hm.

Corey: —to it than DevOps’ish does, which I have to assume is by design. A lot of The Duckbill Group’s marketing these days is no longer in my voice, rather intentionally, because it turns out that being a sarcastic jackass and doing half-billion dollar AWS contracts cannot be the most congruent thing in the world. So okay, we’re slowly ameliorating that. It’s professional voice versus snarky voice.

Chris: Well, and here’s the thing, right? Like, I realized this year with DevOps’ish that, like, if I want to take a week off, I have to do, like, what you did when your child was born. You hired folks to like, do the newsletter for you, or I actually don’t do the newsletter, right? It’s binary: hire someone else to do it, or don’t do it. So, the way I structured this newsletter was that any developer advocate on my team could jump in and take over the newsletter so that, you know, if I’m off that week, or whatever may be happening, I, Chris Short, am not the voice. It is now the entire developer advocate team.

Corey: I will challenge you on that a bit. Because it’s not Chris Short voice, that’s for sure, but it’s also not official AWS brand voice either.

Chris: No.

Corey: It is clearly written by a human being who is used to communicating with the audience for whom it is written. And that is no small thing. Normally, when oh, there’s a corporate newsletter; that’s just a lot of words to say it’s bad. This one is good. I want to be very clear on that.

Chris: Yeah, I mean, just like DevOps’ish, we have sections. Just like your newsletter, there’s certain sections, so any new What’s New announcements, those go in automatically. So, like, that can get delivered to your inbox every Friday. Same thing with new blog posts about anything containers related to EKS; those will be in there. Then Containers from the Couch, our streaming platform, essentially, for all things Kubernetes: those videos go in.

And then there’s some ecosystem news as well that I collect and put in the newsletter to give people a broader sense of what’s going on out there in Kubernetes-land because let’s face it, there’s upstream and then there’s downstream, and sometimes those aren’t in sync, and that’s normal. That’s how Kubernetes kind of works sometimes. If you’re running upstream Kubernetes, you are awesome. I appreciate you, but I feel like that would cause more problems and it’s worse sometimes.

Corey: Thank you for being the trailblazers. The rest of us can learn from your misfortune.

Chris: [laugh]. Yeah, exactly. Right? Like, please file your bugs accordingly. [laugh].

Corey: EKS is interesting to me because I don’t see a lot of it, which is, probably, going to get a whole lot of “Wait, what?” moments because, wait, don’t you deal with very large AWS bills? And I do. But what I mean by that is that EKS, until you’re using its Fargate expression, charges for the control plane, which rounds to no money, and the rest is running on EC2 instances running in a company’s account. From the billing perspective, there is no difference between, “We’re running massive fleets of EKS nodes,” and, “We’re managing a whole bunch of EC2 instances by hand.”

And that feels like an interesting allegory for how Kubernetes winds up expressing itself to cloud providers. Because from a billing perspective, it just looks like one big single-tenant application that has some really strange behaviors internally. It gets very chatty across AZs when there’s no reason to, and whatnot. And it becomes a very interesting study in how to expose aspects of what’s going on inside of those containers and inside of the Kubernetes environment to the cloud provider in a way that becomes actionable. There are no good answers for this yet, but it’s something I’ve been seeing a lot of. Like, “Oh, I thought you’d be running Kubernetes. Oh, wait, you are and I just keep forgetting what I’m looking at sometimes.”

Chris: So, that’s an interesting point. The billing is kind of like, yeah, it’s just compute, right? So—

Corey: And my insight into AWS and the way I start thinking about it is always from a billing perspective. That’s great. It’s because that means the more expensive the services, the more I know about it. It’s like, “IAM. What is that?” Like, “Oh, I have no idea. It’s free. How important could it be?” Professional advice: do not take that philosophy, ever.

Chris: [laugh]. No. Ever. No.

Corey: Security: it matters. Oh, my God. It’s like you’re all stars. Your IAM policy should not be. I digress.

Chris: Right. Yeah. Anyways, so two points I want to make real quick on that is, one, we’ve recently released an open-source project called Karpenter, which is really cool in my purview because it looks at your Kubernetes file and says, “Oh, you want this to run on an ARM instance.” And you can even go so far as to say, right, here’s my limits, and it’ll find an instance that fits those limits and add that to your cluster automatically. Run your pod on that compute as long as it needs to run and then if it’s done, it’ll downsize—eventually, kind of thing—your cluster.

So, you can basically just throw a bunch of workloads at it, and it’ll auto-detect what kind of compute you will need and then provision it for you, run it, and then be done. So, that is one way folks are probably starting to save money running EKS: adopt Karpenter as your autoscaler as opposed to the inbuilt Kubernetes autoscaler. Because this is instance-aware, essentially, so it can say, like, “Oh, your massive ARM application can run here,” because you know, thank you, Graviton. We have those processors in-house. And you know, you can run your ARM64 instances, you can run all the Intel workloads you want, and it’ll right-size the compute for your workloads.
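As a rough illustration of the instance-aware idea described here, the Python sketch below picks the cheapest instance type that fits a workload's architecture, CPU, and memory request. It is not the project's actual algorithm or API: the `InstanceType` and `PodRequest` classes, the `pick_instance` function, the catalog entries, and the cost figures are all hypothetical, made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str           # hypothetical label, not a real AWS instance type
    arch: str           # "arm64" or "amd64"
    cpu: float          # vCPUs
    memory_gib: float
    hourly_cost: float  # made-up relative cost, for illustration only


@dataclass(frozen=True)
class PodRequest:
    arch: str
    cpu: float
    memory_gib: float


def pick_instance(pod: PodRequest, catalog: list[InstanceType]) -> Optional[InstanceType]:
    """Return the cheapest instance type that satisfies the pod, or None."""
    candidates = [
        it for it in catalog
        if it.arch == pod.arch
        and it.cpu >= pod.cpu
        and it.memory_gib >= pod.memory_gib
    ]
    return min(candidates, key=lambda it: it.hourly_cost, default=None)


# Hypothetical catalog; real instance families and prices differ.
CATALOG = [
    InstanceType("small-arm", "arm64", 2, 4, 0.03),
    InstanceType("large-arm", "arm64", 8, 32, 0.12),
    InstanceType("small-x86", "amd64", 2, 4, 0.04),
    InstanceType("large-x86", "amd64", 8, 32, 0.15),
]

choice = pick_instance(PodRequest("arm64", cpu=4, memory_gib=8), CATALOG)
print(choice.name if choice else "no instance type fits")
```

A real autoscaler also has to bin-pack many pods per node and remove nodes once they are idle; this sketch covers only the "find something that fits the limits" step.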

And it’ll look at one container or all your containers, however you want to configure it. Secondly, the good folks over at Kubecost have OpenCost, which is the open-source version of Kubecost, basically. So, they have a service that you can run in your clusters that will help you say, “Hey, maybe this one node’s too heavy; maybe this one node’s too light,” and you know, give you some insights into Kubernetes spend that are a little bit more granular as far as usage and things like that go. So, those two projects right there, I feel like, will give folks an optimal savings experience when it comes to Kubernetes. But to your point, it’s just compute, right? And that’s really how we treat it, kind of, here internally is that it’s a way to run… compute, Kubernetes, or ECS, or any of those tools.
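The "too heavy" versus "too light" check can be pictured with a small Python sketch that compares requested CPU against allocatable CPU on each node. This is only an assumption about what such a check could look like, not how any particular cost tool works: the `classify_nodes` function, the node names, and the 30% and 85% thresholds are hypothetical.

```python
def classify_nodes(
    nodes: dict[str, tuple[float, float]],
    low: float = 0.30,   # assumed threshold: below this the node is "too light"
    high: float = 0.85,  # assumed threshold: above this the node is "too heavy"
) -> dict[str, str]:
    """Label each node from its (requested_cpu, allocatable_cpu) pair."""
    labels = {}
    for name, (requested, allocatable) in nodes.items():
        utilization = requested / allocatable
        if utilization < low:
            labels[name] = "too light"
        elif utilization > high:
            labels[name] = "too heavy"
        else:
            labels[name] = "ok"
    return labels


# Hypothetical nodes: (vCPUs requested by pods, vCPUs allocatable on the node).
NODES = {
    "node-a": (0.5, 4.0),
    "node-b": (3.8, 4.0),
    "node-c": (2.0, 4.0),
}

for node, label in classify_nodes(NODES).items():
    print(node, label)
```

The same comparison could be made on memory, or on measured usage instead of requests, depending on which signal a team trusts more.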

Corey: A fairly expensive one because ignoring entirely for a second the actual raw cost of compute, you also have the other side of it, which is in every environment, unless you are doing something very strange or pre-funding as a one-person startup in your spare time, your payroll costs will—should—exceed your AWS bill by a fairly healthy amount. And engineering time is always more expensive than services time. So, for example, looking at EKS, I would absolutely recommend people use that rather than rolling their own because—

Chris: Rolling their own? Yeah.

Corey: —get out of that engineering space where your time is free. I assure you from a business context, it is not. So, there’s always that question of what you can do to make things easier for people and do more of the heavy lifting.

Chris: Yeah, and to your rather cheeky point that there’s 17 ways to run a container on AWS, it is answering that question, right? Like, those 17 ways: how much of this do you want to run yourself? You could run EKS Distro on EC2 instances if you want full control over your environment.

Corey: And then run IoT Greengrass core on top within that cluster—

Chris: Right.

Corey: So, I can run my own Lambda function runtime, so I’m not locked in. Also, DynamoDB local so I’m not locked into AWS. At which point I have gone so far around the bend, no one can help me.

Chris: Well—

Corey: Pro tip, don’t do that. Just don’t do that.

Chris: But to your point, we have all these options for compute, and specifically containers because there’s a lot of people that want to granularly say, “This is where my engineering team gets involved. Everything else you handle.” If you want EKS on Spot Instances only, you can do that. If you want EKS to use Karpenter and say only run ARM workloads, you can do that. If you want to say Fargate and not have anything to manage other than the container file, you can do that.

It’s how much does your team want to manage? That’s the customer obsession part of AWS coming through when it comes to containers is because there’s so many different ways to run those workloads, but there’s so many different ways to make sure that your team is right-sized, based off the services you’re using.

Corey: I do want to change gears a bit here because you are mostly known for a couple of things: the DevOps’ish newsletter because that is the oldest and longest thing you’ve been doing the time that I’ve known you; EKS, obviously. But when prepping for this show, I discovered you are now co-chair of the OpenGitOps project.

Chris: Yes.

Corey: So, I have heard of GitOps in the context of, “Oh, it’s just basically your CI/CD stuff is triggered by Git events and whatnot.” And I’m sitting here going, “Okay, so from where you’re sitting, the two best user interfaces in the world that you have discovered are YAML and Git.” And I just have to start with the question, “Who hurt you?”

Chris: [laugh]. Yeah, I share your sentiment when it comes to Git. Not so much with YAML, but I think it’s because I’m so used to it. Maybe it’s Stockholm Syndrome, maybe the whole YAML thing. I don’t know.

Corey: Well, it’s no XML. We’ll put it that way.

Chris: Thankfully, yes because if it was, I would have way more, like, just template files laying around to build things. But the—

Corey: And rage. Don’t forget rage.

Chris: And rage, yeah. So, GitOps is a little bit more than just Git and IaC—infrastructure as code. It’s more like Justin Garrison, who’s also on my team, he calls it infrastructure software because there’s four main principles to GitOps, and if you go to opengitops.dev, you can see them. It’s version one.

So, we put them on the website, right there on the page. You have to have a declared state and that state has to live somewhere. Now, it’s called GitOps because Git is probably the most full-featured thing to put your state in, but you could use an S3 bucket and just version it, for example. And make it private so no one else can get to it.

Corey: Or you could use local files: copy-of-copy-of-this-thing-restored-parentheses-use-this-one-dot-final-dot-doc-dot-zip. You know, my preferred naming convention.

Chris: Ah, yeah. Wow. Okay. [laugh]. Yeah.

Corey: Everything I touch is terrifying.

Chris: Yes. Geez, I’m sorry. So first, it’s declarative. You declare your state. You store it somewhere. It’s versioned and immutable, like I said. And then pulled automatically—don’t focus so much on pull—but basically, software agents are applying the desired state from source. So, what does that mean? When it’s—you know, the fourth principle is implemented, continuously reconciled. That means those software agents that are checking your desired state are actually putting it back into the desired state if it’s out of whack, right? So—

Corey: You’re talking about agents running it persistently on instances, validating—

Chris: Yes.

Corey: —a checkpoint on a cron. How is this meaningfully different than a Puppet agent running in years past? I learned to speak publicly by being a traveling trainer for Puppet; same type of model, and in fact, when I was at Pinterest, we wound up having a fair bit—like, that was their entire model, where they would have—the Puppet code would live in an S3 bucket that was then copied down, I believe, via Git, and then applied to the instance on a schedule. Like, that sounds like this was sort of an early-days GitOps.

Chris: Yeah, exactly. Right? Like so it’s, I like to think of that as a component of GitOps, right? DevOps, when you talk about DevOps in general, there’s a lot of stuff out there. There’s a lot of things labeled DevOps that maybe are, or maybe aren’t sticking to some of those DevOps core things that make you great.

Like the stuff that Nicole Forsgren writes about in books, you know? Accelerate is on my desk for a reason because there’s things that good, well-managed DevOps practices do. I see GitOps as an actual implementation of DevOps in an open-source manner because all the tooling for GitOps these days is open-source and it all started as open-source. Now, you can get, like, Flux or Argo—Argo, specifically—there’s managed services out there for it, you can have Flux and not maintain it, through an add-on, on EKS for example, and it will reconcile that state for you automatically. And the other thing I like to say about GitOps, specifically, is that it moves at the speed of the Kubernetes audit log.

If you’ve ever looked at a Kubernetes audit log, you know it’s rather noisy with all these groups and versions and kinds getting thrown out there. So, GitOps will say, “Oh, there’s an event for said thing that I’m supposed to be watching. Do I need to change anything? Yes or no? Yes? Okay, go.”

And the change gets applied, or, “Hey, there’s a new Git thing. Pull it in. A change has happened in Git; I need to update it.” You can set it to reconcile on events or on time. It’s like a cron or it’s like an event-driven architecture, but it’s combined.
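The combination of event-driven and timer-driven reconciliation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Flux or Argo are implemented: the `Reconciler` class, its method names, the `REMOVE` marker, and the dictionary standing in for a cluster are all hypothetical.

```python
import copy

REMOVE = object()  # marker meaning "this resource should not exist"


def diff(desired: dict, actual: dict) -> dict:
    """Return the changes needed to make actual match desired."""
    changes = {}
    for key, want in desired.items():
        if actual.get(key) != want:
            changes[key] = want
    for key in actual.keys() - desired.keys():
        changes[key] = REMOVE
    return changes


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the diff to actual in place and return what was changed."""
    changes = diff(desired, actual)
    for key, want in changes.items():
        if want is REMOVE:
            del actual[key]
        else:
            actual[key] = copy.deepcopy(want)
    return changes


class Reconciler:
    """Runs the same reconcile step from three different triggers."""

    def __init__(self, desired: dict, cluster: dict):
        self.desired = desired
        self.cluster = cluster

    def on_cluster_event(self) -> dict:
        # Something changed in the cluster: check it against desired state.
        return reconcile(self.desired, self.cluster)

    def on_source_change(self, new_desired: dict) -> dict:
        # A new revision landed in the source of truth: pull it and apply it.
        self.desired = new_desired
        return reconcile(self.desired, self.cluster)

    def on_timer(self) -> dict:
        # Periodic safety net, like a cron, in case an event was missed.
        return reconcile(self.desired, self.cluster)
```

The point of the sketch is that all three triggers funnel into one idempotent step, so running it more often than necessary is harmless.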

Corey: How does it survive the stake through the heart of configuration management? Because before I was doing all this, I wasn’t even a T-shaped engineer: you’re broad across a bunch of things, but deep in one or two areas, and one of mine was configuration management. I wrote part of SaltStack, once upon a time—

Chris: Oh.

Corey: —due to a bunch of very strange coincidences all hitting at once, like, I taught people how to use Puppet. But containers ultimately arose and the idea of immutable infrastructure became a thing. And these days when we were doing full-on serverless, well, great, I just wind up deploying a new code bundle to the Lambda function that I wind up caring about, and that is an immutable version replacement. There is no drift because there is no way to log in and change those things other than through a clear deployment of this as the new version that goes out there. Where does GitOps fit into that imagined pattern?

Chris: So, configuration management becomes part of your approval process, right? So, you now are generating an audit log, essentially, of all changes to your system through the approval process that you set up as part of how you get things into source and then promote that out to production. That’s kind of the beauty of it, right? Like, that’s why we suggest using Git because it has functions like pull requests and issues and things like that where you can say, “Hey, yes, I approve this,” or, “Hey, no, I don’t approve that. We need changes.” So, that’s kind of natively happening with Git and, you know, GitLab, GitHub, whatever implementation of Git. There’s always, kind of—

Corey: Uh, JIF-ub is, I believe, the pronunciation.

Chris: JIF-ub? Oh.

Corey: Yeah. That’s what I’m—

Chris: Today, I learned. Okay.

Corey: Exactly. And that’s one of the things that I do for my lasttweetinaws.com Twitter client that I build—because I needed it, and if other people want to use it, that’s great—that is now deployed to 20 different AWS commercial regions, simultaneously. And that is done via—because it turns out that that’s a very long to execute for loop if you start down that path—

Chris: Well, yeah.

Corey: I wound up building out a GitHub Actions matrix—sorry, a JIF-ub Actions matrix—job that winds up instantiating 20 parallel builds of the CDK deploy that goes out to each region as expected. And that gets really expensive with native GitHub Actions runners at, like, 36 cents per deploy, and I don’t know how to test my own code, so every time I have a typo, that’s another quarter in the jar. Cool, but that was annoying for me, so I built my own custom runner system that uses Lambda functions as runners running containers pulled from ECR, and, oh, it just runs in parallel. Every time I commit something, it is less than three minutes between when I press the push button and when it is out and running in the wild across all regions. Which is awesome and also terrifying because, as previously mentioned, I don’t know how to test my code.
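The difference between a long for loop over regions and a matrix of parallel jobs can be shown with a short Python sketch. It is an analogy only, not the GitHub Actions matrix or the custom Lambda runner described here: the `deploy` function is a stand-in that just sleeps, and the region names are hypothetical placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder region names; a real setup would list actual AWS regions.
REGIONS = [f"region-{n}" for n in range(1, 21)]


def deploy(region: str) -> str:
    """Stand-in for a real per-region deploy; here it only sleeps briefly."""
    time.sleep(0.05)
    return f"deployed to {region}"


def deploy_sequentially(regions: list[str]) -> dict[str, str]:
    """The for-loop version: total time grows with the number of regions."""
    return {region: deploy(region) for region in regions}


def deploy_in_parallel(regions: list[str]) -> dict[str, str]:
    """One worker per region, roughly like one matrix job per region."""
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        results = pool.map(deploy, regions)  # results come back in input order
        return dict(zip(regions, results))


if __name__ == "__main__":
    start = time.perf_counter()
    deploy_in_parallel(REGIONS)
    print(f"parallel: {time.perf_counter() - start:.2f}s")
```

With real deploys, one failing region should not hide the others, so a fuller version would collect per-region errors instead of letting the first exception propagate.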

Chris: Yeah. So, you don’t know what you’re deploying to 20 regions sometimes, right?

Corey: But it also means I have a pristine, re-composable build environment because I can—

Chris: Right.

Corey: Just automatically have that go out and the fact that I am making a—either merging a pull request or doing a direct push because I consider main to be my feature branch as whenever something hits that, all the automation kicks off. That was something that I found to be transformative as far as a way of thinking about this because I was very tired of having to tweak my local laptop environment to, “Oh, you didn’t assume the proper role and everything failed again and you broke it. Good job.” It wound up being something where I could start developing on more and more disparate platforms. And it finally is what got me away from my old development model of everything I build is on an EC2 instance, and that meant that my editor of choice was Vim. I use VS Code now for these things, and I’m pretty happy with it.

Chris: Yeah. So, you know, I’m glad you brought up CDK. CDK gives you a lot of the capabilities to implement GitOps in a way that you could say, like, “Hey, use CDK to declare I need four Amazon EKS clusters with this size, shape, and configuration. Go.” Or even further, connect these EKS clusters to RDS instances and load balancers and everything else.

But you put that state into Git and then you have something that deploys that automatically upon changes. That is infrastructure as code. Now, when you say, “Okay, main is your feature branch,” you know, things happen on main, if this were running in Kubernetes across a fleet of clusters or globe-wide in 20 regions, something like Flux or Argo would kick in and say, “There’s been a change to source, main, and we need to roll this out.” And it’ll start applying those changes. Now, what do you get with GitOps that you don’t get with your configuration?

I mean, can you roll back if you ever have, like, a bad commit that’s just awful? I mean, that’s really part of the process with GitOps is to make sure that you can, A, roll back to the previous good state, B, roll forward to a known good state, or C, promote that state up through various environments. And then having that all done declaratively, automatically, and immutably, and versioned with an audit log, that I think is the real power of GitOps in the sense that, like, oh, so-and-so approved this change to security policy XYZ on this date at this time. And that to an auditor, you just hand them a log file and, like, “Here’s everything we’ve ever done to our system. Done.” Right?
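One way to picture rollback with an audit trail is an append-only history in which a rollback is itself a new, approved revision. The Python sketch below is a simplified model, not Git's data model or any GitOps tool's API: the `StateRepo` and `Commit` classes, the policy values, and the people's names are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    revision: int
    state: dict
    author: str
    approved_by: str
    message: str


class StateRepo:
    """Append-only history of desired states, loosely modeled on a Git log."""

    def __init__(self) -> None:
        self._history: list[Commit] = []

    def commit(self, state: dict, author: str, approved_by: str, message: str) -> Commit:
        entry = Commit(
            revision=len(self._history) + 1,
            state=copy.deepcopy(state),  # later edits to `state` cannot alter history
            author=author,
            approved_by=approved_by,
            message=message,
        )
        self._history.append(entry)
        return entry

    def head(self) -> Commit:
        return self._history[-1]

    def rollback(self, to_revision: int, author: str, approved_by: str) -> Commit:
        """Re-commit an older state; earlier history is never rewritten."""
        old = self._history[to_revision - 1]
        return self.commit(old.state, author, approved_by, f"rollback to r{to_revision}")

    def audit_log(self) -> list[str]:
        return [
            f"r{c.revision}: {c.message} (author={c.author}, approved_by={c.approved_by})"
            for c in self._history
        ]


repo = StateRepo()
repo.commit({"policy": "strict"}, "alice", "bob", "initial security policy")
repo.commit({"policy": "open"}, "carol", "bob", "loosen security policy")
repo.rollback(1, "alice", "dave")
print("\n".join(repo.audit_log()))
```

Because the rollback is recorded as revision 3 instead of erasing revision 2, the log still shows who approved the bad change and who approved undoing it.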

Like, you could get to that state, if you want to, which I think is kind of the idea of DevOps, which says, “Take all these disparate tools and processes and procedures and culture changes”—culture being the hardest part to adopt in DevOps; GitOps kind of forces a culture change where, like, you can’t do a CAB with GitOps. Like, those two things don’t fly. You don’t have a configuration management database unless you absolutely—

Corey: Oh, you have a CAB now, but it’s all in the comments of the pull request.

Chris: Right. Exactly. Like, don’t push this change out until Thursday after this other thing has happened, kind of thing. Yeah, like, that all happens in GitHub. But it’s very democratizing in the sense that people don’t have to waste time in an hour-long meeting to get their five minutes in, right?

Corey: DoorDash had a problem. As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business confidence and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.

Corey: So, would it be overwhelmingly cynical to suggest that GitOps is the means to implement what we’ve all been pretending to have implemented for the last decade when giving talks at conferences?

Chris: Ehh, I wouldn’t go that far. I would say that GitOps is an excellent way to implement the things you’ve been talking about at all these conferences for all these years. But keep in mind, the technology has changed a lot in the, what, 11, 12 years of the existence of DevOps, now. I mean, we’ve gone from, let’s try to manage whole servers immutably to, “Oh, now we just need to maintain an orchestration platform and run containers.” That whole compute interface, you go from SSH to a Dockerfile, that’s a big leap, right?

Like, you don’t have bespoke sysadmins; you have, like, a platform team. You don’t have DevOps engineers; they’re part of that platform team, or DevOps teams, right? Like, which was kind of antithetical to the whole idea of DevOps to have a DevOps team. You know, everybody’s kind of in the same boat now, where we see skill sets kind of changing. And GitOps in Kubernetes-land is, like, a platform team that manages the cluster, and its state, and health and, you know, production essentially.

And then you have your developers deploying what they want to deploy in whatever namespace they’ve been given access to and whatever rights they have. So, now you have the potential for one set of people—the platform team—to use one set of GitOps tooling, and your applications teams might not like that, and that’s fine. They can have their own namespaces with their own tooling in it. Like, Argo, for example, is preferred by a lot of developers because it has a nice UI with green and red dots and they can show people and it looks nice. Flux, it’s command-line based. And there are some projects out there that kind of take the UI of Argo and try to run Flux underneath that, and those are cool kind of projects, I think, in my mind, but in general, right, I think GitOps gives you the choice that we missed somewhat in DevOps implementations of the past because it was, “Oh, we need to go get cloud.” “Well, you can only use this cloud.” “Oh, we need to go get this thing.” “Well, you can only use this thing in-house.”

And you know, there’s a lot of restrictions sometimes placed on what you can use in your environment. Well, if your environment is Kubernetes, how do you restrict what you can run, right? Like, you can’t have an easily configured, say, no-open-source policy if you’re running Kubernetes. [laugh]. So, it becomes, you know—

Corey: Well, that doesn’t stop some companies from trying.

Chris: Yeah, that’s true. But the idea of, like, enabling your developers to deploy at will and then promote their changes as they see fit is really the dream of DevOps, right? Like, same with production and platform teams, right? I want to push my changes out to a larger system that is across the globe. How do I do that? How do I manage that? How do I make sure everything’s consistent?

GitOps gives you those ways, with Kubernetes-native things like Kustomizations, to make consistent environments that are robust and actually going to be reconciled automatically if someone breaks the glass and says, “Oh, I need to run this container immediately.” Well, that’s going to create problems because it’s deviated from state and it’s just that one region, so we’ll put it back into state.

Corey: It’ll be dueling banjos, at some point. You’ll try doing something manually, it gets reverted automatically. I love that pattern. You’ll get bored before the computer does, always.

Chris: Yeah. And GitOps is very new, right? When you think about the lifetime of GitOps, I think it was coined in, like, 2018. So, it’s only four years old, right? When—

Corey: I prefer it to ChatOps, at least, as far as—

Chris: Well, I mean—

Corey: —implementation and expression of the thing.

Chris: —ChatOps was a way to do DevOps. I think GitOps—

Corey: Well, ChatOps is also a way to wind up giving whoever gets access to your Slack workspace root in production.

Chris: Mmm.

Corey: But that’s neither here nor there.

Chris: Mm-hm.

Corey: It’s yeah, we all like to pretend that’s not a giant security issue in our industry, but that’s a topic for another time.

Chris: Yeah. And that’s why, like, GitOps also depends upon you having good security, you know, and good authorization and approval processes. It enforces that upon—

Corey: Yeah, who doesn’t have one of those?

Chris: Yeah. If it’s a sole operation kind of deal, like in your setup, your case, I think you kind of got it right, right? Like, as far as GitOps goes—

Corey: Oh, to be clear, we are 11 people and we do have dueling pull requests and all the rest.

Chris: Right, right, right.

Corey: But most of the stuff I talk about publicly is not our production stuff, so it really is just me. Just as a point of clarity there. I’ve n—the 11 people here do not all—the rest of you don’t just sit there and clap as I do all the work.

Chris: Right.

Corey: Most days.

Chris: No, I’m sure they don’t. I’m almost certain they don’t clap… for you. I mean, they would—

Corey: No. No, they try and talk me out of it in almost every case.

Chris: Yeah, exactly. So, the setup that you, Corey Quinn, have implemented to deploy these 20 regions is kind of very GitOps-y, in the sense that when main changes, it gets updated. Where it’s not GitOps-y is what if the endpoint changes? Does it get reconciled? That’s the piece you’re probably missing is that continuous reconciliation component, where it’s constantly checking and saying, “This thing out there is deployed in the way I want it. You know, the way I declared it to be in my source of truth.”
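The continuous reconciliation Chris describes here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Flux or Argo CD are actually implemented: the function names, the region names, and the idea of modelling state as a plain mapping of workload name to container image are all hypothetical, chosen only to show the compare-and-revert pattern.

```python
from typing import Dict, List

# Hypothetical model: workload name -> container image.
Workloads = Dict[str, str]
# Hypothetical model: region name -> workloads currently running there.
Regions = Dict[str, Workloads]


def find_drift(declared: Workloads, live: Workloads) -> List[str]:
    """List the differences between the declared state and what is live."""
    drift = []
    for name in sorted(set(declared) | set(live)):
        want, have = declared.get(name), live.get(name)
        if want != have:
            drift.append(f"{name}: want {want!r}, have {have!r}")
    return drift


def reconcile_once(declared: Workloads, regions: Regions) -> Dict[str, List[str]]:
    """One pass of the loop: report drift per region, then reset that region."""
    report = {}
    for region, live in regions.items():
        drift = find_drift(declared, live)
        if drift:
            report[region] = drift
            # Put the region back into the declared state.
            regions[region] = dict(declared)
    return report


# The source of truth (what would live in Git in a GitOps setup).
declared = {"web": "web:1.4.2"}
regions = {
    "us-east-1": {"web": "web:1.4.2"},
    # Someone "broke the glass" and started an extra container by hand.
    "eu-west-1": {"web": "web:1.4.2", "debug": "debug:latest"},
}

print(reconcile_once(declared, regions))  # drift found in eu-west-1 only
print(reconcile_once(declared, regions))  # nothing left to fix
```

A real controller would run a pass like this on a timer or in response to change events, and would read the declared state from the repository rather than from an in-memory dictionary.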

Corey: Yeah, when you start having other people getting involved, there can—yeah, that’s where regressions enter. And it’s like, “Well, I know where things are so why would I change the endpoint?” Yeah, it turns out, not everyone has the state of the entire application in their head. Ideally it should live in—

Chris: Yeah. Right. And, you know—

Corey: —you know, Git or S3.

Chris: —when I—yeah, exactly. When I think about interactions of the past coming out as a new DevOps engineer to work with developers, it’s always been, do developers have access to prod or don’t they? And if you’re in that environment where you’re trying to run a multi-billion dollar operation, and your devs have direct—or one dev has direct access to prod because prod is in his brain, that’s where it’s like, well, now wait a minute. Prod doesn’t have to be only in your brain. You can put that in the codebase and now we know what is in your brain, right?

Like, you can almost do—if you document your code, well, you can have your full lifecycle right there in one place, including documentation, which I think is the best part, too. So, you know, it encourages approval processes and automation over this one person has an entire state of the system in their head; they have to go in and fix it. And what if they’re not on call, or in Jamaica, or on a cruise ship somewhere kind of thing? Things get difficult. Like, for example, I just got back from vacation. We were so far off the grid, we had satellite internet. And let me tell you, it was hard to write an email newsletter where I usually open 50 to 100 tabs.

Corey: There’s a little bit of internet out Californ-ie way.

Chris: [laugh].

Corey: Yeah it’s… it’s always weird going from, like, especially after pandemic; I have gigabit symmetric here and going even to re:Invent where I’m trying to upload a bunch of video and whatnot.

Chris: Yeah. Oh wow.

Corey: And the conference WiFi was doing its thing, and well, Verizon 5G was there but spotty. And well, yeah. Usual stuff.

Chris: Yeah. It’s amazing to me how connectivity has become so ubiquitous.

Corey: To the point where when it’s not there anymore, it’s what do I do with myself? Same story about people pushing back against remote development of, “Oh, I’m just going to do it all on my laptop because what happens if I’m on a plane?” It’s, yeah, the year before the pandemic, I flew 140,000 miles domestically and I was almost never hamstrung by my ability to do work. And my only local computer is an iPad for those things. So, it turns out that is less of a real world concern for most folks.

Chris: Yeah, I actually ordered the components to upgrade an old NUC that I have here and turn it into my, like, this is my remote code server, that’s going to be all attached to GitHub and everything else. That’s where I want to be: have Tailscale and just VPN into this box.

Corey: Tailscale is transformative.

Chris: Yes. Tailscale will change your life. That’s just my personal opinion.

Corey: Yep.

Chris: That’s not an AWS opinion or anything. But yeah, when you start thinking about your network as it could be anywhere, that’s where Tailscale, like, really shines. So—

Corey: Tailscale makes the internet work like we all wanted to believe that it worked.

Chris: Yeah. And WireGuard is an excellent open-source project. And Tailscale consumes that and puts an amazingly easy-to-use UI, and troubleshooting tools, and routing, and all kinds of forwarding capabilities, and makes it kind of easy, which is really, really, really kind of awesome. And Tailscale and Kubernetes—

Corey: Yeah, ‘network’ and ‘easy’ don’t belong in the same sentence, but in this case, they do.

Chris: Yeah. And trust me, the Kubernetes story in Tailscale, there is a lot there. I understand you might want to not open ports in your VPC, maybe, but if you use Tailscale, that node is just another thing on your network. You can connect to that and see what’s going on. Your management cluster is just another thing on the network where you can watch the state.

But it’s all—you’re connected to it continuously through Tailscale. Or, you know, it’s a much lighter weight, kind of meshy VPN, I would say, if I had to sum it up in one sentence. That was not on our agenda to talk about at all. Anyways. [laugh]

Corey: No, no. I love how many different topics we talk about on these things. We’ll have to have you back soon to talk again. I really want to thank you for being so generous with your time. If people want to learn more about what you’re up to and how you view these things, where can they find you?

Chris: Go to ChrisShort.net. So, Chris Short—I’m six-four so remember, it’s Short—dot net, and you will find all the places that I write. You can go to devopsish.com to subscribe to my newsletter, which goes out every week. This year. Next year, there’ll be breaks. And then finally, if you want to follow me on Twitter, it’s @ChrisShort. All one word so you see two s’s. Like, it’s okay, there’s two s’s there.

Corey: Links to all of that will of course be in the show notes. It’s easier for people to do the clicky-clicky thing as a general rule.

Chris: Clicky things are easier than the wordy things, yes.

Corey: Says the Kubernetes guy.

Chris: Yeah. Says the Kubernetes guy. Yeah, you like that, huh? Like I said, Argo gives you a UI. [laugh].

Corey: Thank you [laugh] so much for your time. I really do appreciate it.

Chris: Thank you. This has been fun. If folks have questions, feel free to reach out. Like, I am not one of those people that hides behind a screen all day and doesn’t respond. I will respond to you eventually.

Corey: I’m right here, Chris. Come on, come on. You’re calling me out in front of myself. My God.

Chris: Egh. It might take a day or two, but I will respond. I promise.

Corey: Thanks again for your time. This has been Chris Short, Senior Developer Advocate at AWS. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice and if it’s YouTube, click the thumbs-up button. Whereas if you’ve hated this podcast, same thing: smash the buttons, leave a five-star review, and leave an insulting comment that is written in syntactically correct YAML because it’s just so easy to do.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.