Kubernetes is an over-engineered solution that’s largely in search of problems that are best solved via other methods. Oh, sorry, I’m supposed to be taking this seriously. Let’s try again …

Kubernetes is an open source system for managing containerized applications in production environments.

As I mentioned previously, I’m building a production service with an eye toward deploying it on Kubernetes. I don’t want to get ahead of myself here, so I’m going to:

  • Explain how we got to Kubernetes,
  • Expound on the reasons why someone might use it to orchestrate their containerized workloads,
  • Enumerate the problems it attempts to solve, and
  • Delve into the problems it has created.

A brief history of containers

To begin, we have to go back a bit before Kubernetes arrived on the scene, to the advent of Docker.

In 2013, a company then named dotCloud released something called Docker. The thinking was that developers had spent way too much time saying things like “Well, it works on my machine” and having grumpy ops people snap back with “Well then, back up your email because your laptop’s heading to production.” One of the biggest problems in tech a decade ago was how to make your MacBook Pro look like a Linux server (including, for a few early adopter types, an EC2 instance running Linux).

It’s easy to miscredit Docker with the creation of containers, but it didn’t create the pattern. You could make a credible case that the lineage goes back to the 1970s, when IBM began carving mainframes into virtual machines (and, later, logical partitions, or LPARs) that each ran their own instance of the operating system and got a fraction of the mainframe’s resources.

IBM stayed in the 1970s, at least culturally, but its concept didn’t. Virtual machines, BSD jails, Linux chroots, and other similar technologies proliferated. By the time of Docker’s announcement, containerization was a well-understood phenomenon.

Docker doesn’t always get the credit it’s due for nailing the user experience. docker run foo would start a container in less than a second, and updating the application meant deploying a new version of the container image rather than modifying code running inside of something longer-lived.

Docker’s streamlined experience just got out of the developer’s way. When an application looked like it was in a fit state to ship, developers could hand that container off to their ops folks, and ops could deploy it to production. What shipped was immutably the same application the developer had built, runtime dependencies and all, packaged up in the form of a container.

This also served as a migration path for workloads that had been written a decade or two earlier. Suddenly, you could stuff them into containers and put those containers on modern hardware, into cloud providers, or anywhere else you could trick into running Docker containers for you.
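
To make that build-ship-run handoff concrete, here’s a rough sketch of what the workflow looked like. The Dockerfile contents, image name, and registry are made up for illustration; the commands themselves are stock Docker.

    # Dockerfile: the app and its runtime dependencies, frozen into one artifact
    # (the Python app here is hypothetical)
    FROM python:3.11-slim
    COPY app/ /app
    RUN pip install -r /app/requirements.txt
    CMD ["python", "/app/main.py"]

    # Build it, try it locally, then push the exact same artifact for ops to deploy
    # (registry.example.com/foo is a made-up image name)
    docker build -t registry.example.com/foo:1.0 .
    docker run -p 8080:8080 registry.example.com/foo:1.0
    docker push registry.example.com/foo:1.0

The things that used to go wrong in the “works on my machine” era (system libraries, language runtimes, environment quirks) ride along inside the image, so what ran on the laptop is the same artifact that runs in production.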

The problems Kubernetes has solved

Docker fixed a whole mess of problems from the developer perspective. It also opened up a pile of new problems for the ops folks who ran containerized applications in production.

A number of products and projects took different approaches to an ops solution, including one that arose out of Google. Google had already been using an internal cluster manager called Borg, and many of its engineers apparently needed to scratch an itch: “If we could make Borg all over again without relying on internal Google infrastructure things, how would we do it?” The answer was Kubernetes.

By the end of 2017, it was pretty clear that Kubernetes had won as the orchestrator of choice, and its competitors embraced it natively. VMware adopted it over its own Cloud Foundry, Docker stopped pushing Docker Swarm as the “better” solution, and Mesosphere pivoted so hard from Mesos to Kubernetes that it renamed itself D2iQ (presumably after a favorite character in Star Wars). It felt like every job posting demanded experience with Kubernetes.

I don’t want my snark to drown out the very real value that Kubernetes provided — and continues to provide — to its adopters. If a hardware node crashes or has a failing disk array, Kubernetes can seamlessly route around it. If a container stops responding, Kubernetes can replace it automatically. Kubernetes can effectively play a game of Tetris to get various workloads packed into fewer servers, so their resources don’t go unused. And, of course, it’s the lingua franca of multi-cloud.

Every current and aspiring cloud provider today supports Kubernetes on its platform, and having at least a theoretical exodus path from your current provider is no small thing, even if you never end up using it.
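
If the self-healing bit sounds abstract, here’s a minimal sketch of what it looks like from the command line. The image name is hypothetical; the commands are stock kubectl, and the app=foo label is the one kubectl create deployment applies by default.

    # Ask for three copies of the container and let Kubernetes keep them alive
    # (registry.example.com/foo is a made-up image name)
    kubectl create deployment foo --image=registry.example.com/foo:1.0 --replicas=3

    # Delete the pods out from under it; the Deployment's controller notices the drift
    kubectl delete pod -l app=foo

    # Moments later: three running pods again, no human involved
    kubectl get pods -l app=foo

That same reconciliation loop is what routes around a dead node or a failing disk array: the desired state says three replicas, reality disagrees, and Kubernetes quietly fixes the difference.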

The problems Kubernetes has created

There’s a reason I didn’t hurl myself into the Kubernetes abyss when it first reared its head: It has its drawbacks and then some.

The first and most obvious challenge is the borderline overwhelming complexity; the CNCF Landscape illustrates this problem nicely. Once you have an application running in Kubernetes, you need to dramatically uplift your telemetry story. The old days of “log into the Linux box and check what it’s doing” don’t work when the container stopped existing 20 minutes before you knew there was a problem. The applications running within a cluster need to be able to find one another. The CI/CD process of getting code changes from developer laptops into production has gotten far more complex. The cascading failure mode that takes down entire fleets of servers is now a lot more damaging than it used to be. Countless more problems exist to ensnare the unwary.

So many of these issues are precisely the kinds of things I pay a cloud provider like AWS or GCP to handle for me. If I wanted to worry about the care and feeding of infrastructure, wouldn’t I just go apply for a job there? I have a different job, and it isn’t “cosplaying as a FAANG employee.”

Most of all, it seems that Kubernetes is catnip for engineers who thrive on being clever. “Clever” translates directly to “complex,” and when something as complex as Kubernetes breaks, it can be extremely tricky to figure out exactly where to look to fix it.

Kubernetes: More like Greek for ‘hellish, man’

That leaves us in the present day, where I’m going to have some fun as a Kubernetes helmsman, exploring these depths myself. I know how we got here, but I don’t pretend to know what I’m doing with regard to this byzantine system. I just know that I’ve got zero problem admitting that I’m in over my head and shrieking for help.