Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.
Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semipolite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best name to call your staging environment is “theory”. Because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.
On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying. CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care and feed for yourself, instead, you now get to focus on just storing data, treating it like you normally would other S3 data and not replicating it, storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.
Many things make fine databases that replicate data from one place to another, that take various bits of data and put them where they need to go. Other things do not make fine databases that do such things. Let’s talk about one of those today. For those who have never had the dubious pleasure of working with it, SQLite is a C library that implements a relational database engine. And it’s pretty awesome. It’s very clearly not designed to work in a client-server fashion, but rather to be embedded into existing programs for local use. In practice, that means that if you’re running SQLite, that’s S-Q-L-I-T-E, your database backend is going to be a flat file, or something very much like that, that lives locally.
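To make "embedded, not client-server" concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The file name `app.db`, the `notes` table, and the helper `demo_embedded_sqlite` are made up for illustration; nothing here comes from the system described in this episode.

```python
import os
import sqlite3
import tempfile


def demo_embedded_sqlite(db_path):
    """Open a local SQLite file, write one row, and read it back.

    Hypothetical helper for illustration only.
    """
    # No host, no port, no credentials: the "server" is a library call
    # inside this process, and the database is a file on local disk.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
        )
        conn.execute(
            "INSERT INTO notes (body) VALUES (?)",
            ("hello from an embedded database",),
        )
        conn.commit()
        rows = conn.execute("SELECT body FROM notes").fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")  # assumed file name
        print(demo_embedded_sqlite(path))
        print("database file exists:", os.path.isfile(path))
```

Everything happens in one process against one local file, which is exactly why there is no built-in notion of a second node to replicate to.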
This is technology used all over the place: in mobile apps and embedded systems, and in web apps for some very specific things. But that’s not quite the point. I once worked somewhere that decided to build a replicated environment that was active, active, active, across three distinct data centers. You would really hope that that statement was a non sequitur. It’s not. If you were to picture Hacker News coming to life as a person, and that person decided to design a replication model for a database from first principles, you would be pretty close to what I have seen. By building a replicated model that runs on top of SQLite, you can get this to work. But because there’s no concept of client-server, as mentioned, the only way to handle that is to kick all of the replication and state logic from the database layer, where it belongs, up into the application code itself, where it most assuredly does not belong. The downside of this (well, there are many downsides, but let’s start with a big one) is that this is not even slightly what SQLite was designed to do at all.
However, take a startup that decides if there’s one core competency they have, it’s knowing better than everyone else; this is that story. Now, I am obviously not a developer, and I’m certainly not a database administrator. I was an ops person, which means that a lot of the joy of various development decisions fell to whatever group I happened to be in at that point in time. It turns out that when you run replicated SQLite as a database, you have to get around an awful lot of architectural pain points by babying this thing something fierce. There are a number of operational problems that going down a path like this will expose. Let me explain what some of them look like, after this.
In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast.
I’m not going to engage in a point-by-point teardown of this replicated SQLite as primary datastore Eldritch Horror. My favorite database personally remains Route 53, and even that’s a better plan than this monstrosity. I’m not going to tackle, point by point, everything that made this horrifying thing, once it came to life, so awful to deal with. Anyone who runs this at any sort of scale for more than a week is going to discover a lot of these on their own. But I am going to cherry-pick a few things that were problematic about it. Remember back in the days of Windows, when things would get slow and crappy, and you had to basically restart your machine while the disk defragmented forever? Yeah, it turns out that most database systems have the same problem. The difference is that reasonable adult-level database systems, the ones that have human beings who are used to how this stuff works, tend to put that underneath the hood, so you don’t really have to think about this.
SQLite wasn’t really designed for this sort of use case. So you get to wind up playing these games yourself, which is just an absolute pleasure and a joy, except the exact opposite of that. Which means that every node periodically has to be taken down in a rotation after, in our case, about a week or so, or it would start chewing disk, it would take forever to start returning the results to some queries, and the performance of the entire site would wind up slamming to a halt. So, you have to make people aware that this exists. When we first discovered that, it was fun. The problem here is that what you’re doing is speaking to a larger problematic pattern. Namely, you’re forcing what has historically been a low-level function that even most operations people don’t need to know or care about, into something that is now at the forefront of every developer’s mental model of the application. And if they forget that this is one of the things that has to happen, woe be unto them. Further, it should be pretty freakin’ obvious by now, by everything I’ve described about this monstrosity, that this company’s core competency, the business problem that it was solving, was not building database engines. They were a classic CRUD app that solved for line-of-business problems.
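The episode doesn't say exactly what that weekly per-node maintenance was. One plausible guess, and it is only a guess, is SQLite's `VACUUM` command, which rebuilds the database file and hands back pages that deleted rows left behind. The sketch below, again using Python's `sqlite3` module, shows the effect on a throwaway database. The `events` table and both helper names are hypothetical.

```python
import sqlite3


def free_pages(conn):
    """Return how many unused pages are sitting on the database's freelist."""
    return conn.execute("PRAGMA freelist_count").fetchone()[0]


def churn_and_vacuum(db_path):
    """Fill a table, delete everything, then VACUUM.

    Returns (free pages before VACUUM, free pages after VACUUM).
    Hypothetical helper; assumes db_path points at a new, empty database.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Be explicit: with auto_vacuum off, freed pages stay in the file.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            [("x" * 1000,) for _ in range(2000)],
        )
        conn.commit()

        # Deleting rows does not shrink the file; the pages are only
        # marked free for reuse.
        conn.execute("DELETE FROM events")
        conn.commit()
        before = free_pages(conn)

        # VACUUM cannot run inside an open transaction, hence the
        # commit above. It rewrites the whole database file.
        conn.execute("VACUUM")
        after = free_pages(conn)
    finally:
        conn.close()
    return before, after


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        before, after = churn_and_vacuum(os.path.join(tmp, "churn.db"))
        print("free pages before VACUUM:", before)
        print("free pages after VACUUM:", after)
```

Because `VACUUM` rewrites the entire file and can't run inside an open transaction, it isn't something you casually fire off on a node that is busy serving traffic, which would at least be consistent with pulling nodes out of rotation one at a time.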
This is a perfect story for a traditional relational database. Why on earth would you need to reinvent an entire database engine to solve that one relatively solved business problem? A sensible person would surmise that you, in fact, do not need to do such a thing. This was not a decision that was made by sensible people. So, assume that, at this point, you have gone way off the rails here. You are past the Rubicon, you are off the track, and you’ve built such a thing. Now assume that you’ve run into edge cases running it. Now, let me be clear. If you choose such an architecture, your entire life is going to be edge cases, if for no other reason than that this is almost certainly not the only poor decision you’ve made. But assuming that it is, every problem you hit is going to be an exercise in frustration. You don’t get to take advantage of the community effects of virtually every other datastore option on the planet. With those, you can post on various Slack teams, on Twitter, on forums, on GitHub, etc. If you try that with something like this piece of nonsense, the answer is going to be a screaming, “What the hell have you built?” in response. At which point you are oh, so very much on your own.
Now, you might think that this episode is just me dunking on a previous crappy employer and an internal system that is never going to make it into the light of day anywhere else. Well, fun coda to this story. They open-sourced this monstrosity. You can go and look at all of this code if you know where to look. And no, I’m not going to tell you. You can find it on your own if you need this nonsense. It is 10,000 lines of C code, written on top of the SQLite library. When this was announced on Hacker News, Hacker News found it too Hacker News for their own liking and tore it to pieces in the comments. The authors of SQLite itself took one look, immediately renounced God, and went to go live lives of repentance away from the rest of humanity, which is a shame because none of this is their fault. But it does go to show that whatever wonderful thing you build and release into the world, someone will take it and turn it into something that has no business existing on God’s green earth. If you really care about what shop this came out of, you can find it if you look. I am not going to name and shame a startup. They are not a giant public multinational, like Google, or AWS, or Oracle. So, I don’t feel right dragging their name in public. The service that they build is awesome. Their architectural decisions and their team culture, honestly, were both terrible. I’ll let them out themselves should they choose to do so, but that’s not the point of this. The point of this episode is that there are oh so many worse things to use as a database than Route 53. Thank you for listening to the Whiteboard Confessional. At least this time, it wasn’t entirely my fault.
Announcer: This has been a HumblePod production.