The Art and Science of Database Innovation with Andi Gutmans

Episode Summary

Andi Gutmans, General Manager and Vice President, Engineering at Google, joins Corey on Screaming in the Cloud to discuss all things database innovation at Google Cloud. Andi explains why a significant surge of customers is switching from legacy proprietary databases to open APIs, and how Google is taking a pragmatic approach: understanding the main characteristics of the workloads its customers need to address and building the best services around those. Andi also reveals his thoughts on the worst and best database options, as well as how developers can future-proof their development by starting small, without having to re-engineer and reprovision as their projects scale.

Episode Show Notes & Transcript

About Andi

Andi Gutmans is the General Manager and Vice President for Databases at Google. Andi’s focus is on building, managing and scaling the most innovative database services to deliver the industry’s leading data platform for businesses. 
Before joining Google, Andi was VP Analytics at AWS running services such as Amazon Redshift. Before his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP.
Andi has over 20 years of experience as an open source contributor and leader. He co-authored open source PHP. He is an emeritus member of the Apache Software Foundation and served on the Eclipse Foundation’s board of directors. He holds a bachelor’s degree in Computer Science from the Technion, Israel Institute of Technology.


Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted episode is brought to us by our friends at Google Cloud, and in so doing, they have gotten a guest to appear on this show that I have been low-key trying to get here for a number of years. Andi Gutmans is VP and GM of Databases at Google Cloud. Andi, thank you for joining me.

Andi: Corey, thanks so much for having me.

Corey: I have to begin with the obvious. Given that one of my personal passion projects is misusing every cloud service I possibly can as a database, where do you start and where do you stop as far as saying, “Yes, that’s a database,” so it rolls up to me and, “No, that’s not a database, so someone else can deal with the nonsense?”

Andi: I’m in charge of the operational databases, so that includes both the managed third-party databases such as MySQL, Postgres, SQL Server, and then also the cloud-first databases, such as Spanner, Bigtable, Firestore, and AlloyDB. So, I suggest that’s where you start because those are all awesome services. And then what doesn’t fall underneath, kind of, that purview are things like BigQuery, which is an analytics, you know, data warehouse, and other analytics engines. And of course, there’s always folks who bring in their favorite, maybe, lesser-known or less popular database and self-manage it on GCE, on Compute.

Corey: Before you wound up at Google Cloud, you spent roughly four years at AWS as VP of Analytics, which is, again, one of those very hazy type of things. Where does it start? Where does it stop? It’s not at all clear from the outside. But even before that, you were, I guess, something of a legendary figure, which I know is always a weird thing for people to hear.

But you were at least partially responsible for the Zend Framework in the PHP world, and I didn’t realize what the heck that was, despite supporting it in production at a couple of jobs, until after I, for better or worse, was no longer trusted to support production environments anymore. Which, honestly, if you can get out, I’m a big proponent of doing that. You sleep so much better without a pager. How did you go from programming languages all the way on over to databases? It just seems like a very odd mix.

Andi: Yeah. No, that’s a great question. So, I was one of the core developers of PHP, and you know, I had been in the PHP community for quite some time. I also helped ideate the Zend Framework. And Zend Technologies, the company that, you know, I co-founded, was kind of the company behind PHP.

So, like Red Hat supports Linux commercially, we supported PHP. And I was very much focused on developers, programming languages, frameworks, IDEs, and that was, you know, really exciting. I had also done quite a bit of work on interoperability with databases, right, because behind every application, there’s a database, and so a lot of what we focused on was great connectivity to MySQL, to Postgres, to other databases, and I got to kind of learn the database world from the outside, from the application builders. We sold our company in, I think it was, 2015, and so I had to kind of figure out what’s next. And so, one option would have been, hey, stay in programming languages, but what I learned over the many years that I worked with application developers is that there’s a huge amount of value in data.

And frankly, I’m a very curious person; I always like to learn, so there was this opportunity to join Amazon, to join the non-relational database side, and take myself completely out of my comfort zone. And actually, I joined AWS to help build the graph database Amazon Neptune, which was even more out of my comfort zone than even probably a relational database. So, I kind of like to do different things, and so I joined and I had to learn, you know, how to build a database pretty much from the ground up. I mean, of course, I didn’t do the coding, but I had to learn enough to be dangerous, and so I worked on a bunch of non-relational databases there such as, you know, Neptune, Redis, Elasticsearch, DynamoDB Accelerator. And then there was the opportunity for me to actually move over from non-relational databases to analytics, which was another way to get myself out of my comfort zone.

And so, I moved to run the analytics space, which included services like Redshift, like EMR, Athena, you name it. So, that was just a great experience for me where I got to work with a lot of awesome people and learn a lot. And then the opportunity arose to join Google and actually run the Google transactional databases, including their older relational databases. And by the way, I actually have two jobs. One job is running Spanner and Bigtable for Google itself—meaning, you know, Search, Ads, and YouTube and everything runs on these databases—and then the second job is actually running external-facing databases for external customers.

Corey: How alike are those two? Is it effectively the exact same thing, just with different API endpoints? Are they two completely separate universes? It’s always unclear from the outside when looking at large companies that effectively eat versions of their own dog food, where their internal usage of these things starts and stops.

Andi: So, great question. So, Cloud Spanner and Cloud Bigtable do actually use the internal Spanner and Bigtable. So, at the core, it’s exactly the same engine, the same runtime, same storage, and everything. However, you know, kind of, internally, the way we built the database APIs was kind of good for scrappy, you know, Google engineers, and you know, folks who are kind of okay learning how to fit into the Google ecosystem, but when we needed to make this work for enterprise customers, we needed cleaner APIs, we needed authentication that was external, right, and so on, so forth. So, think about it: we had to add an additional set of APIs on top of it, and management, right, to really make these engines accessible to the external world.

So, it’s running the same engine under the hood, but it is a different set of APIs, and a big part of our focus is continuing to expose to enterprise customers all the goodness that we have on the internal system. So, it’s really about taking these very, very unique, differentiated databases and democratizing access to them for anyone who wants it.

Corey: I’m curious to get your position on the idea that seems to be playing out—I guess, a battle that’s been playing itself out in a number of different customer conversations. And that is, I guess, the theoretical decision between, do we go towards general-purpose databases and more or less treat every problem as a nail in search of a hammer, or do you decide that every workload gets its own custom database that aligns best with that particular workload? There are trade-offs in either direction, but I’m curious where you land on that, given that you tend to see a lot more of it than I do.

Andi: No, that’s a great question. And you know, just for the viewers who maybe aren’t aware, there’s kind of two extreme points of view, right? There’s one point of view that says, purpose-built for everything, like, every specific pattern, like, build bespoke databases, it’s kind of a best-of-breed approach. The problem with that approach is it becomes extremely complex for customers, right? Extremely complex to decide what to use, they might need to use multiple for the same application, and so that can be a bit daunting as a customer. And frankly, there’s kind of a law of diminishing returns at some point.

Corey: Absolutely. I don’t know what the DBA role of the future is, but I don’t think anyone really wants it to be, “Oh, yeah. We’re deciding which one of these three dozen managed database services is the exact right fit for each and every individual workload.” I mean, at some point it feels like certain cloud providers believe that not only should every workload have its own database, but almost every workload should have its own database service. At some point, you’re allowed to say no and stop building these completely, what feel to me like, Byzantine, esoteric database engines that don’t seem to have broad applicability to a whole lot of problems.

Andi: Exactly, exactly. And maybe the other extreme is what folks often talk about as multi-model, where you say, like, “Hey, I’m going to have a single storage engine and then map onto that the relational model, the document model, the graph model, and so on.” I think what we tend to see is if you go too generic, you also start having performance issues, and you may not be getting the right capabilities and trade-offs around consistency, and replication, and so on. So, I would say Google, like, we’re taking a very pragmatic approach where we’re saying, “You know what? We’re not going to solve all of our customers’ problems with a single database, but we’re also not going to have two dozen.” Right?

So, we’re basically saying, “Hey, let’s understand the main characteristics of the workloads that our customers need to address, and build the best services around those.” You know, obviously, over time, we continue to enhance what we have to fit additional models. And then frankly, we have a really awesome partner ecosystem on Google Cloud where if someone really wants a very specialized database, you know, we also have great partners that they can use on Google Cloud and get great support and, you know, get the rest of the benefits of the platform.

Corey: I’m very curious to get your take on a pattern that I’ve seen alluded to by basically every vendor out there, except the couple of very obvious ones for whom it does not serve their particular vested interests, which is that there’s a recurring narrative that customers are demanding open-source databases for their workloads. And when you hear that, at least, people who came up the way that I did, spending entirely too much time on Freenode, back when that was not a deeply problematic statement in and of itself, where, yes, we’re open-source, I guess, zealots is probably the best terminology, and yeah, businesses are demanding to participate in the open-source ecosystem. Here in reality, what I see is not ideological purity or anything like that, and much more to do with, “Yeah, we don’t like having a single commercial vendor for our databases that basically plays the insert-quarter-to-continue dance whenever we’re trying to wind up doing something new. We want the ability to not have licensing constraints around when, where, how, and how quickly we can run databases.” That’s what I hear when customers are actually talking about open-source versus proprietary databases. Is that what you see, or do you think that plays out differently? Because let’s be clear, you do have a number of database services that you offer that are not open-source, but are also absolutely not tied to weird licensing restrictions either.

Andi: That’s a great question, and I think for years now, customers have been in a difficult spot because the legacy proprietary database vendors, you know, knew how sticky the database is, and so as a result, you know, the prices often went up, and it was not easy for customers to kind of manage costs and agility and so on. But I would say that’s always been somewhat of a concern. I think what I’m seeing changing and happening differently now is, as customers are moving into the cloud and they want to run hybrid cloud, they want to run multi-cloud, they need to prove to their regulator that they can do a stressed exit, right, open-source is not just about reducing cost, it’s really about flexibility and kind of being in control of when and where you can run the workloads. So, I think what we’re really seeing now is a significant surge of customers who are trying to get off legacy proprietary databases and really kind of move to open APIs, right, because they need that freedom. And that freedom is far more important to them than even the cost element.

And what’s really interesting is, you know, a lot of these are the decision-makers in these enterprises, not just the technical folks. Like, to your point, it’s not just open-source advocates, right? It’s really the business people who understand they need the flexibility. And by the way, even the regulators are asking them to show that they can flexibly move their workloads as they need to. So, we’re seeing a huge interest there and, as you said, like, some of our services, you know, are open-source-based services, some of them are not.

Like, take Spanner, as an example: it is heavily tied to how we build our infrastructure and how we build our systems. Like, I would say, it’s almost impossible to open-source Spanner, but what we’ve done is we’ve basically embraced open APIs and made sure if a customer uses these systems, we’re giving them control of when and where they want to run their workloads. So, for example, Bigtable has an HBase API; Spanner now has a Postgres interface. So, our goal is really to give customers as much flexibility as possible and also not lock them into Google Cloud. Like, we want them to be able to move out of Google Cloud so they have control of their destiny.

Corey: I’m curious to know what you see happening in the real world because I can sit here and come up with a bunch of very well-thought-out logical reasons to go towards or away from certain patterns, but I spent years building things myself. I know how it works: you grab the closest thing handy and throw it in, and we all know that there is nothing so permanent as a temporary fix. Like, that thing is load-bearing and you’ll retire with that thing still in place. In the idealized world, I don’t think that I would want to take a dependency on something like—easy example—Spanner or AlloyDB because despite the fact that they have Postgres-squeal—yes, that’s how I pronounce it—compatibility, the capabilities of what they’re able to do under the hood far exceed and outstrip whatever you’re going to be able to build yourself or get anywhere else. So, there’s a dataflow architectural dependency lock-in, despite the fact that it is, at least on its face, Postgres-compatible. Counterpoint: does that actually matter to customers, in what you are seeing?

Andi: I think it’s a great question. I’ll give you a couple of data points. I mean, first of all, even if you take a completely open-source product, right, running it in different clouds, different on-premises environments, and so on, fundamentally, you will have some differences in performance characteristics, availability characteristics, and so on. So, the truth is, even if you use open-source, right, you’re not going to get a hundred percent of the same characteristics everywhere you run it. But that said, you still have the freedom of movement, and with, I would say, not a huge amount of engineering investment, right, you’re going to make sure you can run that workload elsewhere.

I kind of think of Spanner in a similar way, where yes, I mean, you’re going to get all those benefits of Spanner that you can’t get anywhere else, like unlimited scale, global consistency, right, no maintenance downtime, five-nines availability, like, you can’t really get that anywhere else. That said, not every application necessarily needs it. And you still have that option, right, that if you need to, or want to, or we’re not giving you a reasonable price or reasonable price performance, or we’re starting to neglect you as a customer—which of course we wouldn’t, but let’s just say hypothetically, that, you know, that could happen—you still have a way to basically go and run this elsewhere. Now, I also want to talk about some of the upsides something like Spanner gives you. Because you talked about, you want to be able to just grab a few things, build something quickly, and then, you know, you don’t want to be stuck.

The counterpoint to that is with Spanner, you can start really, really small, and then let’s say you’re a gaming studio, you know, you’re building ten titles hoping that one of them is going to take off. So, you can build ten of those, you know, with very minimal spend on Spanner, and if one takes off overnight, the database is the one thing you don’t have to go and re-architect; it’s going to scale as big as you need it to. And so, it does enable a lot of this innovation and a lot of cost management as you try to get to that overnight success.

Corey: Yeah, overnight success. I always love that approach. It’s one of those, “Yeah, I became an overnight success after only ten short years.” People believe it happens in fits and starts, but then you see, I guess, on some level, the other side of it, where it’s a lot of showing up and doing the work. I have to confess, I didn’t do a whole lot of admin work in my production years that touched databases because I have an aura and I’m unlucky, and it turns out that when you blow away some web servers, everyone can laugh and we’ll reprovision stateless things.

Get too close to the data warehouse, for example, and you don’t really have a company left anymore. And of course, in the world of finance that I came out of, transactional integrity is also very much a thing. A question that I had [centers 00:17:51] really around one of the predictions you gave recently at Google Cloud Next, which is your prediction for the future is that transactional and analytical workloads from a database perspective will converge. What’s that based on?

Andi: You know, I think we’re really moving to a world where customers are trying to make real-time decisions, right? If there’s model drift, from an AI and ML perspective, they want to be able to retrain their models as quickly as possible. So, everything is fast, moving into streaming. And I think what you’re starting to see is, you know, customers don’t have that time to wait for analyzing their transactional data. Like in the past, you’d do a batch job, you know, once a day or once an hour, you know, to move the data from your transactional system to your analytical system, but that’s just not how always-on businesses run anymore, and they want to have those real-time insights.

So, I do think that what you’re going to see is transactional systems more and more building analytical capabilities, analytical systems building more transactional capabilities, and then ultimately, cloud platform providers like us helping fill that gap and really making data movement seamless across transactional, analytical, and even AI and ML workloads. And so, that’s an area that I think is a big opportunity. I also think that Google is best positioned to solve that problem.

Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.

Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?

Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.

Corey: On some level, I’ve found that, at least in my own work, once I wind up using a database for something, I’m inclined to try and stuff as many other things into that database as I possibly can, just because getting a whole second data store, and taking a dependency on it for any given workload, tends to be a little bit on the, I guess, challenging side. Easy example of this. I’ve talked about it previously in various places, but I was talking to one of your colleagues, [Sarah Ellis 00:19:48], who wound up at one point making a joke that I, of course, took way too far. Long story short, I built a Twitter bot on top of Google Cloud Functions that, every time the Azure brand account tweets, simply quote-tweets it, translating their tweet into all caps, and then puts a boomer-style statement in front of it if there’s room. This account is @cloudboomer.

Now, the hard part that I had while doing this is everything stateless works super well. Where do I wind up storing the ID of the last tweet that it saw on its previous run? And I was fourth and inches from just saying, “Well, I’m already using Twitter so why don’t we use Twitter as a database?” Because everything’s a database if you’re either good enough or bad enough at programming. And instead, I decided, okay, we’ll try this Firebase thing first.

And I don’t know if it’s Firestore, or Datastore, or whatever it’s called these days, but once I wrapped my head around it, incredibly effective, very fast to get up and running, and I feel like I made at least a good decision, for once in my life, involving something touching databases. But it’s hard. I feel like I’m consistently drawn toward the thing I’m already using as a default database. I can’t shake the feeling that that’s the wrong direction.

Andi: I don’t think it’s necessarily wrong. I mean, I think, you know, with Firebase and Firestore, that combination is just extremely easy and quick to build awesome mobile applications. And actually, you can build mobile applications without a middle tier, which is probably what attracted you to that. So, we just see, you know, a huge amount of developers and applications. We have over 4 million databases in Firestore, with developers just building these applications, especially mobile-first applications. So, I think, you know, if you can get your job done and get it done effectively, absolutely stick with it.

And by the way, one thing a lot of people don’t know about Firestore is it’s actually running on Spanner infrastructure, so Firestore has the same five-nines availability, no maintenance downtime, and so on, that Spanner has, and the same kind of ability to scale. So, it’s not just that it’s quick, it will actually scale as much as you need it to and be as available as you need it to. So, that’s on that piece. I think, though, to the same point, you know, there are other databases where we’re trying to make sure we also extend their usage beyond what they’ve traditionally done. So, you know, for example, we announced AlloyDB, which I kind of call Postgres on steroids. We added analytical capabilities to this transactional database so that as customers do have more data in their transactional database, as opposed to having to go somewhere else to analyze it, they can actually do real-time analytics within that same database, and it can actually do up to 100 times faster analytics than open-source Postgres.

So, I would say both Firestore and AlloyDB are kind of good examples of: if it works for you, right, we’ll also continue to make investments so the number of use cases you can use these databases for continues to expand over time.

Corey: One of the weird things that I noticed just looking around this entire ecosystem of databases—and you’ve been in this space long enough to, presumably, have seen the same type of evolution—back when I was transiting between different companies a fair bit, sometimes because I was consulting and other times because I’m one of the greatest in the world at getting myself fired from jobs based upon my personality, I found that the default standard was always, “Oh, whatever the database is going to be, it starts off as MySQL and then eventually pivots into something else when that starts falling down.” These days, I can’t shake the feeling that almost everywhere I look, Postgres is the answer instead. What changed? What did I miss in the ecosystem that’s driving that renaissance, for lack of a better term?

Andi: That’s a great question. And, you know, I have been involved in—I’m going to date myself a bit—but in PHP since 1997, pretty much, and one of the things we kind of did is we built a really good connector to MySQL—and you know, I don’t know if you remember, before MySQL, there was mSQL. So, the MySQL API actually came from mSQL—and we bundled the MySQL driver with PHP. And so, kind of that LAMP stack really took off. And kind of to your point, you know, the default in the web, right, was like, you’re going to start with MySQL because it was super easy to use, just fun to use.

By the way, I actually wrote—co-authored—the tab completion in the MySQL client. So like, a lot of these kinds of, you know, fun, simple ways of using MySQL were there, and frankly, it was super fast, right? And so, kind of those fast reads and everything, it just was great for web and for content. And at the time, Postgres kind of came across more like a science project. Like, the folks who were using Postgres were kind of the outliers, right, you know, the less pragmatic folks.

I think what’s changed over the past, how many years has it been now, 25 years—I’m definitely dating myself—is a few things: one, MySQL is still awesome, but it didn’t kind of go in the direction of really, kind of, trying to catch up with the legacy proprietary databases on features and functions. Part of that may just be that from a roadmap perspective, that’s not where the owner wanted it to go. So, MySQL today is still great, but it didn’t go in that direction. In parallel, right, customers wanted to move more to open-source. And so, what they found is, the thing that actually looks and smells more like the legacy proprietary databases is actually Postgres, plus you saw an increase of investment in the Postgres ecosystem, and also a very liberal license.

So, you have lots of other databases including commercial ones that have been built off the Postgres core. And so, I think you are today in a place where, for mainstream enterprise, Postgres is it because that is the thing that has all the features that the enterprise customer is used to. MySQL is still very popular, especially in, like, content and web, and mobile applications, but I would say that Postgres has really become kind of that de facto standard API that’s replacing the legacy proprietary databases.

Corey: I’ve been on the record way too much as saying, with some justification, that the best database in the world that should be used for everything is Route 53, specifically, TXT records. It’s a key-value store and then anyone who’s deep enough into DNS or databases generally gets a slightly greenish tinge and feels ill. That is my simultaneous best and worst database. I’m curious as to what your most controversial opinion is about the worst database in the world that you’ve ever seen.

Andi: This is the worst database? Or—

Corey: Yeah. What is the worst database that you’ve ever seen? I know, at some level, since you manage all things database, I’m asking you to pick your least favorite child, but here we are.

Andi: Oh, that’s a really good question. No, I would say probably the “worst database,” double quotes, is just the file system, right? When folks are basically using the file system as a regular database. And that can work for, you know, really simple apps, but as apps get more complicated, that’s not going to work. So, I’ve definitely seen some of that.
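The failure mode Andi describes is easy to see in a sketch. This is a hypothetical, minimal “file system as database,” one JSON file that gets fully rewritten on every update (the class name and file name are invented for illustration):

```python
import json
import os

class FileKV:
    """A naive key-value 'database' backed by a single JSON file.

    Fine for a really simple app, but every write rewrites the whole
    file, there is no concurrency control, and there is no indexing,
    so it stops working as the app gets more complicated.
    """

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def get(self, key, default=None):
        return self._load().get(key, default)

    def set(self, key, value):
        data = self._load()
        data[key] = value
        # Write to a temp file and rename: os.replace is atomic, so a
        # crash mid-write leaves the old file intact instead of a
        # half-written, corrupt one.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)

kv = FileKV("app_state.json")
kv.set("last_seen_id", "1234567890")
print(kv.get("last_seen_id"))  # 1234567890
```

Even with the atomic rename, two processes writing at once can silently lose each other’s updates, which is roughly the point where “just use a real database” starts to win.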

I would say the most awesome database that is also file-system-based, kind of embedded, I think was actually SQLite, you know? And SQLite is actually still very, very popular. I think it sits on pretty much every mobile device on the planet. So, I actually think it’s awesome, but it’s, you know, it’s not a database server. It’s kind of an embedded database, but it’s something that I, you know, I’ve always been pretty excited about. And, you know, there’s [unintelligible 00:27:43] kind of new, interesting databases emerging that are also embedded, like DuckDB is quite interesting. You know, it’s kind of the SQLite for analytics.
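SQLite’s embedded model is worth seeing concretely: there is no server process, the whole database is a single file (or lives in memory), and Python even ships a binding in its standard library. A small sketch, with made-up data:

```python
import sqlite3

# An embedded database: sqlite3.connect() opens a file, or, as here,
# an in-memory database, with no server process involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (title TEXT, downloads INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [("episode-1", 1200), ("episode-2", 800), ("episode-3", 2500)],
)
conn.commit()

# The same SQL you'd send to a client/server database works here.
# Running analytical queries against an embedded engine is also the
# idea DuckDB applies, with a column-oriented engine under the hood.
(total,) = conn.execute("SELECT SUM(downloads) FROM plays").fetchone()
print(total)  # 4500
```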

Corey: We’ve been using it for a few things around a bill analysis ourselves. It’s impressive. I’ve also got to say, people think that we had something to do with it because we’re The Duckbill Group, and it’s DuckDB. “Have you done anything with this?” And the answer is always, “Would you trust me with a database? I didn’t think so.” So no, it’s just a weird coincidence. But I liked that a lot.

It’s also counterintuitive from where I sit because I’m old enough to remember when Microsoft was teasing the idea of WinFS, where they teased a future file system that fundamentally was a database—I believe it was an index or journal for all of that—and I don’t believe anything ever came of it. But ugh, that felt like a really weird alternate world we could have lived in.

Andi: Yeah. Well, that’s a good point. And by the way, you know, if I actually take a step back, right, I kind of half-jokingly said, you know, file system, and obviously, you know, all the popular databases persist on the file system. But if you look at what’s different in cloud-first databases, right, like, if you look at legacy proprietary databases, the typical setup is write to the local disk and then do asynchronous replication, with some kind of bounded replication lag, to somewhere else, to a different region, or so on. If you actually start to look at what the cloud-first databases look like, they actually write the data in multiple data centers at the same time.

And so, kind of joke aside, as you start to think about, “Hey, how do I build the next generation of applications and how do I really make sure I get the resiliency and the durability that the cloud can offer,” it really does take a new architecture. And so, that’s where things like, you know, Spanner and Bigtable, and kind of, AlloyDB databases are truly architected for the cloud. That’s where they actually think very differently about durability and replication, and what it really takes to provide the highest level of availability and durability.
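The “write to multiple data centers at the same time” idea can be sketched as a majority-quorum write: the write counts as durable once most replicas acknowledge it, so losing one replica loses no data. This is a hypothetical toy simulation of the concept, not Spanner’s actual protocol:

```python
# Toy illustration of a cloud-first write path: acknowledge the write only
# after a majority of replicas ("data centers") have it, so a single failed
# replica doesn't compromise durability. Hypothetical sketch only.
def replica_write(replica, key, value):
    if replica is None:          # simulate an unreachable data center
        return False
    replica[key] = value
    return True

replicas = [{}, None, {}]        # three regions, one currently down
acks = sum(replica_write(r, "balance", "100") for r in replicas)
majority = len(replicas) // 2 + 1
durable = acks >= majority       # 2 of 3 acks -> still durable
print(durable)  # -> True
```

Contrast this with the legacy pattern Andi describes, where the local-disk write succeeds immediately and replication catches up asynchronously, leaving a window of bounded replication lag.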

Corey: On some level, I think one of the key things for me to realize was that in my own experiments, whenever I wind up doing something that is either for fun or because I just want to see how it works and what’s possible, the scale of what I’m building is always inherently a toy problem. It’s like the old line that if it fits in RAM, you don’t have a big data problem. And then I’m looking at things these days that are having most of a petabyte’s worth of RAM sometimes, so okay, that definition continues to extend and get ridiculous. But I still find that most of what I do in a database context can be done with almost any database. There’s no reason for me not to, for example, use a SQLite file or to use an object store—sure, there’s a little latency, but whatever—or even a text file on disk.

The challenge I find is that as you start scaling and growing these things, you start to run into limitations left and right, and only then is it one of those, oh, I should have made different choices or I should have built in abstractions. But so many of those things come to nothing; it just feels like extra work. What guidance do you have for people who are trying to figure out how much effort to put in upfront when they’re just more or less puttering around to see what comes out of it?
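One low-cost hedge for the situation Corey describes is a thin storage abstraction: start on SQLite, and keep open the option of swapping the backend later without touching callers. A hypothetical sketch (the class and method names are invented for illustration):

```python
import sqlite3
from typing import Optional

class KeyValueStore:
    """Thin storage abstraction: callers never see SQL, so the toy SQLite
    backend could later be swapped for a bigger database behind the same API."""

    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, value: str) -> None:
        self._db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._db.commit()

    def get(self, key: str) -> Optional[str]:
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

store = KeyValueStore()
store.put("greeting", "hello")
print(store.get("greeting"))  # -> hello
```

The abstraction costs a few lines upfront, which is exactly the tradeoff in question: whether that small amount of extra work is worth the cheaper migration later.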

Andi: You know, we like to think about ourselves at Google Cloud as really having a unique value proposition that really helps you future-proof your development. You know, if I look at both Spanner and I look at BigQuery, you can actually start with a very, very low cost. And frankly, not every application has to scale. So, you can start at low cost, you can have a small application, but everyone wants two things: one is availability, because you don’t want your application to be down, and number two is, if you have to scale, you want to be able to do so without having to rewrite your application. And so, I think this is where we have a very unique value proposition, both in how we built Spanner and also how we built BigQuery, in that you can actually start small. For example, on Spanner, you can go from one-tenth of what we call an instance—like, a small instance that is, you know, under $65 a month—to a petabyte-scale OLTP environment with thousands of instances in Spanner, with zero downtime.

And so, I think that is really the unique value proposition. We’re basically saying you can hold the stick at both ends: you can basically start small, and then if that application does need to scale, does need to grow, you’re not reengineering your application and you’re not taking any downtime for reprovisioning. So, I think that’s—if I had to give folks, kind of, advice, I say, “Look, what’s done is done. You have workloads on MySQL, Postgres, and so on. That’s great.”

Like, they’re awesome databases, keep on using them. But if you’re truly building a new app, and you’re hoping that app is going to be successful at some point—and, like you said, all overnight successes take at least ten years—then if you built it on something like Spanner, you don’t actually have to think about that anymore or worry about it, right? It will scale when you need it to scale and you’re not going to have to take any downtime for it to scale. So, that’s why we see a lot of these industries that have these potential spikes, like gaming, retail, also some use cases in financial services, basically gravitate towards these databases.

Corey: I really want to thank you for taking so much time out of your day to talk with me about databases and your perspective on them, especially given my profound level of ignorance around so many of them. If people want to learn more about how you view these things, where’s the best place to find you?

Andi: Follow me on LinkedIn. I tend to post quite a bit on LinkedIn, I still post a bit on Twitter, but frankly, I’ve moved more of my activity to LinkedIn now. I find it’s—

Corey: That is such a good decision. I envy you.

Andi: It’s a more curated [laugh], you know, audience and so on. And then also, you know, we just had Google Cloud Next. I recorded a session there that kind of talks about databases and just some of the things that are new in database-land at Google Cloud. So, if folks are interested in getting more information, that may be something that could be appealing to you.

Corey: We will, of course, put links to all of this in the [show notes 00:34:03]. Thank you so much for your time. I really appreciate it.

Andi: Great. Corey, thanks so much for having me.

Corey: Andi Gutmans, VP and GM of Databases at Google Cloud. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment, then I’m going to collect all of those angry, insulting comments and use them as a database.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.