Episode Show Notes & Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted guest episode of Screaming in the Cloud is brought to us by our friends at Couchbase. Also brought to us by Couchbase is today’s victim, for lack of a better term. Jeff Morris is their VP of Product and Solutions Marketing. Jeff, thank you for joining me.
Jeff: Thanks for having me, Corey, even though I guess I paid for it.
Corey: Exactly. It’s always great to say thank you when people give you things. I learned this from a very early age, and the only people who didn’t were rude children and turned into worse adults.
Corey: So, you are effectively announcing something new today, and I always get worried when a database company says that because sometimes it’s a license that is going to upset people, sometimes it’s dyed so deep in the wool of generative AI that, “Oh, we’re now supporting vectors or whatnot.” Well, most of us don’t know what that means.
Corey: Fortunately, I don’t believe that’s what you’re doing today. What have you got for us?
Jeff: So, you’re right. It’s—well, what I’m doing is, we’re announcing new stuff inside of Couchbase and helping Couchbase expand its market footprint, but we’re not really moving away from our sweet spot, either, right? We like building—or being the database platform underneath applications. So, push us on the operational side of the operational versus analytic, kind of, database divide. But we are announcing a columnar data store inside of the Couchbase platform so that we can build bigger, better, stronger analytic functionality to feed the applications that we’re supporting with our customers.
Corey: Now, I feel like I should ask a question around what a columnar data store is because my first encounter with the term was when I had a very early client for AWS bill optimization when I was doing this independently, and I was asking them the… polite question of, “Why do you have 283 billion objects in a single S3 bucket? That is atypical and kind of terrifying.” And their answer was, “Oh, we built our own columnar data store on top of S3. This might not have been the best approach.” It’s like, “I’m going to stop you there. With no further information, I can almost guarantee you that it was not.” But what is a columnar data store?
Jeff: Well, let’s start with the, everybody loves more data and everybody loves to count more things, right, but a columnar data store allows you to expedite the kind of question that you ask of the data itself by not having to look at every single row of the data while you go through it. You can say, if you know you’re only looking for data that’s inside of California, you just look at the column value of find me everything in California and then I’ll pick all of those records to analyze. So, it gives you a faster way to go through the data while you’re trying to gather it up and perform aggregations against it.
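To make Jeff's row-versus-column point concrete, here is a minimal Python sketch. It is an illustration only, not how Couchbase stores data: the sample records, `to_columns`, and `sum_where` are all hypothetical names made up for this note. The idea is that once each field lives in its own list, a filter such as "everything in California" only has to scan the `state` column, and the aggregation then reads just the matching positions of the one other column it needs.

```python
from collections import defaultdict

# Hypothetical sample records (row-oriented layout); not real Couchbase data.
rows = [
    {"id": 1, "state": "CA", "amount": 30.0},
    {"id": 2, "state": "NY", "amount": 12.5},
    {"id": 3, "state": "CA", "amount": 7.5},
    {"id": 4, "state": "TX", "amount": 20.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per field.

    Assumes every record carries the same fields, so position i in each
    list belongs to the same original record.
    """
    columns = defaultdict(list)
    for record in records:
        for field, value in record.items():
            columns[field].append(value)
    return dict(columns)


def sum_where(columns, filter_col, wanted, value_col):
    """Sum value_col for the positions where filter_col equals wanted.

    Only two columns are touched; the remaining fields are never read.
    """
    positions = [i for i, v in enumerate(columns[filter_col]) if v == wanted]
    return sum(columns[value_col][i] for i in positions)


columns = to_columns(rows)
total_ca = sum_where(columns, "state", "CA", "amount")
print(total_ca)
```

A real columnar engine adds compression, indexing, and parallel scans on top of this, but the access pattern is the same: read the columns a question needs and skip the rest.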
Corey: It seems like it’s one of those, “Well, that doesn’t sound hard,” type of things, when you’re thinking about it the way that I do, in terms of a database being more or less a medium to large size Excel spreadsheet. But I have it on good faith from all the customer environments. I’ve worked with that no, no, there are data stores that span even larger than that, which is, you know, one of those sad realities of the world. And everything at scale begins to be a heck of a lot harder. I’ve seen some of the value that this stuff offers and I can definitely understand a few different workloads in which case that’s going to be super handy. What are you targeting specifically? Or is this one of those areas where you’re going to learn from your customers?
Jeff: Well, we’ve had analytic functionality inside the platform. It just, at the size and scale customers actually wanted to roam through the data, we weren’t supporting that that much. So, we’ll expand that particular footprint, it’ll give us better integration capabilities with external systems, or better access to things in your bucket. But the use case problem is, I think, going to be driven by what new modern application requirements are going to be. You’re going to need, we call it hyper-personalization because we tend to cater to B2C-style applications, things with a lot of account profiles built into them.
So, you look at account profile, and you’re like, “Oh, well Jeff likes blue, so sell him blue stuff.” And that’s a great current level personalization, but with a new analytic engine against this, you can maybe start aggregating all the inventory information that you might have of all the blue stuff that you want to sell me and do that in real-time, so I’m getting better recommendations, better offers as I’m shopping on your site or looking at my phone and, you know, looking for the next thing I want to buy.
Corey: I’m sure there’s massive amounts of work that goes into these hyper-personalization stories. The problem is that the only time they really rise to our notice is when they fail hilariously. Like, you just bought a TV, would you like to buy another? Now statistically, you are likelier to buy a second TV right after you buy one, but for someone who just, “Well, I’m replacing my living room TV after ten years,” it feels ridiculous. Or when you buy a whole bunch of nails and they don’t suggest, “Would you like to also perhaps buy a hammer?”
It’s one of those areas where it just seems like a human putting thought into this could make some sense. But I’ve seen some of the stuff that can come out of systems like this and it can be incredible. I also personally tend to bias towards use cases that are less, here’s how to convince you to buy more things and start aiming in a bunch of other different directions where it starts meeting emerging use cases or changing situations rapidly, more rapidly than a human can in some cases. The world has, for better or worse, gotten an awful lot faster over the last few decades.
Jeff: Yeah. And think of it in terms of how responsive can I be at any given moment. And so, let’s pick on one of the more recent interesting failures that has popped up. I’m a Giants fan, San Francisco Giants fan, so I’ll pick on the Dodgers. The Dodgers during the baseball playoffs, Clayton Kershaw—three-time MVP, Cy Young Award winner, great, great pitcher—had a first-inning meltdown of colossal magnitude: gave up 11 runs in the first inning to the Diamondbacks.
Well, my customer Domino’s Pizza could end up—well, let’s shift the focus of our marketing. We—you know, the Dodgers are the best team in baseball this year in the National League—let’s focus our attention there, but with that meltdown, let’s pivot to Arizona and focus on our market in Phoenix. And they could do that within minutes or seconds, even, with the kinds of capabilities that we’re coming up with here so that they can make better offers to that new environment and also do the decision intelligence behind it. Like, do I have enough dough to make a bigger offer in that big market? Do I have enough drivers or do I have to go and spin out and get one of the other food delivery folks—UberEats, or something like that—to jump on board with me and partner up on this kind of system?
It’s that responsiveness in real, real-time, right, that’s always been kind of the conundrum between applications and analytics. You get an analytic insight, but it takes you an hour or a day to incorporate that into what the application is doing. This is intended to make all of that stuff go faster. And of course, when we start to talk about things in AI, right, AI is going to expect real-time responsiveness as best you can make it.
Corey: I figure we have to talk about AI. That is a technology that has absolutely sprung to the absolute peak of the hype curve over the past year. OpenAI released Chat-Gippity, either late last year or early this year and suddenly every company seems to be falling all over itself to rebrand itself as an AI company, where, “We’ve been working on this for decades,” they say, right before they announce something that very clearly was crash-developed in six months. And every company is trying to drape themselves in the mantle of AI. And I don’t want to sound like I’m a doubter here. I’m like most fans; I see an awful lot of value here. But I am curious to get your take on what do you think is real and what do you think is not in the current hype environment.
Jeff: So yeah, I love that. I think there’s a number of things that are, you know, are real is, it’s not going away. It is going to continue to evolve and get better and better and better. One of my analyst friends came up with the notion that the exercise of generative AI, it’s imprecise, so it gives you similarity things, and that’s actually an improvement, in many cases, over the precision of a database. Databases, a transaction either works or it doesn’t. It has failover or it doesn’t, when—
Corey: It’s ideally deterministic when you ask it a question—
Corey: —the same question a second time, assuming it’s not time-bound—
Jeff: Gives you the right answer.
Corey: Yeah, the sa—or at least the same answer.
Jeff: The same answer. And your gen AI may not. So, that’s a part of the oddity of the hype. But then it also helps me kind of feed our storyline of if you’re going to try and make Gen AI closer and more accurate, you need a clean pool of data that you’re dealing with, even though you’ve got probably—your previous design was such that you would use a relational database for transactions, a document database for your user profiles, you’d probably attach your website to a caching database because you needed speed and a lot of concurrency. Well, now you got three different databases there that you’re operating.
And if you’re feeding data from each of those databases back to AI, one of them might be wrong or one of them might confuse the AI, yet how are you going to know? The complexity level is going to become, like, exponential. So, our premise is, because we’re a multi-model database that incorporates in-memory speed and documents and search and transactions and the like, if you start with a cleaner pool of data, you’ll have less complexity that you’re offering to your AI system and therefore you can steer it into becoming more accurate in its response. And then, of course, all the data that we’re dealing with is on mobile, right? Data is created there for, let’s say, your account profile, and then it’s also consumed there because that’s what people are using as their application interface of choice.
So, you also want to have mobile interactivity and synchronization and local storage, kind of, capabilities built in there. So, those are kind of, you know, a couple of the principles that we’re looking at of, you know, JSON is going to be a great format for it regardless of what happens; complexity is kind of the enemy of AI, so you don’t want to go there; and mobility is going to be an absolute requirement. And then related to this particular announcement, large-scale aggregation is going to be a requirement to help feed the application. There’s always going to be some other bigger calculation that you’re going to want to do relatively in real time and feed it back to your users or the AI system that’s helping them out.
Corey: I think that that is a much more nuanced use case than a lot of the stuff that’s grabbing customer attentions where you effectively have the Chat-Gippity story of it being an incredible parrot. Where I have run into trouble with the generative story has been people putting the thing that the robot that’s magic and from the future has come up with off the cuff and just hurling that out into the universe under their own name without any human review, and that’s fine sometimes sure, but it does get it hilariously wrong at some points. And the idea of sending something out under my name that has not been at least reviewed by me if not actually authored by me, is abhorrent. I mean, I review even the transactional, “Yes, you have successfully subscribed,” or, “Sorry to see you go,” email confirmations on stuff because there’s an implicit, “Hugs and puppies, love Corey,” at the end of everything that goes out under my name.
Corey: But I’ve gotten a barrage of terrible sales emails and companies that are trying to put the cart before the horse where either the, “Support rep,” quote-unquote, that I’m speaking to in the chat is an AI system or else needs immediate medical attention because there’s something going on that needs assistance.
Jeff: Yeah, they just don’t understand.
Corey: Right. And most big enterprise stories that I’ve heard so far that have come to light have been around the form of, “We get to fire most of our customer service staff,” an outcome that basically no one sensible wants. That is less compelling than a lot of the individualized consumer use cases. I love asking it, “Here’s a blog post I wrote. Give me ten title options.” And I’ll usually take one of them—one of them is usually not half bad and then I can modify it slightly.
Jeff: And you’ll change four words in it. Yeah.
Corey: Yeah, exactly. That’s a bit of a different use case.
Jeff: It’s been an interesting—even as we’ve all become familiar—or at least junior prompt engineers, right—is, your information is only going to be as good as you feed the AI system—the return is only going to be as good—so you’re going to want to refine that kind of conversation. Now, we’re not trying to end up replacing the content that gets produced or the writing of all kinds of prose, other than we do have a code generator that works inside of our environment called Capella iQ that talks to ChatGPT, but we try and put guardrails on that too, right, as always make sure that it’s talking in terms of the context of Couchbase rather than, “Where’s Taylor Swift this week,” which I don’t want it to answer because I don’t want to spend GPT money to answer that question for you.
Corey: And it might not know the right answer, but it might very well spit out something that sounds plausible.
Jeff: Exactly. But I think the kinds of applications that we’re steering ourselves toward can be helped along by the Gen AI systems, but I don’t expect all my customers are going to be writing automatic blog post generation kinds of applications. I think what we’re ultimately trying to do is facilitate interactions in a way that we haven’t dreamt of yet, right? One of them might be if I’ve opted into loyalty programs, like my United account and my American Express account—
Corey: That feels very targeted at my lifestyle as well, so please, continue.
Jeff: Exactly, right? And so, what I really want the system to do is for Amex to reward me when I hit 1k status on United while I’m on the flight and you know, have the flight attendant come up and be like, “Hey, you did it. Either, here’s a free upgrade from American Express”—that would be hyper-personalization because you booked your plane ticket with it, but they also happen to know or they cross-consumed information that I’ve opted into.
Corey: I’ve seen them congratulate people for hitting a million miles flown mid-flight, but that’s clearly something that they’ve been tracking and happens a heck of a lot less frequently. This is how you start scaling that experience.
Jeff: Yes. But that happened because American Airlines was always watching because that was an American Airlines ad ages ago, right, but the same principle holds true. But I think there’s going to be a lot more of these: how much information am I actually allowing to be shared amongst the, call it loyalty programs, but the data sources that I’ve opted into. And my God, there’s hundreds of them that I’ve personally opted into, whether I like it or not because everybody needs my email address, kind of like what you were describing earlier.
Corey: A point that I have that I think agrees largely with your point is that few things to me are more frustrating than when I’m signing up for, for example, oh, I don’t know, an AWS event—gee, I can’t imagine there’s anything like that going on this week—and I have to fill out an entire form that always asks me the same questions: how big my company is, whether we have multiple workloads on, what industry we’re in. And no matter what I put into that, first, it never remembers me for the next time, which is frustrating in its own right, but two, no matter what I put in to fill that thing out, the email I get does not change as a result. At one point, I said, all right—I’m picking randomly—“I am a venture capitalist based in Sweden,” and I got nothing that is differentiated from the other normal stuff I get tied to my account because I use a special email address for those things, sometimes just to see what happens. And no, if you’re going to make me jump through the hoops to give you the data, at least use it to make my experience better. It feels like I’m asking for the moon here, but I shouldn’t be.
Jeff: Yes. [we need 00:16:19] to make your experience better and say, you know, “Here’s four companies in Malmo that you ought to be talking to. And they happen to be here at the AWS event and you can go find them because their booth is here, here, and here.” That kind of immediate responsiveness could be facilitated, and to our point, ought to be facilitated. It’s exactly like that kind of thing is, use the data in real-time.
I was talking to somebody else today that was discussing that most data, right, becomes stale and unvaluable, like, 50% of the data, its value goes to zero after about a day. And some of it is stale after about an hour. So, if you can end up closing that responsiveness gap that we were describing—and this is kind of what this columnar service inside of Capella is going to be like—is react in real-time with real-time calculation and real-time look-up and real-time—find out how you might apply that new piece of information right now and then give it back to the consumer or the user right now.
Corey: So, Couchbase takes a few different forms. I should probably, at least for those who are not steeped in the world of exotic forms of database, I always like making these conversations more accessible to folks who are not necessarily up to speed. Personally, I tend to misuse anything as a database, if I can hold it just the wrong way.
Jeff: The wrong way. I’ve caught that about you.
Corey: Yeah, it’s—everything is a database if you hold it wrong. But you folks have a few different options: you have a self-managed commercial offering; you’re an open-source project, so I can go ahead and run it on my own infrastructure however I want; and you have Capella, which is Couchbase as a service. And all of those are useful and have their points, and I’m sure I’m missing at least one or two along the way. But do you find that the columnar use case is going to disproportionately benefit folks using Capella in ways that the self-hosted version would not be as useful for, or is this functionality already available in other expressions of Couchbase?
Jeff: It’s not already available in other expressions, although there is analytic functionality in the self-managed version of Couchbase. But it’s, as I’ve mentioned I think earlier, it’s just not as scalable or as really real-time as far as we’re thinking. So, it’s going to—yes, it’s going to benefit the database as a service deployments of Couchbase available on your favorite three clouds, and still interoperable with environments that you might self-manage and self-host. So, there could be even use cases where our development team or your development team builds in AWS using the cloud-oriented features, but is still ultimately deploying and hosting and managing a self-managed environment. You could still do all of that. So, there’s still a great interplay and interoperability amongst our different deployment options.
But the fun part, I think, about this is not only is it going to help the Capella user, there’s a lot of other things inside Couchbase that help address the developers’ penchant for trading zero-cost for degrees of complexity that you’re willing to accept because you want everything to be free and open-source. And Couchbase is my fifth open-source company in my background, so I’m well, well versed in the nuances of what open-source developers are seeking. But what makes Couchbase—you know, its origin story really cool too, though, is it’s the peanut butter and chocolate marriage of memcached and the people behind that and membase and CouchDB from [Couch One 00:19:54]. So, I can’t think of that many—maybe Red Hat—project and companies that formed up by merging two complementary open-source projects. So, we took the scale and—
Corey: You have OpenTelemetry, I think, that did that once, but that—you see occasional mergers, but it’s very far from common.
Jeff: But it’s very, very infrequent. But what that made the Couchbase people end up doing is make a platform that will scale, make a data design that you can auto partition anywhere, anytime, and then build independently scalable services on top of that, one for SQL++, the query language. Anyone who knows SQL will be able to write something in Couchbase immediately. And I’ve got this AI Automator, iQ, that makes it even easier; you just say, “Write me a SQL++ query that does this,” and it’ll do that. But then we added full-text search, we added eventing so you can stream data, we added the analytics capability originally and now we’re enhancing it, and use JSON as our kind of universal data format so that we can trade data with applications really easily.
So, it’s a cool design to start with, and then in the cloud, we’re steering towards things like making your entry point and using our database as a service—Capella—really, really, really inexpensive so that you get that same robustness of functionality, as well as the easy cost of entry that today’s developers want. And it’s my analyst friends that keep telling me the cloud is where the market’s going to go, so we’re steering ourselves towards that hockey puck location.
Corey: I frequently remark that the role of the DBA might not be vanishing, but it’s definitely changing, especially since the last time I counted, if you hold them and use as directed, AWS has something on the order of 14 distinct managed database offerings. Some are general purpose, some are purpose-built, and if this trend keeps up, in a decade, the DBA role is going to be determining which of its 40 databases is going to be the right fit for a given workload. That seems to be the counter-approach to a general-purpose database that works across the board. Clearly you folks have opinions on this. Where do you land?
Jeff: Oh, so absolutely. There’s the product that is a suite of capabilities—or that are individual capabilities—and then there’s ones that are, in my case, kind of multi-model and do lots of things at once. I think historically, you’ll recognize—because this is—let’s pick on your phone—the same holds true for, you know, your phone used to be a watch, used to be a Palm Pilot, used to be a StarTAC telephone, and your calendar application, your day planner all at the same time. Well, it’s not anymore. Technology converges upon itself; it’s kind of a historical truism.
And the database technologies are going to end up doing that—or continue to do that, even right now. So, that notion that—it’s a ten-year-old notion of use a purpose-built database for that particular workload. Maybe sometimes in extreme cases that is the appropriate thing, but in more cases than not right now, if you need transactions when you need them, that’s fine, I can do that. You don’t necessarily need Aurora or RDS or Postgres to do that. But when you need search and geolocation, I support that too, so you don’t need Elastic. And then when you need caching and everything, you don’t need ElastiCache; it’s all built-in.
So, that multi-model notion of operate on the same pool of data, it’s a lot less complex for your developers, they can code faster and better and more cleanly, debugging is significantly easier. As I mentioned, SQL++ is our language. It’s basically SQL syntax for JSON. We’re a reference implementation of this language, along with—AsterixDB is one of them, and actually, the original author of that language also wrote DynamoDB’s PartiQL.
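As a rough illustration of "SQL syntax for JSON," the Python sketch below pairs an example SQL++-style query with a plain-Python equivalent over nested documents. The query text is illustrative and is never executed here. The collection name `profiles`, the field names, the sample documents, and the helper `run_equivalent` are all hypothetical, and nothing in this sketch calls a Couchbase SDK.

```python
# Illustrative SQL++ text only; it is not sent to any database in this sketch.
# The collection name `profiles` and its fields are assumptions for the example.
QUERY = """
SELECT p.name, p.address.city
FROM profiles AS p
WHERE p.address.state = 'CA'
"""

# Hypothetical JSON documents, as they might look after parsing.
profiles = [
    {"name": "Jeff", "address": {"city": "San Francisco", "state": "CA"}},
    {"name": "Ana", "address": {"city": "Phoenix", "state": "AZ"}},
    {"name": "Lee", "address": {"city": "Los Angeles", "state": "CA"}},
]


def run_equivalent(docs):
    """Do in plain Python what the query above expresses declaratively.

    The dotted path p.address.state becomes nested dictionary access;
    documents missing an address are simply skipped.
    """
    return [
        {"name": doc["name"], "city": doc["address"]["city"]}
        for doc in docs
        if doc.get("address", {}).get("state") == "CA"
    ]


results = run_equivalent(profiles)
for row in results:
    print(row)
```

The point of the comparison is that the familiar SELECT, FROM, and WHERE shape carries over, with dotted paths reaching into nested JSON fields instead of flat table columns.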
So, it’s a common language that you wouldn’t necessarily imagine, but the ease of entry in all of this, I think, is still going to be a driving goal for people. The old people like me and you are running around worrying about, am I going to get a particular, really specific feature out of the full-text search environment, or the other one that I pick on now is, “Am I going to need a vector database, too?” And the answer to me is no, right? There’s going—you know, the database vendors like ourselves—and like Mongo has announced and a whole bunch of other NoSQL vendors—we’re going to support that. It’s going to be just another mode, and you get better bang for your buck when you’ve got more modes than a single one at a time.
Corey: The consensus opinion that’s emerging is very much across the board that vector is a feature, not a database type.
Jeff: Not a category, yeah. Me too. And yeah, we’re well on board with that notion, as well. And then like I said earlier, the JSON as a vehicle to give you all of that versatility is great, right? You can have vector information inside a JSON document, you can have time series information in the document, you could have graph node locations and ID numbers in a JSON array, so you don’t need index-free adjacency or some of the other cleverness that some of my former employers have done. It really is all converging upon itself and hopefully everybody starts to realize that you can clean up and simplify your architectures as you look ahead, so that you do—if you’re going to build AI-powered applications—feed it clean data, right? You’re going to be better off.
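Jeff's claim that one JSON document can carry vector, time series, and graph-style data can be sketched in a few lines of Python. Everything here is a made-up example rather than a Couchbase schema: the field names (`embedding`, `page_views`, `follows`), the ID format, and the values are assumptions. The `cosine_similarity` helper is a standard textbook formula, included only to show that the embedded vector stays usable after a JSON round trip.

```python
import json
import math

# One hypothetical account-profile document mixing several "modes" of data.
profile = {
    "type": "account_profile",
    "name": "Jeff",
    "favorite_color": "blue",
    # Vector data: an embedding stored as a plain JSON array of numbers.
    "embedding": [0.1, 0.3, 0.5],
    # Time series data: timestamped observations in an array of objects.
    "page_views": [
        {"ts": "2023-11-27T10:00:00Z", "count": 3},
        {"ts": "2023-11-27T11:00:00Z", "count": 5},
    ],
    # Graph-style data: IDs of related documents held in an array.
    "follows": ["account::42", "account::77"],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Serialize to JSON text and back, as an application and a document store would.
restored = json.loads(json.dumps(profile))

# Compare the stored embedding with a hypothetical query vector.
query_vector = [0.1, 0.3, 0.4]
score = cosine_similarity(restored["embedding"], query_vector)
print(round(score, 3))
```

Whether a given database can index and search those fields efficiently is a separate question from whether JSON can represent them, and this sketch only demonstrates the latter.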
Corey: So, this episode is being recorded in advance, thankfully, but it’s going to release the first day of re:Invent. What are you folks doing at the show, for those who are either there and for some reason, listening to a podcast rather than going to get marketed to by a variety of different pitches that all mention AI or might even be watching from home and trying to figure out what to make of it?
Jeff: Right. So, of course we have a booth, and my notes don’t have in front of me what our booth number is, but you’ll see it on the signs in the airport. So, we’ll have a presence there, we’ll have an executive briefing room available, so we can schedule time with anyone who wants to come talk to us. We’ll be showing not only the capabilities that we’re offering here, we’ll show off Capella iQ, our coding assistant, okay—so yeah, we’re on the AI hype band—but we’ll also be showing things like our mobile sync capability where my phone and your phone can synchronize data amongst themselves without having to actually have a live connection to the internet. So long as we’re on the same network locally within the Venetian’s network, we have an app that we have people download from the Apple Store and then it’s a color synchronization app or picture synchronization app.
So, you tap it, and it changes on my screen and I tap it and it changes on your screen, and we’ll have, I don’t know, as many people who are around standing there, synchronizing, what, maybe 50 phones at a time. It’s actually a pretty slick demonstration of why you might want a database that’s not only in the cloud but operates around the cloud, operates mobile-ly, operates—you know, can connect and disconnect to your networks. It’s a pretty neat scenario. So, we’ll be showing a bunch of cool technical stuff as well as talking about the things that we’re discussing right now.
Corey: I will say you’re putting an awful lot of faith in connectivity working at re:Invent, be it WiFi or the cellular network. I know that both of those have bitten me in various ways over the years. But I wish you the best on it. I think it’s going to be an interesting show based upon everything I’ve heard in the run-up to it. I’m just glad it’s here.
Jeff: Now, this is the cool part about what I’m talking about, though. The cool part about what I’m talking about is we can set up our own wireless network in our booth, and we still—you’d have to go to the app store to get this application, but once there, I can have you switch over to my local network and play around on it and I can sync the stuff right there and have confidence that in my local network that’s in my booth, the system’s working. I think that’s going to be ultimately our design there because oh my gosh, yes, I have a hundred stories about connectivity and someone blowing a demo because they’re yanking on a cable behind the pulpit, right?
Corey: I always build in a—and assuming there’s no connectivity, how can I fake my demos, just because it’s—I’ve only had to do it once, but you wind up planning in advance when you start doing a talk to a large enough or influential enough audience where you want things to go right.
Jeff: There’s a delightful acceptance right now of recorded videos and demonstrations that people sort of accept that way because of exactly all this. And I’m sure we’ll be showing that in our booth there too.
Corey: Given the non-deterministic nature of generative AI, I’m sort of surprised whenever someone hasn’t mocked the demo in advance, just because yeah, gives the right answer in the rehearsal, but every once in a while, it gets completely unglued.
Jeff: Yes, and we see it pretty regularly. So, the emergence of clever and good prompt engineering is going to be a big skill for people. And hopefully, you know, everybody’s going to figure out how to pass it along to their peers.
Corey: Excellent. We’ll put links to all this in the show notes, and I look forward to seeing how well this works out for you. Best of luck at the show and thanks for speaking with me. I appreciate it.
Jeff: Yeah, Corey. We appreciate the support, and I think the show is going to be very strong for us as well. And thanks for having me here.
Corey: Always a pleasure. Jeff Morris, VP of Product and Solutions Marketing at Couchbase. This episode has been brought to us by our friends at Couchbase. And I’m Cloud Economist Corey Quinn. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment, but if you want to remain happy, I wouldn’t ask that podcast platform what database they’re using. No one likes the answer to those things.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.