Defining a Database with Tony Baer

Episode Summary

Tony Baer, Principal at dbInsight, joins Corey on Screaming in the Cloud to discuss his definition of what is and isn’t a database, and the trends he’s seeing in the industry. Tony explains why it’s important to try and have an outsider’s perspective when evaluating new ideas, and the growing awareness of the impact data has on our daily lives. Corey and Tony discuss the importance of working towards true operational simplicity in the cloud, and Tony also shares why explainability in generative AI is so crucial as the technology advances. 

Episode Show Notes & Transcript

About Tony

Tony Baer, the founder and CEO of dbInsight, is a recognized industry expert in extending data management practices, governance, and advanced analytics to address the desire of enterprises to generate meaningful value from data-driven transformation. His combined expertise in both legacy database technologies and emerging cloud and analytics technologies shapes how clients go to market in an industry undergoing significant transformation. 
During his 10 years as a principal analyst at Ovum, he established successful research practices in the firm’s fastest growing categories, including big data, cloud data management, and product lifecycle management. He advised Ovum clients regarding product roadmap, positioning, and messaging and helped them understand how to evolve data management and analytic strategies as the cloud, big data, and AI moved the goalposts. Baer was one of Ovum’s most heavily billed analysts and provided strategic counsel to enterprises ranging from the Fortune 100 to fast-growing privately held companies.
With the cloud transforming the competitive landscape for database and analytics providers, Baer led deep dive research on the data platform portfolios of AWS, Microsoft Azure, and Google Cloud, and on how cloud transformation changed the roadmaps for incumbents such as Oracle, IBM, SAP, and Teradata. While at Ovum, he originated the term “Fast Data” which has since become synonymous with real-time streaming analytics.
Baer’s thought leadership and broad market influence in big data and analytics have been formally recognized on numerous occasions. Analytics Insight named him one of the 2019 Top 100 Artificial Intelligence and Big Data Influencers. Previous citations include Onalytica, which named Baer one of the world’s Top 20 thought leaders and influencers on Data Science; Analytics Week, which named him one of 200 top thought leaders in Big Data and Analytics; and KDnuggets, which listed Baer as one of the Top 12 data analytics thought leaders on Twitter. While at Ovum, Baer was Ovum IT’s most visible and publicly quoted analyst, and was cited by Ovum’s parent company Informa as Brand Ambassador in 2017. In raw numbers, Baer has 14,000 followers on Twitter, and his ZDNet “Big on Data” posts are read 20,000 to 30,000 times monthly. He is also a frequent speaker at industry conferences such as Strata Data and Spark Summit.

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us in part by our friends at Red Hat. As your organization grows, so does the complexity of your IT resources. You need a flexible solution that lets you deploy, manage, and scale workloads throughout your entire ecosystem. The Red Hat Ansible Automation Platform simplifies the management of applications and services across your hybrid infrastructure with one platform. Look for it on the AWS Marketplace.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Back in my early formative years, I was an SRE sysadmin type, and one of the areas I always avoided was databases, or frankly, anything stateful because I am clumsy and unlucky and that’s a bad combination to bring within spitting distance of anything that, you know, can’t be spun back up intact, like databases. So, as a result, historically I haven’t spent a lot of time living in that world. It’s time to expand horizons and think about this a little bit differently. My guest today is Tony Baer, principal at dbInsight. Tony, thank you for joining me.

Tony: Oh, Corey, thanks for having me. And by the way, we’ll try and basically knock down your primal fear of databases today. That’s my mission.

Corey: We’re going to instill new fears in you. Because I was looking through a lot of your work over the years, and the criticism I have—and always the best place to deliver criticism is massively in public—is that you take a very conservative, stodgy approach to defining a database, whereas I’m on the opposite side of the world. I contain information. You can ask me about it, which we’ll call querying. That’s right. I’m a database.

But I’ve never yet found myself listed in any of your analyses around various database options. So, what is your definition of databases these days? Where do they start and stop?

Tony: Oh, gosh.

Corey: Because anything can be a database if you hold it wrong.

Tony: [laugh]. I think one of the last things I’ve ever been called is conservative and stodgy, so this is certainly a way to basically put the thumbtack on my chair.

Corey: Exactly. I’m trying to normalize my own brand of lunacy, so we’ll see how it goes.

Tony: Exactly, because that’s the role I normally play with my clients. So, now the shoe is on the other foot. The way I view a database is basically as a managed collection of data, and it’s managed to the point where essentially, a database should be transactional—in other words, when I basically put some data in, I should have some positive information, I should hopefully, depending on the type of database, have some sort of guidelines or schema or model for how I structure the data. So, I mean, database, you know, even though you keep hearing about unstructured data, the fact is—

Corey: Schemaless databases and data stores. Yeah, it was all the rage for a few years.

Tony: Yeah, except that they all have schemas; it’s just that those schemaless databases have very variable schemas. They’re still schemas.

Corey: A question that I have is you obviously think deeply about these things, which should not come as a surprise to anyone. It’s like, “Well, this is where I spend my entire career. Imagine that. I might think about the problem space a little bit.” But you have, to my understanding, never worked with databases in anger yourself. You don’t have a history as a DBA or as an engineer—

Tony: No.

Corey: —but what I find very odd is that unlike a whole bunch of other analysts that I’m not going to name, but people know who I’m talking about regardless, you bring actual insights into this that I find useful and compelling, instead of reverting to the mean of well, I don’t actually understand how any of these things work in reality, so I’m just going to believe whoever sounds the most confident when I ask a bunch of people about these things. Or are you just asking the right people who also happen to sound confident? How do you get away from that very common analyst trap?

Tony: Well, a couple of things. One is I purposely play the role of outside observer. In other words, like, the idea is that if basically an idea is supposed to stand on its own legs, it has to make sense. If I’ve been working inside the industry, I might take too many things for granted. And a good example of this goes back, actually, to my early days—actually this goes back to my freshman year in college where I was taking an organic chem course for non-majors, and it was taught as a logic course not as a memorization course.

And we were given the option at the end of the term to either, basically, take a final or do a paper. So, of course, me being a writer I thought, I can BS my way through this. But what I found—and this is what fascinated me—is that as long as certain technical terms were defined for me, I found a logic to the way things work. And so, that really informs how I approach databases, how I approach technology today: I look at the logic of how things work. That being said, in order for me to understand that, I need to know twice as much as the next guy in order to be able to speak to it, because I just don’t do this in my sleep.

Corey: That goes a big step toward, I guess, addressing a lot of these things, but it also feels like—and maybe this is just me paying closer attention—that the world of databases and data and analytics has really coalesced or emerged in a very different way over the past decade-ish. It used to be, at least from my perspective, that all the data we stored was a storage admin problem. And that was about managing NetApps and SANs and the rest. And then you had the database side of it, which, from the storage side of the world, was functionally just a big file or series of files that are the backing store for the database. And okay, there’s not a lot of cross-communication going on there.

Then with the rise of object storage, it started being a little bit different. And even the way that everyone is talking about getting meaning from data really seems to be evolving at an incredibly intense clip lately. Is that an accurate perception, or have I just been asleep at the wheel for a while and finally woke up?

Tony: No, I think you’re onto something there. And the reason is that, one, data is all around us, and the fact is, I mean, you can see it in the same way that all of a sudden people know how to spell AI. They may not know what it means, but the thing is, there is an awareness that the data we work with, the data that is about us, follows us, and with the cloud, this data has—well, I should say not just with the cloud but with smart mobile devices—we’ll blame that—we are each founts of data, and rich founts of data. And people in all walks of life, not just in the industry, are now becoming aware of it, and there’s a lot of concern about whether we can have any control, any ownership over the data that should be ours. So, I think that phenomenon has also happened in the enterprise: where we used to think that the data was the DBAs’ issue, it’s become the app developers’ issue, it’s become the business analysts’ issue. Because the answers that we get, we’re ultimately accountable for. It all comes from the data.

Corey: It also feels like there’s this idea of databases themselves becoming more contextually aware of the data contained within them. Originally, this used to be in the realm of, “Oh, we know what’s been accessed recently and we can tier out where it lives for storage optimization purposes.” Okay, great, but what I’m seeing now almost seems to be a sense of, people like to talk about pouring ML into their database offerings. And I’m not able to tell whether that is something that adds actual value, or if it’s marketing-ware.

Tony: Okay. First off, let me kind of dispel a couple of things. First of all, it’s not a question of the database becoming aware. A database is not sentient.

Corey: Neither are some engineers, but that’s neither here nor there.

Tony: That would be true, but then again, I don’t want anyone with shotguns lining up at my door after this—

Corey: [laugh].

Tony: —after this interview is published. But [laugh] more to the point, I can see a couple of roles for machine learning in databases. One is that a database’s own logs are an incredible fount of data, of operational data. And you can look at trends: when the pattern of these logs goes this way, that is likely to happen. So, I could very easily say we’re already seeing it: machine learning being used to help optimize the operation of databases. If you’re Oracle, you say, “Hey, we can have a database that runs itself.”

The other side of the coin is being able to run your own machine-learning models in-database, as opposed to having to go out to a separate cluster and move the data, and that’s becoming more and more of a checkbox feature. However, that’s probably going to be for the low-hanging fruit, like the 80/20 rule. It’ll be like the 20% of relatively rudimentary, you know, let’s say, predictive analyses that we can do inside the database. If you’re going to be doing something more ambitious, such as a large language model, you probably do not want to run that in the database itself. So, there’s a difference there.

Corey: One would hope. I mean, one of the inappropriate uses of technology that I go for all the time is finding off-label ways, as directed or otherwise, of tricking different services into running containers for me. It’s kind of a problem; this is probably why everyone is very grateful I no longer write production code for anyone.

But it does seem that there’s been an awful lot of noise lately. I’m lazy. I take shortcuts very often, and one of those is that whenever AWS talks about something extensively through multiple marketing cycles, it’s usually a pretty good indicator that they’re on their back foot in that area. And for a long time, they were doing that about data and how it’s very important to gather data, it unlocks the key to your business, but it always felt a little hollow-slash-hypocritical to me because you go to some of the same AWS events that I do. You notice how you have to fill out the exact same form with a whole bunch of mandatory fields every single time, but there never seems to be anything that gets spat back out to you that demonstrates that any human or system has ever read—

Tony: Right.

Corey: Any of that? It’s basically a, “Do what we say, not what we do,” style of story. And I always found that to be a little bit disingenuous.

Tony: I don’t want to just harp on AWS here. Of course, we can always talk about the two-pizza team rule and the fact that you have lots of small teams there, but I’d rather generalize this. And I think what you just described has been my trip through the healthcare system. I had some sports-related injuries this summer, so I’ve been through a couple of surgeries to repair them. And it’s amazing that every time you go to the doctor’s office, you’re filling in the same HIPAA information over and over again, even with healthcare systems that use the same electronic health records software. So, it’s not just that the technologies are siloed; it’s that the organizations are siloed. That’s what you’re saying.

Corey: That is fair. And I think at some level—I don’t know if this is a weird extension of Conway’s Law or whatnot—but these things all have different backing stores as far as data goes. And there’s a—the hard part, it seems, in a lot of companies once they hit a certain point of maturity is not just getting the data in—because they’ve already done that to some extent—but it’s also then making it actionable and helping various data stores internal to the company reconcile with one another and start surfacing things that are useful. It increasingly feels like it’s less of a technology problem and more of a people problem.

Tony: It is. I mean, put it this way, I spent a lot of time last year, I burned a lot of brain cells working on data fabrics, which is an idea that’s in the eye of the beholder. But the ideal of a data fabric is that it’s not the tool that necessarily governs your data or secures your data or moves your data or transforms your data, but it’s supposed to be the master orchestrator that brings all that stuff together. And maybe sometime 50 years in the future, we might see that.

I think the problem here is both technical and organizational. [unintelligible 00:11:58] a promise, you have all these what we used to call islands or silos. We still call them silos or islands of information. And actually, ironically, even though in the cloud we have technologies where we can integrate this, the cloud has actually exacerbated this issue because there are so many islands of information, you know, coming up, and there are so many different little parts of the organization that have their hands on that. That’s also a large part of why there was such a big discussion about, for instance, data mesh last year: everybody is concerned about owning their own little piece of the pie, and there’s a lot of question in terms of how do we get some consistency there? How do we all read from the same sheet of music? That’s going to be an ongoing problem. You and I are going to get very old before that ever gets solved.

Corey: Yeah, there are certain things that I am content to die knowing that they will not get solved. If they ever get solved, I will not live to see it, and there’s a certain comfort in that, on some level.

Tony: Yeah.

Corey: But it feels like this stuff is also getting more and more complicated than it used to be, and terms aren’t being used in quite the same way as they once were. Something that a number of companies have been saying for a while now has been that customers overwhelmingly are preferring open-source. Open source is important to them when it comes to their database selection. And I feel like that’s a conflation of a couple of things. I’ve never yet found an ideological, purity-driven customer decision around that sort of thing.

What they care about is, are there multiple vendors who can provide this thing so I’m not going to be using a commercially licensed database that can arbitrarily start playing games with seat licenses and wind up distorting my cost structure massively with very little notice. Does that align with your—

Tony: Yeah.

Corey: Understanding of what people are talking about when they say that, or am I missing something fundamental? Which is again, always possible?

Tony: No, I think you’re onto something there. Open-source is a whole other can of worms, and I’ve burned many, many brain cells over this one as well. And today, you’re seeing a lot of pieces that are basically giving eulogies for open-source. You know, HashiCorp just changed its license, and a bunch of others in the database world have as well. What open-source has meant—and I think for practitioners, for DBAs and developers—is here’s a platform that’s been implemented by many different vendors, which means my skills are portable.

And so, I think that’s really been the key to why, for instance, like, you know, MySQL and especially PostgreSQL have really exploded, you know, in popularity. Especially Postgres, you know, of late. And it’s like, you look at Postgres, it’s a very unglamorous database. If you’re talking about stodgy, it was born to be stodgy because they wanted to be an adult database from the start. They weren’t the LAMP stack like MySQL.

And the secret of success with Postgres was that it had a very permissive open-source license, which meant that as long as you don’t hold the University of California at Berkeley liable, have at it, kids. And so, you see a lot of different flavors of Postgres out there, and a lot of customers are attracted to that because if I get up to speed on one Postgres database, my skills should be transferable, should be portable to another. So, I think that’s a lot of what’s happening there.

Corey: Well, I do want to call that out in particular because when I was coming up in the aughts, the mid-2000s decade, the lingua franca on everything I used was MySQL, or as I insist on mispronouncing it, my-squeal. And lately, in the same vein, Postgres-squeal seems to have taken over the entire universe when it comes to the de facto database of choice. And I’m old and grumpy and learning new things is always challenging, so I don’t understand a lot of the ways that thing gets managed, coming from where I did before, but what has driven the massive growth of mindshare among the Postgres-squeal set?

Tony: Well, I think it’s a matter of it’s 30 years old and it’s—number one, Postgres always positioned itself as an Oracle alternative. And in the early years, you know, this was a new database; how was it going to be able to match Oracle? At that point, Oracle had about a 15-year head start on it. And so, it was a gradual climb to respectability. And I have huge respect for Oracle, don’t get me wrong on that, but you take a look at Postgres today and they have basically filled in a lot of the blanks.

And so, it now is a very cre—in many cases, it’s a credible alternative to Oracle. Can it do all the things Oracle can do? No. But for a lot of organizations, it’s the 80/20 rule. And so, I think it’s more just a matter of, like, Postgres coming of age. And the fact is, as a result of it coming of age, there’s a huge marketplace out there and so much choice, and so much opportunity for skills portability. So, it’s really one of those things where its time has come.

Corey: I think that a lot of my own biases are simply a product of the era in which I learned how a lot of these things work. I am terrible at Node, for example, but I would be hard-pressed not to suggest JavaScript as the default language that people should pick up if they’re just entering tech today. It does front-end, it does back-end—

Tony: Sure.

Corey: —it even makes fries, apparently. There’s a—that is the lingua franca of the modern internet in a bunch of different ways. That doesn’t mean I’m any good at it, and it doesn’t mean at this stage, I’m likely to improve massively at it, but it is the right move, even if it is inconvenient for me personally.

Tony: Right. Right. Put it this way, we’ve seen—and as I said, I’m not an expert in programming languages, but we’ve seen a huge profusion of programming languages and frameworks. But the fact is that there’s always been a draw towards critical mass. At the turn of the millennium, we thought it was between Java and .NET. Little did we know that basically JavaScript—which at that point was just a web scripting language—[laugh] we didn’t know that it could work on the server; we thought it was just client-side. Who knew?

Corey: That’s like using something inappropriately as a database. I mean, good heavens.

Tony: [laugh]. That would be true. I mean, when I could have, you know, easily just used a spreadsheet or something like that. But so, I mean, who knew? I mean, for instance, Java itself was originally conceived for a set-top box. You never know how this stuff is going to turn out. The same thing happened with Python. Python was also a scripting language. Oh, by the way, it happens to be really powerful and flexible for data science. And whoa, you know, now Python, in terms of data science languages, has become the new SAS.

Corey: It really took over in a bunch of different ways. Before that, Perl was great, and I’d go, “Why write in Python when Perl is available?” It’s like, “Okay, you know how to write Perl, right?” “Yeah.” “Have you ever read anything you wrote a month later?” “Oh…” It’s very much a write-only language. It is inscrutable after the fact. And Python at least makes that a lot more approachable, which is never a bad thing.

Tony: Yeah.

Corey: Speaking of what you touched on toward the beginning of this episode, the idea of databases not being sentient, which I equate to being self-aware, you just came out very recently with a report on generative AI and a trip that you wound up taking on this. Which I’ve read; I love it. In fact, we’ve both been independently using the phrase [unintelligible 00:19:09] to, “English is the new most common programming language once a lot of this stuff takes off.” But what have you seen? What have you witnessed as far as both the ground truth reality as well as the grandiose statements that companies are making as they trip over themselves trying to position themselves as the leader in this thing that didn’t really exist five months ago?

Tony: Well, what’s funny is—and that’s a perfect question because if on January 1st you had asked “what’s going to happen this year?” I don’t think any of us would have thought about generative AI or large language models. And I will not identify the vendors, but I was on some advance briefing calls back around the January, February timeframe. They were talking about things like serverless, they were talking about in-database machine learning and so on and so forth. They weren’t saying anything about generative.

And all of a sudden, in April, it changed. And it’s essentially just another case of the tail wagging the dog. Consumers were flocking to ChatGPT and enterprises had to take notice. And so, what I saw in the spring was—and I was at conferences from SAS, I’m [unintelligible 00:20:21] SAP, Oracle, IBM, Mongo, Snowflake, Databricks and others—that they all very quickly changed their tune to talk about generative AI. What we were seeing was, for the most part, position statements, but I think the early emphasis was, as you say, basically English as the new default programming language or API, so basically, coding assistants and what I’ll call conversational query.

I don’t want to call it natural language query because we had stuff like Tableau Ask Data, which was very robotic. So, we’re seeing a lot of that. And we’re also seeing a lot of attention towards foundation models because, I mean, what organization is going to have the resources of a Google or an OpenAI to develop its own foundation model? Yes, some of the Wall Street houses might, but I think most of them are just going to say, “Look, let’s just use this as a starting point.”

I also saw a very big theme of your models with your data. And where I got a hint of that was a throwaway LinkedIn post. It was back in, I think, February; Databricks had announced Dolly, which was kind of an experimental foundation model, just to use with your own data. And I just wrote three lines in a LinkedIn post on a Friday afternoon. By Monday, it had 65,000 hits.

I’ve never seen anything—I mean, yes, I had a lot—I used to say ‘data mesh’ last year, and it would—but it didn’t get anywhere near that. So, I mean, that really hit a nerve. And another thing that I saw was people starting to look at vector storage and how that was going to be supported: was it going to be a new type of database, and hey, let’s have AWS come up with, like, an, you know, an [ADF 00:21:41] database here, or is this going to be a feature? I think for the most part, it’s going to be a feature. And of course, under all this, everybody’s just falling all over themselves to get in the good graces of Nvidia. In capsule, that’s kind of what I saw.

Corey: That feels directionally accurate. And I think databases are a great area to point out one thing that’s always been a little disconcerting for me. The way that I’ve always viewed databases has been that, unless I’m calling a RAND function or something like it, and as long as I don’t change the underlying data, I should be able to run a query twice in a row and receive the same result deterministically both times.

Tony: Mm-hm.

Corey: Generative AI is effectively non-deterministic for all realistic measures of that term. Yes, I’m sure there’s a deterministic reason things are the way they are under the hood. I am not smart enough or learned enough to get there. But it just feels like sometimes we’re going to give you the answer you think you’re going to get, sometimes we’re going to give you a different answer. And sometimes, in generative AI space, we’re going to be supremely confident and also completely wrong. That feels dangerous to me.

Tony: [laugh]. Oh gosh, yes. I mean, I take a look at ChatGPT and to me, the responses are essentially, it’s a high school senior coming out with an essay response without any footnotes. It’s the exact opposite of an ACID database. The reason why we’re very—in the database world, we’re very strongly drawn towards ACID is because we want our data to be consistent and to get—if we ask the same query, we’re going to get the same answer.

And the problem is that with generative AI, you know, based on large language models, computers sound sentient, but they’re not. Large language models are basically just a series of probabilities, and so hopefully those probabilities will line up and you’ll get something similar. That kind of scares me quite a bit. And I think as we start to look at implementing this in an enterprise setting, we need to take a look at what kind of guardrails we can put on there. And what this led me to was the missing piece that I saw this spring with generative AI, at least in the data and analytics world: nobody had a clue in terms of how to extend AI governance to this, how to make these models explainable. And I think that’s still a large problem. That’s a huge nut that it’s going to take the industry a while to crack.

Corey: Yeah, but it’s incredibly important that it does get cracked.

Tony: Oh, gosh, yes.

Corey: One last topic that I want to get into. I know you said you don’t want to over-index on AWS, which, fair enough. It is where I spend the bulk of my professional time and energy—

Tony: [laugh].

Corey: Focusing on, but I think this one’s fair because it is a microcosm of a broader industry question. And that is, I don’t know what the DBA job of the future is going to look like, but increasingly, it feels like it’s going to primarily be picking which purpose-built AWS database—or larger [story 00:24:56] purpose database is appropriate for a given workload. Even without my inappropriate misuse of things that are not databases as databases, they are legitimately 15 or 16 different AWS services that they position as database offerings. And it really feels like you’re spiraling down a well of analysis paralysis, trying to pick between all these things. Do you think the future looks more like general-purpose databases, or very purpose-built and each one is this beautiful, bespoke unicorn?

Tony: [laugh]. Well, this is basically hitting on a theme that I’ve been—you know, we’ve all been thinking about for years. And the thing is, there are arguments to be made for multi-model databases, you know, versus a purpose-built database. That being said, okay, two things. One is what I’ve been saying in general—and I wrote about this way, way back; I actually did a talk at the [unintelligible 00:25:50]; it was a throwaway talk, or [unintelligible 00:25:52] one of those conferences—I threw it together and it was basically looking at the emergence of all these specialized databases.

But as I saw it, there’s also going to be some overlap. Not that we’re going to come back to Pangea per se, but that, for instance, a relational database will be able to support JSON. And Oracle, for instance, has some fairly brilliant ideas up its sleeve, what they call JSON duality, which sounds kind of scary, which basically says, “We can store data relationally, but superimpose GraphQL on top of all of this and this is going to look really JSON-y.” So, I think on one hand, you are going to be seeing databases that do overlap. Would I use Oracle for a MongoDB use case? No, but would I use Oracle for a case where I might have some document data? I could certainly see that.

The other point, though, and this is really one I want to hammer on here—it’s kind of a major concern I’ve had—is I think the cloud vendors, for all their talk about giving you operational simplicity and agility, are making things very complex with their expanding cornucopia of services. And what they need to do—I’m not saying, you know, let’s close down the patent office—what I think they need to do is provide some guided experiences that say, “Tell us the use case. We will now blend these particular services together and this is the package that we would suggest.” I think cloud vendors really need to go back to the drawing board from that standpoint and look at, how do we bring this all together? How do we really simplify the life of the customer?

Corey: That is, honestly, I think the biggest challenge that the cloud providers have across the board. There are hundreds of services available at this point from every hyperscaler out there. Some of them are brand new and effectively feel like they’re there for three or four different customers and that’s about it, and others are universal services that most people are probably going to use. Most things fall in between those two extremes, but it becomes such an analysis-paralysis moment of trying to figure out what do I do here? What is the golden path?

And what that means is that when you get stuck and start talking to other people, asking their opinion and getting their guidance on how to do something, it’s, “Oh, you’re using that service? Don’t do it. Use this other thing instead.” And if you listen to that, you get midway through every problem only to start over again because, “Oh, I’m going to pick a different selection of underlying components.” It becomes confusing and complicated, and I think it largely does customers a disservice. What I think we really need, on some level, is a simplified golden path with easy on-ramps and easy off-ramps where, in the absence of a compelling reason, this is what you should be using.

Tony: Believe it or not, I think this would be a golden case for machine learning.

Corey: [laugh].

Tony: No, but: submit to us the characteristics of your workload, and here’s a recipe that we would propose. Obviously, we can’t trust AI to make our decisions for us, but it can provide some guardrails.

Corey: “Yeah. Use a graph database. Trust me, it’ll be fine.” That’s your general purpose—

Tony: [laugh].

Corey: —approach. Yeah, that’ll end well.

Tony: [laugh]. I would hope that the AI would be trained on a better set of data so it wouldn’t come to that conclusion.

Corey: One could sure hope.

Tony: Yeah, exactly.

Corey: I really want to thank you for taking the time to catch up with me around what you’re doing. If people want to learn more, where’s the best place for them to find you?

Tony: My website is And on my homepage, I list my latest research, so you just have to go to the homepage, where you can click on the links to the latest and greatest. And, as I said, after Labor Day I’ll be publishing my take on my generative AI journey from the spring.

Corey: And we will, of course, put links to this in the [show notes 00:29:39]. Thank you so much for your time. I appreciate it.

Tony: Hey, it’s been a pleasure, Corey. Good seeing you again.

Corey: Tony Baer, principal at dbInsight. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that we will eventually stitch together with all those different platforms to create—that’s right—a large-scale distributed database.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.