It’s like a HeatWave, Burning in my Heart with Nipun Agarwal

Episode Summary

Nipun Agarwal has been a guest on “Screaming” before, but now he has graduated up to Senior VP at Oracle! Which means that Corey can throw harder questions in Nipun’s direction. Now that Nipun is an SVP, he is well suited to take them on, and we’re the better for it, as he has much to say about what Oracle has been up to. Nipun describes in detail some of Oracle’s more recent innovations, including MySQL HeatWave, which, if you’ll allow a terrible descriptor, is MySQL with some “magic layered on top of it.” Nipun, however, is able to tell us exactly what the magic is and how it works. He dives into the details as he and Corey discuss MySQL and HeatWave: processing, analytics, machine learning, moving data around, costs, competition with other databases, the value it brings to customers, and much more!

Episode Show Notes & Transcript

About Nipun
Nipun Agarwal is Senior Vice President of MySQL HeatWave and Advanced Development at Oracle. His interests include distributed data processing, machine learning, cloud technologies, and security. Nipun was part of the Oracle Database team, where he introduced a number of new features. He has been awarded over 170 patents.


Links:

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it’s better than AWS pricing—and when they say that they mean it is less money. Sure, I don’t dispute that, but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you’ll receive $100 in credit. That’s V-U-L-T-R dot com slash screaming.


Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured, and fully managed, with built-in access via key-value, SQL, and full-text search. Flexible JSON documents align to your applications and workloads. Build faster with blazing-fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.


Corey: Welcome to Screaming in the Cloud, I’m Corey Quinn. Today’s promoted episode is a returning guest with a slight difference. When last we spoke, Nipun Agarwal was a VP over at Oracle, but now—that’s right. When people stay in a company long enough and perform well, they wind up getting additional adjectives in lieu of other things—Nipun, you’re now a Senior VP over at Oracle. Congratulations, I think, unless that just means you’ve gotten older. Welcome back.


Nipun: Thank you, Corey.


Corey: So, now that you’re at SVP level, I can ask some of the harder questions that we didn’t necessarily—seem fair to get into the last time we spoke, such as what is an Oracle, and what might they do these days? For folks who have, I don’t know, been living in a cave for 40 years.


Nipun: Corey, glad to be back on your show. And since the last time we spoke, we have had, like, you know, a lot of enhancements and innovations, and I’ll be happy to describe those in detail whenever is a good time.


Corey: Absolutely. So, you’ve been focused on MySQL for a very long time. And you’ve been using it so long, I really should be calling it YourSQL, but that’s neither here nor there. And you’ve also been focusing on HeatWave, which is effectively MySQL with some—I’m just going to cheat and call it magic that is layered on top of it. That is probably a terrible descriptor of what it actually does, but understand I’m coming from a perspective where I firmly believe the best database in the world is Amazon Route 53, which is a DNS server, so people look at that and say, ‘well, that’s not really what it’s designed to do,’ which really sounds like a ‘them’ problem. And fair enough. We’re going to invert it here. So, why is HeatWave a terrible DNS server? What is it exactly?


Nipun: So, MySQL is the most popular database in the world—it’s the most popular open-source database in the world—lots of people use it. All the major cloud vendors, they take the MySQL database, and either as is or, like, you know, with some enhancements, they offer a managed service, whether it’s Amazon, Azure, Google, pretty much all the major cloud vendors. Now, MySQL has been designed and optimized for transaction processing, so it does a great job for transaction processing. But when customers need to run complex queries or when they need to run analytics, customers would have to take the data out of the MySQL database into some other database for running analytics.


Corey: Let me make sure I understand your terms properly. When you say ‘transactional,’ you’re talking about I’m shopping for underpants on a website. I go ahead and make a purchase; that’s considered a transaction, and a database change reflecting my purchase makes sense. From an analytics perspective, you’re like, “All right, let’s see who bought underpants during this time period.” It’s effectively, usually, a small individual record versus now we’re going to start doing deep dives into effectively a lot of those records in aggregate, is that directionally correct, or is my understanding more than a little flawed about things beyond DNS?


Nipun: Right. What you describe is very accurate. Transaction processing is about point queries making frequent changes, whereas analytics typically involves scanning a much larger amount of data to get the results, and aggregation is a very good example of that.
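To make the two query shapes concrete, here is a minimal sketch, assuming the mysql-connector-python package and a reachable MySQL instance. The `orders` table, its columns, and the credentials are hypothetical, invented purely for illustration.

```python
# A minimal sketch of the two query shapes discussed above.
# Assumes mysql-connector-python; the `orders` table, its columns,
# and the credentials are illustrative only.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="shop"
)
cur = conn.cursor()

# Transactional (OLTP): a point write touching one row.
cur.execute(
    "INSERT INTO orders (customer_id, item, amount) VALUES (%s, %s, %s)",
    (42, "underpants", 12.99),
)
conn.commit()

# Analytic (OLAP): an aggregation scanning many rows.
cur.execute(
    "SELECT item, COUNT(*), SUM(amount) "
    "FROM orders "
    "WHERE ordered_at >= NOW() - INTERVAL 30 DAY "
    "GROUP BY item"
)
for item, n, total in cur.fetchall():
    print(item, n, total)

cur.close()
conn.close()
```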


Corey: So historically, it seems that people have used very different tooling for the different sides of those. I admit, back in the bad old days when I was a systems administrator, we were running MySQL a fair bit, and we had the primary database, which was the thing that handled all of the live transactions and the rest, and whenever we ran business reporting queries on it, it’s like, “Huh, why is the website super slow?” And it didn’t seem to work very well. Now, back then, at the scale we were operating at, the solution was, “Ah, we’re going to use a replica, and then we’re going to basically beat the crap out of the replica for our reporting queries.” And if that gets a little slow and bogged down, who cares? Well, just other people running reporting queries; people can still buy underpants.


So, that was the way that we handled it back then. This was a decade ago. Data sets have gotten significantly larger since then, and apparently, my way of viewing it is, as they say, quaint when they’re trying not to be actively insulting. The right way to do it these days is to have completely separate systems that wind up handling those queries, with different user interfaces by and large. That is, to my understanding, the rise of ‘Big Data,’ and you can hear the initial caps in Big Data when people talk about it like that.


Nipun: Correct. So, what you describe is absolutely correct, that people would extract the data out of databases and take it to specialized databases, which are [apt 00:05:11] for running decision-making analytic processing. But the downside is that people need to express the logic and write code to extract this data, and then customers end up with these two different databases. They got to keep the data in sync, they got to move the data periodically. So, there are a lot of, like, you know, issues in terms of having to manage two different databases, one for transaction processing, one for analytics.


What we have done with HeatWave is to enhance the MySQL database service in the Oracle Cloud so that now the single MySQL database is optimized both for transaction processing as well as analytics. So, now you have a single database. And whether you want to run point queries or these aggregate queries, you can do it on the same data. So, the data remains as is. You’re bringing richness of computation, richness in query processing, to the customers.


Corey: One of the truisms of cloud is that it forces a reevaluation, in many cases, of things that people historically hadn’t had to think about. A classic example, when I was consulting on cloud migrations, was building up costing models, as you might imagine. And my customers would ask me questions, such as, “Great. So, what’s this going to cost us?” And I would come back with, “Well, okay, how many gigabytes in a given month does transfer between this database and that other database, you know, in the machine sitting right next to it?” And their response started off with a, “Why on earth do you think we would know that?” Followed by, “Wait, why do we need to know that?” Followed by, “Oh, God. It costs us to do what?”


And very quickly, an architectural pattern has emerged within cloud of—you know, people experience this the second time, they plan for it. And as a result, whatever database is the most cost-effective is the one the data is already in, because moving data from point to point is inherently an expensive proposition. Depending on where the second point is, it can be an extortionately expensive proposition. Which means that very often, we’ll start to see patterns that are, I guess, sacrificing one side of the database interaction model or the other: transactions are going to be a little slower because you need to have the data in the same place you’re going to be running large-scale analytics on, or alternately, analytics are going to be super crappy, just because you have to wind up querying systems during downtimes and low periods. It just becomes a giant mess; regardless of whether it’s bad in one way, bad in another, or just expensive, it hasn’t worked for people. And my sense is that that is what HeatWave is directly aimed at.


Nipun: Yes. Indeed. So, there are multiple reasons why HeatWave is being so successful. One is the case that, okay, customers need a single database instead of having multiple. The second thing is, there is absolutely no change required to MySQL applications, so MySQL applications or MySQL-compatible applications work as-is with this query [unintelligible 00:08:10] HeatWave without any change.


But the third reason why this is so popular is that HeatWave has been designed from the ground up for scalability and performance, and optimized for the underlying gear, which is the underlying cloud platform. As a result, it offers very good price-performance compared to any of the services we have run against. So, not only is it providing the benefits of having a single database and no change to the application, but it is also extremely fast and low-priced. And that’s because of a lot of technology innovations we did, like, over almost a decade, to build this scale-out system for analytic processing, which has been optimized for the underlying cloud [commodity 00:08:55] gear.


Corey: So, help me understand. Is HeatWave, effectively, a reengineering of MySQL? Is it a completely separate layer that exists distinct from an existing MySQL database? Or is it something else entirely?


Nipun: So, we started off designing HeatWave separately, as something ground-up which came out of many years of research and advanced development. And once we knew that we could scale up HeatWave for analytic processing and that it was very well optimized for the underlying hardware and such, then we did the work of enhancing the MySQL database so that it can be integrated, right? So yes, it started off as a standalone effort from the ground up so that we didn’t have to, you know, [live 00:09:38] any constraints of any existing codebase, so we could design it and optimize it right from the ground up to be the best possible. But then we integrated this thing with the MySQL database so that customers can use it without requiring any change to the application in terms of the semantics or any new syntax, right? So, there’s absolutely no new syntax and no change to the semantics for existing MySQL applications. So, it gives you the best of both worlds.
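As a concrete illustration of “no new syntax,” the sketch below loads an existing table into HeatWave and then runs an unchanged query against it. It reuses the hypothetical `orders` table from the earlier sketch; the SECONDARY_ENGINE/SECONDARY_LOAD statements follow MySQL HeatWave’s documented pattern, but treat the exact statements as approximate and verify them against the current documentation.

```python
# Hedged sketch: loading an existing MySQL table into HeatWave.
# `orders` is a hypothetical table; the SECONDARY_ENGINE / SECONDARY_LOAD
# statements follow the MySQL HeatWave documentation, but verify them
# before relying on this.
import mysql.connector

conn = mysql.connector.connect(
    host="mysql.example.internal", user="app", password="secret", database="shop"
)
cur = conn.cursor()

# Mark the table for HeatWave (the RAPID secondary engine) and load it.
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# The application query itself is unchanged; the optimizer decides
# whether to offload it to the HeatWave cluster.
cur.execute("SELECT item, SUM(amount) FROM orders GROUP BY item")
print(cur.fetchall())

cur.close()
conn.close()
```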


Corey: So, this has frequently been described in the context of a competitor to very—again, forgive the Amazonian focus; that’s where I spend most of my time, usually complaining about things—but it’s been positioned in some ways as a competitor to things such as RDS or Aurora, as well as Redshift, or Snowflake if we’re stepping slightly outside that ecosystem. The challenge that I keep running into, very often, is that when I talk to customers using those systems—and yes, those systems invariably show up on the bill as one of the big numbers, regardless of how you slice it—it feels like their use case for each of those is very different, it feels very much like half of those are aimed at purely transactional and half of them are aimed at the data warehousing story, the large amounts of data for analytics queries. And my default knee-jerk reaction, whenever someone says, “Ah, we built a thing that does both of those super well,” it’s, “Yeah, I’ve heard this before, it was the HP multifunction printer where it does three things, none of them well.” And no one has a multifunction printer that they liked for the longest time—because it’s moving parts and computers and the devil in equal measure—and it’s okay, so you’re trying to build something that stands between two worlds, but it’s easy to come away with the conclusion, as a result, that it’s not the best of breed for either use case, but rather a series of trade-offs or compromises that are made to enable both use cases. I get the sense that that is not your impression of what you’ve built.


Nipun: Correct. And I’ll give you a data point for that. And the data point is—


Corey: Yay. Data! I love that. As opposed to, your opinion is bad because my opinion is good. No, no, coming with data is a great approach. Please continue.


Nipun: [laugh]. In terms of the customers who are using or adopting MySQL HeatWave, one of the largest segments migrating their production workloads from other databases or services are AWS customers who are moving from RDS or Aurora and going production with MySQL HeatWave. So, the fact that customers are doing that is evidence that there is some value to it. And the reasons they are doing it: absolutely no change to their application, it is faster, and it is cheaper. Now, in addition, what they find is that many of these customers were moving their data from Aurora or RDS into Redshift or Snowflake for analytics. They don’t need to do that, right, and that’s an additional savings they get.


But we have a lot of evidence that existing customers of MySQL-based services—definitely on AWS, but even on other clouds—and of Aurora are migrating, and that’s very encouraging for us: hey, we should be doing something right for customers to want to migrate their workloads to MySQL HeatWave.


Corey: You had a couple of announcements coming out about what’s new and what’s coming to HeatWave, and one of the ones that we’re talking about today is the idea of elasticity. Something you just said reminds me of a couple years ago when Amazon had relatively recently brought out Aurora and they said much the same thing of, “Oh, it’s super-elastic. You don’t have to take it down to make it bigger.” And it’s great. Well, you just talked about people removing data as they migrate somewhere else, and the question I had at the time was, “Okay, great. So, that’s how the database embiggens. That’s great. How does it emsmallen? Does that wind up having that same elastic property?”


And the response was very defensive, “Well, why would someone ever do that? Data only gets bigger.” And it’s, yeah, well, you haven’t worked with me in production, where I accidentally drop a table now and again, and data does get smaller. And the answer for the longest time there was that elasticity and auto-scaling were basically unidirectional, because that’s what customers were asking for. Right. So, I have to ask: when you say elasticity around HeatWave, is that unidirectional, or does it mean that, oh, now there’s less data, so we’re going to go back down again?


Nipun: It is bidirectional, so customers can upsize or they can downsize. Now, I have to say that HeatWave is a highly scalable system. And what that means is that as customers add more nodes to the cluster, the performance of the system improves almost linearly with the number of nodes which have been added. So, as a result, we have a lot of customers who start with a cluster of a certain size and, based on the workloads, they either add nodes or they reduce the number of nodes, right? So, it’s a very common operation; people want to scale up and scale down.


And with the real-time elasticity feature we have introduced, customers can do either operation with absolutely no downtime. There’s absolutely no time when the cluster is not available for queries or for DMLs, right? So, while the resize operation is going on, the cluster is fully available, and customers can upsize to any number of nodes and downsize to any number of nodes.


Corey: As it scales in or scales out, is that effectively doing its own internal sharding and rebalancing of data under the hood, invisible to customers? Is there something else going on? Like, how does this work?


Nipun: Right. So, take the example that a customer has, say, four nodes and they want to add two more nodes. There are a couple of interesting properties over here. We have a technique, super-partitioning, by which we know exactly which are the blocks of data which have to be populated to the new nodes which have been added. However, one of the key design points of our elasticity is that there is no data movement between the nodes.


So, all the data which has to be populated in the new nodes which are being added is fetched from the object store, the [OCI 00:15:50] object store. As a result, the existing cluster of four nodes is working as is, queries are working as is, without any degradation in performance. When the data has been populated to these additional nodes, the system then starts having the queries execute on the larger cluster. So, the smaller cluster is available all the time, then the larger cluster is available, so from a user’s perspective, they see absolutely no downtime. And since there is no data movement happening from the initial four nodes, there is no degradation of the existing queries which will be running on the older cluster.
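The zero-downtime scale-out Nipun describes can be pictured with a short conceptual sketch. This is not HeatWave’s actual implementation or an Oracle API; Cluster, Node, and the object-store dictionary are hypothetical stand-ins that only show the ordering of the steps: new nodes hydrate from object storage while the old cluster keeps serving, and queries cut over once hydration finishes.

```python
# Conceptual sketch of the zero-downtime scale-out flow described above.
# Not an Oracle API: Cluster, Node, and the object store are hypothetical
# stand-ins used only to show the ordering of the steps.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    partitions: list = field(default_factory=list)


@dataclass
class Cluster:
    nodes: list

    def serve_query(self, sql: str) -> str:
        return f"ran {sql!r} on {len(self.nodes)} nodes"


def scale_out(active: Cluster, new_names: list, object_store: dict) -> Cluster:
    # 1. Provision the new nodes; the active cluster is untouched and
    #    keeps serving queries throughout.
    new_nodes = [Node(n) for n in new_names]

    # 2. Hydrate the new nodes from the object store. Crucially, no data
    #    moves between existing nodes, so running queries see no slowdown.
    for node in new_nodes:
        node.partitions = object_store.get(node.name, [])

    # 3. Cut over: subsequent queries run on the larger cluster.
    return Cluster(nodes=active.nodes + new_nodes)


store = {"node-5": ["blk-9", "blk-10"], "node-6": ["blk-11"]}
cluster = Cluster(nodes=[Node(f"node-{i}") for i in range(1, 5)])
print(cluster.serve_query("SELECT COUNT(*) FROM orders"))   # old cluster serves
cluster = scale_out(cluster, ["node-5", "node-6"], store)   # no downtime
print(cluster.serve_query("SELECT COUNT(*) FROM orders"))   # now on 6 nodes
```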


Corey: It’s 2022 and you’re announcing enhancements to a technology, so of course, it is a given that you are now talking as well about machine learning. Now, in a general sense, whenever someone says that, my immediate, instinctive reaction is to check my wallet in case someone is in the middle of picking my pocket, because it seems like it winds up in some very weird places. What is machine learning and its applicability to HeatWave? Because generally speaking, when I look at things you can use machine learning for, the answer is often finding signal from noise in large datasets and, of course, the ever-popular bias laundering. But I get the sense that neither one of those is quite what you’re talking about here. What monstrosity have you built?


Nipun: With MySQL HeatWave, customers are bringing in more data, whether from consolidating multiple MySQL databases into one or from bringing workloads from other databases into MySQL, and the volume of data which customers are now putting into MySQL HeatWave is growing because they want to run transaction processing and analytics all together in one database. Now, as the size of the data is growing, we are finding that many customers want to extract the data—or currently need to extract the data—out of the MySQL database to run machine-learning processing. So, some of the very large customers of MySQL HeatWave have been using HeatWave very successfully for transaction processing and analytics, but they had to extract the data out to some other ecosystem, to some other service, for machine-learning processing. With the announcement we have made, which is HeatWave ML, we are now providing in-database support for machine learning, meaning that customers of MySQL HeatWave can do training, inference, as well as explanations, all inside MySQL HeatWave, without the data or the model ever having to leave MySQL.


And this is something which is fairly unique. Apart from the Oracle database, I’m not aware of any other database which provides in-database machine-learning capabilities, and certainly not as rich, right: very efficient training, inference, and explanations. And all models which are created by HeatWave ML inside MySQL HeatWave can be explained, which is a pretty important capability which enterprise customers like to have.
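For a sense of what in-database ML looks like in practice, here is a hedged sketch loosely based on the HeatWave ML stored procedures (sys.ML_TRAIN and friends). The `bank` schema, its tables and columns, and the option values are invented for illustration, and the exact procedure signatures should be checked against Oracle’s documentation.

```python
# Hedged sketch of in-database training, inference, and explanation with
# HeatWave ML. The `bank` schema and its columns are hypothetical; the
# sys.ML_* procedure names follow Oracle's HeatWave ML docs, but treat
# the exact signatures as approximate.
import mysql.connector

conn = mysql.connector.connect(
    host="heatwave.example.internal", user="ml", password="secret", database="bank"
)
cur = conn.cursor()

# Train a classifier in-database: no data leaves MySQL.
cur.execute(
    "CALL sys.ML_TRAIN('bank.applicants_train', 'approved', "
    "JSON_OBJECT('task', 'classification'), @model)"
)

# Load the model, then score a whole table of new applications.
cur.execute("CALL sys.ML_MODEL_LOAD(@model, NULL)")
cur.execute(
    "CALL sys.ML_PREDICT_TABLE('bank.applicants_new', @model, "
    "'bank.applicants_scored')"
)

# Explain one prediction, e.g., for regulatory review of a denial.
cur.execute(
    "CALL sys.ML_EXPLAIN_ROW("
    "JSON_OBJECT('income', 52000, 'debt', 18000), @model)"
)

cur.close()
conn.close()
```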


Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They’ve also gone in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That’s S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.


Corey: What does this wind up empowering customers to do? Give an example or two, just because it’s easy to talk about this stuff in the abstract as far as, “Oh, it would theoretically let someone do X, Y or Z.” But the problem I found, generally speaking, in the world of machine learning is that it is challenging to articulate it in a way that people hear the story and think, “Hey, that looks like something I might want to do.” As opposed to the common stories are, “Well, if you have a world-spanning data set and want to do this, this, and this”—like, “Well, I don’t. And I don’t and I don’t and I don’t, so what value is it to me?” What capabilities does it unlock?


Nipun: Right. So, with the introduction of HeatWave, what we had said is that customers don’t need multiple databases: One for transaction processing, one for analytics; they can do both transactional processing and analytics with one database, right? That’s what we started off with. Now, the same thing holds true for machine learning. Current customers of most databases need to extract data out of the database for doing machine learning.


And we are saying, “Hey, that’s not [unintelligible 00:19:45], analytics, mixed workloads, or machine learning. Your data can all be inside MySQL, MySQL HeatWave, and you can do all the processing with that service.” Now, the kinds of capabilities customers like to have for machine learning, training as the most important one. And training is a very time-consuming operation. And typically when customers do training and they’re using some other service, it’s time-consuming and it is very expensive as well.


One of the very interesting properties here is that when you’re running machine learning inside HeatWave, you don’t need to provision any additional cluster or have any custom gear. This machine-learning training is happening on the same cluster which the user has provisioned for analytics or for transaction processing. So, on the same hardware, on the same cluster, they can now run machine-learning processing. So, the kind of use case you’re asking about is when customers have this data—and I’ll walk you through an example. Take the case of a credit card, right?


If a bank wants to determine whether they want to, like, deny someone a credit card or approve it, it’s based on some characteristics. Many times, people use a rule-based mechanism, but now, with data-driven approaches, people want to look at a lot of data, and the system makes a recommendation that yes, this person is appropriate for, like, you know, granting the loan or not. And this is something for which customers—or, like, the enterprises—want to have rich models which accurately provide a characterization of the data so that they can make the right predictions. So, training is very important because you want the training to be done right on the data, because it influences the quality of the predictions which are being made. And once a prediction is made, there may be reasons—like, there could be regulatory compliance reasons—because of which the enterprise may need to offer an explanation of why the credit card was denied, just to kind of make sure that there wasn’t any bias or unfairness.


And that’s where machine-learning explanation capabilities are also very helpful. So, this is an example: when someone goes to apply for a credit card, whether it’s rejected or approved. Another example is when someone is making a call—like, a marketing team is making a call—and the system wants to predict whether the call will lead to a successful outcome or not. That’s another example. So, machine learning is now being used very extensively, and one of the advantages of a database is that a database is where there’s a lot of data, so it’s a very, very good opportunity to harness this data using machine learning. Because machine learning is really tied to the richness of data and to the amount of data someone has.


Corey: That makes a lot of sense. So, it’s… it definitely shines a light on, if not the easy answer for a lot of those questions, a direction that people are going to have a better time of mapping to their specific use cases. One that I think is easier for everyone to map to a specific use case is another component of what you folks are announcing, which is cost reduction, which is, to be direct, not something people generally think of Oracle as the first example of. A company that’s like, “Ah, that’s the thing that’s going to cost me less money.” And to be clear, I have no problem with that. I pride myself on absolutely not being the least expensive answer to basically anything. But it is an interesting direction to go in. There are a few ways you can wind up saving folks money. Which path have you folks taken?


Nipun: Now, there are multiple ways in which we can reduce the cost for the customer. So, one thing to realize is that MySQL customers are very cost-sensitive. And in the previous benchmarks and results we have shown, we have shown that, you know, compared to other vendors, HeatWave is significantly faster and significantly cheaper. So, we have a class of customers coming to us saying, “Hey, you know what? Can you trade off some performance for even lower cost?”


And the way we have done it is the following: We have doubled the amount of data which can be processed on a HeatWave node. So, HeatWave is an in-memory system, and the size of the cluster depends upon the amount of data which is being processed and upon the amount of data which can be processed per node. So, if you double the amount of data that can be processed per node, it means that customers now need a cluster half the size compared to what they were using in the past, which reduces their cost by half. Now, please note, when they’re running on a cluster half the size, the amount of time it takes to run the same query will double.


So, what it means is, the system is providing the same price-performance: half the cost, double the time. But it’s a choice that customers have. If they still want to get the same performance [unintelligible 00:24:38] earlier, they can continue to run on the larger cluster, but now they have a choice. So, in a way, we are providing an even lower entry point for customers. That’s the first part of cost savings.
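The arithmetic behind “same price-performance, lower entry point” is worth spelling out; the node count and hourly rate below are made up purely for illustration.

```python
# Illustrative arithmetic only; the node count and hourly rate are made up.
nodes, rate_per_node_hour, query_hours = 10, 1.0, 2.0

# Original cluster: cost of one analytics run.
full_cost = nodes * rate_per_node_hour * query_hours              # 20.0

# Doubling data per node halves the cluster but doubles query time.
half_cost = (nodes / 2) * rate_per_node_hour * (query_hours * 2)  # 20.0

# Price-performance (cost per completed run) is unchanged...
assert full_cost == half_cost
# ...but the minimum spend to stand up a cluster is halved.
print(f"entry cost: {nodes // 2} nodes instead of {nodes}")
```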


Corey: And that makes sense because with a lot of the workloads you see where it’s nice to be able to run analytics on the same type of data, you don’t need the same level of responsiveness on a lot of those queries either, where it’s, “So, we’re trying to get an answer to this giant analytics query.” “Okay, so great. How quickly do you need it working?” When transactions are measured in fractions of a second, the answer to analytics queries is, “Well, Tuesday would be nice. We’d like it by Tuesday if you can find a way to pull that off.”


So, there’s no reason to pay for near-line-rate speeds if you don’t need it for a lot of those queries, which is absolutely going to be an interesting option for folks. Now, you said there was a second aspect as well.


Nipun: Yes. And the second aspect is, again, for analytics, right? Customers want to run the queries, but they want to run them occasionally; they don’t want to run them all the time. So, what we are now introducing is a feature called ‘Pause and Resume.’ And what it does is that if you’re not using the cluster, you can pause it, and the system makes a copy of the data and all the metadata associated with the data in a backup; and when the user wants, they can resume and, like, you know, fetch the data, which is still in the in-memory representation, and all the metadata associated with Autopilot, and just resume, right? So, this is another way by which customers, when they’re not using the cluster for some duration of time, can pause it, and for the duration they pause it, they’re not being charged.
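A compact way to picture pause and resume (again a conceptual sketch, not an Oracle API, with invented stand-ins): pausing snapshots the in-memory data plus metadata to object storage and stops billing; resuming rehydrates from that snapshot.

```python
# Conceptual sketch only; the handle and snapshot format are invented.
import pickle


def pause(cluster_state: dict, object_store: dict) -> str:
    """Snapshot data + metadata to the object store, then release nodes."""
    object_store["snapshot"] = pickle.dumps(cluster_state)
    return "snapshot"  # billing for the cluster stops here


def resume(handle: str, object_store: dict) -> dict:
    """Rehydrate the in-memory representation and Autopilot metadata."""
    return pickle.loads(object_store[handle])


store: dict = {}
state = {"tables": {"orders": ["blk-1", "blk-2"]}, "autopilot_meta": {"stats": 42}}
handle = pause(state, store)     # not billed while paused
assert resume(handle, store) == state
```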


Corey: I am a big believer that the number one step of cloud economics is, like, “Oh, should I buy some reservations or lock into a long-term contract?” “No. You should turn things off when you’re not using them.” And people look at you strange and say, “What? You can turn things off?” And yes, you absolutely can, which makes people feel better about generally not doing it.


But again, customer behaviors are usually ones that make sense in their context. I just look at it from a billing perspective, and it seems a little weird. I like the option, particularly for things that are either non-production or only going to be relevant to production during certain time windows; there are a number of areas where that begins to make an awful lot of sense, and people would do it if it didn’t require backing up the database, destroying the cluster, then re-provisioning the cluster and restoring the database. And, yeah, people don’t generally have weeks to spend on spin-up and spin-down.


Nipun: Yes, in fact, that’s a very, very good observation, Corey. I want to say that many of our customers who are running their production workloads on HeatWave also have a test environment. And exactly on the lines of what you said, they want to have a copy of the data in the test environment, should something bad happen, but they don’t want the cluster on all the time; they just want it for some duration of time, and for them, this pause and resume will be a very good idea. And it will also, like, you know, save them money. So, it’s something which we have seen with many of our customers.


Corey: The last component of your announcement is one that I approach with a significant amount of skepticism, because every time I start drifting in this direction, one thing is for certain: I’m going to get yelled at on the internet. I’m referring, of course, to benchmarking. Now, Oracle historically has been a company that prefers people not benchmark and publish results of those benchmarks, dating back into the mists of history. And the argument has always been that people don’t generally tend to benchmark database workloads appropriately, due to a series of misunderstandings—and let’s be clear, this stuff is complicated. And a number of companies in the space love to talk about how their benchmarks are great, and when you look into it, it’s, okay, those numbers are great.


And you sort of know that the benchmarks that didn’t perform so well are not the ones that they’re talking about. And then their competitor immediately winds up chiming in, where it’s, “Ah, they’re doing it wrong, because when you do these other benchmarks, our solution winds up being better.” And it winds up in a nerd slap-fight that no one, not even the participants, particularly enjoys. What makes your benchmarks interesting is that you talk through not just what the benchmark results are—because, of course, that’s the entire point—you’re also putting the benchmark methodology and tooling up on GitHub, where people can grab it and run it themselves; ‘see for yourself’ is the entire approach. That is—how do I put this politely—atypical of large companies in general and Oracle in particular. What changed?


Nipun: Right. So, there are three things over here, Corey, right? The first thing is, as we talked about, MySQL is the most popular open-source database in the world. Pretty much all cloud vendors, they have some version of MySQL which they’re offering as a managed service, and in many cases, they’re enhancing MySQL and then offering their service. So, in the context of MySQL, it becomes very important for us to give the opportunity to our customers, for them to compare which service is better for their needs.


So, it is more important in the context of MySQL, since everyone is offering it and some of them have derivatives, that we provide some mechanism for people to compare. So, that’s the case for having a benchmark. That’s the first point. The second thing is, when you want to compare the performance or the cost of these, like, you know, various flavors, instead of us coming up with our own, say, workloads which we see from customers, it’s good to have a well-published benchmark, a well-understood benchmark, so that people can say, “Okay, you know what? Based on TPC-H, what is the performance?” Or, “On [TPC DS 00:29:54], what is the performance?”


In some cases, when a benchmark isn’t available, what we have done is for machine learning, we have used a bunch of open datasets and based on those open datasets, we are publishing the benchmarks to say, “Hey, we are so much faster or so much cheaper.”


And then the third aspect is why we are making them all available on GitHub, in open source. These benchmarks are a starting point, but customers will have workloads which are different from these benchmarks, so we want to provide the opportunity for the customers to first look at what our methodology is and what we have used to come up with these numbers, so they can reproduce them; but, B, if their workloads are different, they can enhance or augment these benchmarks in the way they would like and then run them to see how they compare, right? So, we want to be fully transparent about what we have done and how we have done it, and let customers decide on their own which is going to be the best platform from a cost perspective and from a performance perspective. So, this is the reason why we have chosen to publish the benchmarks and, on GitHub, make all our scripts available in open source.


Corey: One of the things I think I admire the most about that is I’ve always viewed benchmarks as being borderline worthless, because I do not care in the slightest how your system performs on hand-selected ratings on sample data that you provide, whereas I care everything for how the system performs with my workloads and my data sets. So, unless I am talking to someone who is effectively a neutral third-party benchmark source—in which case they are immediately attacked for being shills for one company or another, and sometimes both or neither at the same time, because people are terrible—seeing how it runs on my workloads and with my constraints is the important and valuable thing. And this is the easiest I can ever see it being to get a good representative feel for exactly how different offerings are going to perform under the specific conditions that my production environment lives within. Because it’s me we’re talking about, the specific conditions of my production environment are, of course, terrifying.


Nipun: Right. So, I want to point out, yes, one is the fact that we have made the benchmark methodology, like, you know, very transparent, but the second aspect of that is what we talked about last time, which is MySQL Autopilot, right? This is machine-learning-based, data-driven automation. So, we are very actively working on making it easy for customers to not have to do any configuration changes or optimizations; the system determines, based on the queries, based on the workloads, how to best tune the system, right?


So, we are working on both angles: One is to make the system more intelligent, so that based on the workload, the system can optimize for the user’s workload, and then, B, making our approach very transparent so that customers can compare for themselves. So, we are very, very aware of this, and again, for MySQL customers, for many of these open-source customers, simplicity is very important, and we are working hard to make it simpler and transparent for our users.


Corey: I really want to thank you for taking me on a tour of what you’re announcing today. Now, so let me ask one of the forbidden questions: What’s on the roadmap? What’s coming that customers can look forward to?


Nipun: So, one of the things which we are working on is that there has been a very good reception of the HeatWave capabilities we have introduced, so MySQL HeatWave is one of the fastest-growing services in the Oracle Cloud. But there has been a lot of interest from customers who have been asking us to provide similar capabilities on AWS. So, this is something which we are working on; it’s in the roadmap. And please stay tuned for more news on this.


Corey: You can bet that I will. I really want to thank you for taking the time out of your day to basically suffer my slings and arrows, and also spend time teaching what amounts to a remedial database course to a moron. But thank you once again for being as generous with your time as you always are.


Nipun: Well, thank you, Corey. It’s always a pleasure to come and talk to the show. Thank you again, for the opportunity.


Corey: Always. Nipun Agarwal, SVP at Oracle in charge of MySQL, YourSQL, and HeatWave. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and explain how databases always fail your personal benchmark of doing a SELECT on a terabyte of data at once.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


Announcer: This has been a HumblePod production. Stay humble.


Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim its better than AWS pricing—and when they say that they mean it is less money. Sure, I don’t dispute that but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less that sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you’ll receive a $100 in credit. Thats V-U-L-T-R.com slash screaming.

Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.

Corey: Welcome to Screaming in the Cloud, I’m Corey Quinn. Today’s promoted episode is a returning guest with a slight difference. When last we spoke, Nipun Agarwal was a VP over at Oracle, but now—that’s right. When people stay in a company long enough and perform well, they wind up getting additional adjectives in lieu of other things—Nipun, you’re now a Senior VP over at Oracle. Congratulations, I think, unless that just means you’ve gotten older. Welcome back.

Nipun: Thank you, Corey.

Corey: So, now that you’re at SVP level, I can ask some of the harder questions that we didn’t necessarily—seem fair to get into the last time we spoke, such as what is an Oracle, and what might they do these days? For folks who have, I don’t know, been living in a cave for 40 years.

Nipun: Corey, glad to be back on your show. And since the last time we spoke, we have had, like, you know, a lot of enhancements and innovations, and I’ll be happy to describe those in detail whenever is a good time.

Corey: Absolutely so you’ve been focused on MySQL for a very long time. And you’ve been using it so long, I really should be calling it YourSQL, but that’s neither here nor there. And you’ve also been focusing on HeatWave, which is effectively MySQL with then some—I’m just going to cheat and call it magic that is layered on top of it. That is probably a terrible descriptor of what it actually does, but understand I’m coming from a perspective where I firmly believe the best database in the world is Amazon Route 53, which is a DNS server, so people look at that and say, ‘well, that’s not really what it’s designed to do,’ which really sounds like a ‘them’ problem. And fair enough. We’re going to invert it here. So, why is HeatWave a terrible DNS server? What is it exactly?

Nipun: So, MySQL is the most popular database in the world—it’s the most popular open-source database in the world—lots of people use it. All the major cloud vendors, they take the MySQL database, and either as is or, like, you know, with some enhancements, they offer a managed service, whether it’s Amazon, Azure, Google, pretty much all the major cloud vendors. Now, MySQL has been designed and optimized for transaction processing, so it does a great job for transaction processing. But when customers need to run complex queries or when they need to run analytics, customers would have to take the data out of the MySQL database into some other database for running analytics.

Corey: Let me make sure I understand your terms properly. When you say ‘transactional,’ you’re talking about I’m shopping for underpants on a website. I go ahead and make a purchase; that’s considered a transaction, and a database change reflecting my purchase makes sense. From an analytics perspective, you’re like, “All right, let’s see who bought underpants during this time period.” It’s effectively, usually, a small individual record versus now we’re going to start doing deep dives into effectively a lot of those records in aggregate, is that directionally correct, or is my understanding more than a little flawed about things beyond DNS?

Nipun: Right. What you describe is very accurate. That transaction processing is about point queries making frequent changes, whereas when we talk about analytics, it typically involves scanning a much larger amount of data to get the results, and aggregations is a very good example of that.

Corey: So historically, that seems that people have used very different tooling for different sides of those. Ideally—I admit, back in the bad old days when I was a systems administrator, we were running MySQL a fair bit, and we had the primary database, which was the thing that handled all of the live transactions and the rest, and whenever we ran business reporting queries on it, it’s like, “Huh, why is the website super slow?” And it didn’t seem to work very well. Now, back then, at the scale we were operating at the solution was, “Ah, we’re going to use a replica, and then we’re going to basically beat the crap out of the replica for our reporting queries.” And if that gets a little slow and bogged down, who cares? Well, just other people running reporting queries; people can still buy underpants.

So, that was the way that we handled it back then. This was a decade ago. Data sets have gotten significantly larger since then, and apparently, my way of viewing it is, as they say, quaint when they’re trying not to be actively insulting. The right way to do it these days is to have completely separate systems that wind up handling those queries with different user interfaces by and large. That is, to my understanding, the rise of ‘Big Data,’ and you can hear the initial caps in Big Data with people talk about it like that.

Nipun: Correct. So, what you describe is absolutely correct that people would extract the data out of databases, take it to specialized databases, which are [apt 00:05:11] for running decision-making analytic processing. But the downside is that a people need to express the logic and write code to extract this data, and then customers end up with these two different databases. They got to keep the data in sync, they got to move the data periodically. So, there are a lot of, like, you know, issues in terms of having to manage two different databases, one for transaction processing, one for analytics.

What we have done with HeatWave is to enhance the MySQL database service in the Oracle Cloud so that now the single MySQL database is optimized both for transaction processing as well as analytics. So, now you have a single database. And whether you want to run point queries or these aggregate queries, you can do it on the same data. So, the data remains as is. You’re bringing richness of computation, richness in query processing, to the customers.

Corey: One of the truisms of cloud is that it forces a reevaluation, in many cases, of things that people historically hadn’t had to think about it. A classic example when I was consulting on cloud migrations, was building up costing models, as you might imagine. And my customers would ask me questions, such is, “Great. So, what’s this going to cost us?” And I would come back with, “Well, okay, how many gigabytes in a given month does transfer between this database and that other database, you know, in the machine sitting right next to it?” And their response started off with a, “Why on earth do you think we would know that?” Followed by, “Wait, why do we need to know that?” Followed by, “Oh, God. It costs us to do what?”

And very quickly an architectural pattern has emerged within cloud of—you know, people experience this the second time, they plan for it. And as a result, whatever database is the most cost-effective is the one that data is already in because moving data from point to point is inherently an expensive proposition. Depending on where the second point is, it can be an extortionately expensive proposition. Which means that very often, we’ll start to see patterns that are, I guess, sacrificing one side of the database interaction model or the other, that transactions are going to be a little slower because you need to have it in the same place you’re going to be running large scale analytics on, or alternately, analytics are going to be super crappy, just because you have to wind up querying systems during downtimes and low periods. It just becomes a giant mess, regardless of whether it’s bad in one way, bad in another, or just expensive, it hasn’t worked for people. And my sense is that that is what HeatWave is directly aimed.

Nipun: Yes. Indeed. So, there are multiple reasons why HeatWave is being so successful. One is the case that okay, customers need a single database, instead of having multiple. The second thing is, there is absolutely no change required to MySQL applications, so the MySQL applications or MySQL compatible applications work as-is with this query [unintelligible 00:08:10] HeatWave without any change.

But the third reason why this is so popular is that HeatWave has been designed from the ground up for scalability, performance, and optimized for the underlying gear, which is the underlying cloud platform. As a result, it offers a very good price-performance compared to any of the service we have run against. So, not only is it providing the benefits of having a single database, no change to the application, but also it is extremely fast and low price. And that’s because a lot of technology innovations we did, like, almost like, over a decade to build this, scale our system for analytic processing, which has been optimized for the underlying cloud [commodity 00:08:55] gear.

Corey: So, help me understand. Is HeatWave a, effectively, reengineering of MySQL? Is it a completely separate layer that exists distinct from an existing MySQL database? Or is it something else entirely?

Nipun: So, we started off designing HeatWave separately as something ground up, which came out of many years of research and advanced developing. And once we knew that we could scale up HeatWave for analytic processing, and it is very well optimized for the underlying hardware and such. Then we did the work of enhancing the MySQL database so that it can be integrated, right? So yes, it started off as a standalone effort from the ground up so that we didn’t have to, you know, [live 00:09:38] any constraints of any existing codebase, so we could design it and optimize it right from the ground up to be the best possible. But then we integrated this thing with the MySQL database so that the customers can use it without requiring any change to the application in terms of the semantics or any new syntax, right? So, there’s absolutely no new syntax and no change to the semantics for existing MySQL applications. So, it gives you best of both worlds.

Corey: So, this has frequently been described in the context of a competitor to very—again, forgive the Amazonian focus; that’s where I spend most of my time, usually complaining about things—but it’s been positioned in some ways as a competitor to things such as RDS or Aurora, as well as Redshift, or Snowflake if we’re stepping slightly outside that ecosystem. The challenge that I keep running into, very often, is that when I talk to customers using those systems—and yes, those systems invariably show up on the bill as one of the big numbers, regardless of how you slice it—it feels like their use case for each of those is very different, it feels very much like half of those are aimed at purely transactional and half of them are aimed at the data warehousing story, the large amounts of data for analytics queries. And my default knee-jerk reaction, whenever someone says, “Ah, we built a thing that does both of those super well,” it’s, “Yeah, I’ve heard this before, it was the HP multifunction printer where it does three things, none of them well.” And no one has a multifunction printer that they liked for the longest time—because it’s moving parts and computers and the devil in equal measure—and it’s okay, so you’re trying to build something that stands between two worlds, but it’s easy to come away with the conclusion, as a result, that it’s not the best of breed for either use case, but rather a series of trade-offs or compromises that are made to enable both use cases. I get the sense that that is not your impression of what you’ve built.

Nipun: Correct. And I’ll give you a data point for that. In the data point is—

Corey: Yay. Data I love that. As opposed to your opinion is bad because my opinion is good. No, no, coming with data is a great approach. Please continue.

Nipun: [laugh]. In terms of the customers who are using or adopting MySQL HeatWave, one of the largest segments of the customers who are migrating their production workloads from other databases or other services and coming to HeatWave are AWS customers who are migrating their production workloads from RDS or Aurora and are going production with MySQL HeatWave. So, the fact that the customers are doing that is an evidence that there is some value to it. And the reasons they are doing it is absolutely no change to their application, it is faster, it is cheaper. Now, in addition, what they find is that many of these customers were moving their data from Aurora or RDS into Redshift or Snowflake for analytics. They don’t need to do that, right, and that’s an additional savings they get.

But we have a lot of evidence that existing customers have MySQL-based services—definitely AWS, but even on other clouds—and Aurora are migrating, and that’s very encouraging for us that, hey, we should be doing something right for customers to want to migrate their workloads to MySQL HeatWave.

Corey: You had a couple of announcements coming out about what’s new and what’s coming to HeatWave, and one of the ones that we’re talking about today is the idea of elasticity. Something you just said reminds me of a couple years ago when Amazon had relatively recently brought out Aurora and they said much the same thing of, “Oh, it’s super-elastic. You don’t have to take it down to make it bigger.” And it’s great. Well, you just talked about people removing data as they migrate somewhere else, and the question I had at the time was, “Okay, great. So, that’s how the database embiggens. That’s great. How does it emsmallen? Does that wind up having that same elastic property?”

And the response was very defensive, “Well, why would someone ever do that? Data only gets bigger.” And it’s, yeah, well, you haven’t worked with me in production where I accidentally drop a table now and again, and data does get smaller. And the answer for the longest time there was elasticity and auto-scaling was basically unidirectional because that’s what customers are asking for. Right. So, I have to ask, when you say elasticity around HeatWave, is that unidirectional, or does it mean that oh, now there’s less data, so we’re going to go back down again.

Nipun: It is bidirectional, so customers can upsize or they can downsize. Now, I have to say that HeatWave is a highly scalable system. And what that means is that as customers add more nodes to the cluster, the performance of the system improves almost linearly with the number of nodes which have been added. So, as a result, we have a lot of customers who start with a cluster size of certain number, and based on the workloads, they either add nodes or they reduce the number of nodes, right? So, it’s a very common operation; people want to scale up and scale down.

And with the real-time elasticity feature we have introduced, customers can do either operation with absolutely no downtime. There’s absolutely no time when the cluster is not available for queries or for DMLs, right? So, while the resize operation is going on, the cluster is fully available, and customers can upsize or downsize to any number of nodes.
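
To put the resize in concrete terms: the cluster size is a control-plane property of the DB system, so upsizing or downsizing is a single API call while the cluster keeps serving queries. A hedged sketch using the OCI Python SDK; the class and method names follow the SDK’s MySQL client, but treat the exact signatures, and the placeholder OCID, as assumptions to verify against the SDK documentation:

    # Bidirectional HeatWave resize via the OCI Python SDK (pip install oci).
    # Method and model names are assumptions based on the SDK's MySQL client.
    import oci

    config = oci.config.from_file()  # reads ~/.oci/config
    client = oci.mysql.DbSystemClient(config)

    # Grow (or shrink) the HeatWave cluster attached to a DB system to 6 nodes;
    # per the discussion above, queries keep running during the resize.
    details = oci.mysql.models.UpdateHeatWaveClusterDetails(cluster_size=6)
    client.update_heatwave_cluster(
        "ocid1.mysqldbsystem.oc1..example",  # hypothetical DB system OCID
        details,
    )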

Corey: As it scales in or scales out, is that effectively doing its own internal sharding and rebalancing of data under the hood, invisible to customers? Is there something else going on? Like, how does this work?

Nipun: Right. So, take the example that a customer has, say, four nodes and they want to add two more nodes. There are a couple of interesting properties over here. We have a technique called super-partitioning, by which we know exactly which blocks of data have to be populated to the new nodes which have been added. However, one of the key design points of our elasticity is that there is no data movement between the nodes.

So, all the data which has to be populated to the new nodes which are being added is fetched from the object store, the OCI object store. As a result, the existing cluster of four nodes keeps working as is, queries keep working as is, without any degradation in performance. When the data has been populated to these additional nodes, the system then starts having the queries execute on the larger cluster. So, the smaller cluster is available all the time, then the larger cluster is available, so from a user’s perspective, they see absolutely no downtime. And since there is no data movement happening from the initial four nodes, there is no degradation of the existing queries which are running on the older cluster.

Corey: It’s 2022 and you’re announcing enhancements to a technology, so of course, it is a given that you are now talking as well about machine learning. Now, in a general sense, whenever someone says that, my immediate instinctive reaction is to check my wallet in case someone is in the middle of picking my pocket, because it seems like it winds up in some very weird places. What is machine learning and its applicability to HeatWave? Because generally speaking, when I look at things you can use machine learning for, the answer is often finding signal from noise in large datasets and, of course, the ever-popular bias laundering. But I get the sense that neither one of those is quite what you’re talking about here. What monstrosity have you built?

Nipun: With MySQL HeatWave, customers are bringing in more data, either by consolidating multiple MySQL databases into one or by bringing workloads from other databases into MySQL, and the volume of data customers are putting into MySQL HeatWave is growing because they want to run transaction processing and analytics all together in one database. Now, as the size of the data grows, we are finding that many customers want to, or currently need to, extract the data out of the MySQL database to run machine-learning processing. So, some of the very large customers of MySQL HeatWave have been using HeatWave very successfully for transaction processing and analytics, but they had to extract the data out to some other ecosystem, to some other service, for machine-learning processing. With the announcement we have made, which is HeatWave ML, we are now providing in-database support for machine learning, meaning that customers of MySQL HeatWave can do training, inference, as well as explanations, all inside MySQL HeatWave, without the data or the model ever having to leave MySQL.

And this is something which is fairly unique. Apart from the Oracle Database, I’m not aware of any other database which provides in-database machine-learning capabilities, and certainly none as rich, right: very efficient training, inference, and explanations. And all models which are created by HeatWave ML inside MySQL HeatWave can be explained, which is a pretty important capability that enterprise customers like to have.
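
To make the in-database workflow concrete, here is a hedged sketch of HeatWave ML driven from a plain MySQL session in Python. The sys.ML_TRAIN, sys.ML_MODEL_LOAD, and sys.ML_PREDICT_TABLE routines are the ones Oracle’s HeatWave ML documentation describes, but treat the exact signatures as assumptions, and the bank schema and table names as hypothetical:

    # In-database training and batch scoring with HeatWave ML; the data and
    # the model never leave MySQL. Schema and table names are hypothetical.
    import os
    import mysql.connector

    conn = mysql.connector.connect(
        host="heatwave.example.oraclecloud.com",
        user="ml_user",
        password=os.environ["MYSQL_PASSWORD"],
        database="bank",
    )
    cur = conn.cursor()

    # Train a classifier; the model handle comes back in a session variable.
    cur.execute("CALL sys.ML_TRAIN('bank.applications_train', 'approved', "
                "JSON_OBJECT('task', 'classification'), @model)")

    # Load the model into HeatWave memory, then score a whole table in place.
    cur.execute("CALL sys.ML_MODEL_LOAD(@model, NULL)")
    cur.execute("CALL sys.ML_PREDICT_TABLE('bank.applications_new', @model, "
                "'bank.applications_scored')")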

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They’ve also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That’s S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.

Corey: What does this wind up empowering customers to do? Give an example or two, just because it’s easy to talk about this stuff in the abstract, as far as, “Oh, it would theoretically let someone do X, Y, or Z.” But the problem I’ve found, generally speaking, in the world of machine learning is that it is challenging to articulate it in a way where people hear the story and think, “Hey, that looks like something I might want to do.” As opposed to the common stories of, “Well, if you have a world-spanning data set and want to do this, this, and this”—like, “Well, I don’t. And I don’t, and I don’t, and I don’t, so what value is it to me?” What capabilities does it unlock?

Nipun: Right. So, with the introduction of HeatWave, what we had said is that customers don’t need multiple databases: one for transaction processing, one for analytics; they can do both transaction processing and analytics with one database, right? That’s what we started off with. Now, the same thing holds true for machine learning. Today, customers of most databases need to extract data out of the database for doing machine learning.

And we are saying, “Hey, that’s not [unintelligible 00:19:45], analytics, mixed workloads, or machine learning: your data can all be inside MySQL, MySQL HeatWave, and you can do all the processing with that service.” Now, of the kinds of capabilities customers like to have for machine learning, training is the most important one. And training is a very time-consuming operation. And typically, when customers do training and they’re using some other service, it’s time-consuming and it is very expensive as well.

One of the very interesting properties here is that when you’re running machine learning inside HeatWave, you don’t need to provision any additional cluster, and you don’t need any custom gear. The machine-learning training happens on the same cluster which the user has provisioned for analytics or for transaction processing. So, on the same hardware, on the same cluster, they can now run machine-learning processing. As for the kind of use case you’re asking about, I’ll walk you through an example. Take the case of credit cards, right?

If a bank wants to determine whether to deny someone a credit card or approve it, that’s based on some characteristics. Many times, people use a rule-based mechanism, but now, with data-driven approaches, people want to look at a lot of data and have the system make a recommendation: yes, this person is appropriate for granting the loan, or not. And this is something for which the enterprises want rich models which accurately characterize the data, so that they can make the right predictions. So, training is very important, because you want the training to be done right on the data, because it influences the quality of the predictions which are being made. And once a prediction is made, there may be reasons, like regulatory compliance reasons, because of which the enterprise may need to offer an explanation of why the credit card was denied, just to make sure that there wasn’t any bias or unfairness.

And that’s where machine-learning explanation capabilities are also very helpful. So, this is one example: when someone applies for a credit card, whether it’s rejected or approved. Another example is when someone is making a call, like a marketing team making a call, and the system wants to predict whether the call will lead to a successful outcome or not. So, machine learning is now being used very extensively, and one of the advantages of a database is that a database is where there’s a lot of data, so it’s a very, very good opportunity to harness this data using machine learning. Because machine learning is really tied to the richness of the data and to the amount of data someone has.
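
And the explanation piece Nipun describes maps onto the same interface. Continuing the hypothetical bank-applications sketch above, and assuming sys.ML_EXPLAIN_ROW is callable from a query the way the documentation describes HeatWave ML’s row-level routines, asking “why was this application denied?” looks roughly like:

    # Explain a single prediction, e.g., one credit-card decision, reusing the
    # cursor and @model handle from the earlier sketch. The ML_EXPLAIN_ROW
    # signature and its use in a SELECT are assumptions to verify.
    cur.execute("SELECT sys.ML_EXPLAIN_ROW("
                "JSON_OBJECT('income', 42000, 'age', 29, 'existing_debt', 18000), "
                "@model)")
    print(cur.fetchone()[0])  # JSON with per-feature attributions for this decision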

Corey: That makes a lot of sense. So, it definitely shines a light on, if not the easy answer to a lot of those questions, a direction that people are going to have a better time mapping to their specific use cases. One that I think is easier for everyone to map to a specific use case is another component of what you folks are announcing, which is cost reduction, which is, to be direct, not something people generally think of Oracle as the first example of. A company that’s like, “Ah, that’s the thing that’s going to cost me less money.” And to be clear, I have no problem with that. I pride myself on absolutely not being the least expensive answer to basically anything. But it is an interesting direction to go in. There are a few ways you can wind up saving folks money. Which path have you folks taken?

Nipun: Now, there are multiple ways in which we can reduce the cost for the customer. One thing to realize is that MySQL customers are very cost-sensitive. And in the previous benchmarks and results we have shown, we have shown that, compared to other vendors, HeatWave is significantly faster and significantly cheaper. So, we have a class of customers come to us saying, “Hey, you know what? Can you trade off some performance for even lower cost?”

And the way we have done it is the following: We have doubled the amount of data which can be processed on a HeatWave node. HeatWave is an in-memory system, so the size of the cluster depends upon the amount of data which is being processed, and it depends upon the amount of data which can be processed per node. So, if you double the amount of data that can be processed per node, it means that customers now need a cluster half the size compared to what they needed in the past, which reduces their cost by half. Now, please note, when they’re running on a cluster half the size, the amount of time it takes to run the same query will double.

So, what it means is, the system is providing the same price-performance, because it’s half the cost and double the time. But it’s a choice that customers have. If they still want to get the same performance [unintelligible 00:24:38] earlier, they can continue to run on the larger cluster, but now they have a choice. So, in a way, we are providing an even lower entry point for customers. That’s the first part of the cost savings.
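
The arithmetic is worth spelling out. If a node costs c per hour, a hypothetical full-size cluster of 10 nodes that finishes the workload in t hours and a half-size cluster of 5 nodes that takes 2t hours both cost the same per workload:

    10 nodes x c per node-hour x t hours  = 10 * c * t   (full-size cluster)
     5 nodes x c per node-hour x 2t hours = 10 * c * t   (half-size cluster)

Same price-performance, but the half-size option halves the entry cost for customers who can wait longer for results.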

Corey: And that makes sense because with a lot of the workloads you see where it’s nice to be able to run analytics on the same type of data, you don’t need the same level of responsiveness on a lot of those queries either, where it’s, “So, we’re trying to get an answer to this giant analytics query.” “Okay, so great. How quickly do you need it working?” When transactions are measured in fractions of a second, the answer to analytics queries is, “Well, Tuesday would be nice. We’d like it by Tuesday if you can find a way to pull that off.”

So, there’s no reason to pay for near-line-rate speeds if you don’t need it for a lot of those queries, which is absolutely going to be an interesting option for folks. Now, you said there was a second aspect as well.

Nipun: Yes. And the second aspect is, again, for analytics, right? Customers want to run queries occasionally; they don’t want to run them all the time. So, what we are now introducing is a feature called ‘pause and resume.’ And what it does is that if you’re not using the cluster, you can pause it, and the system makes a copy of the data and all the metadata associated with the data in a backup. And when the user wants, they can resume and fetch the data, which is still in the in-memory representation, and all the metadata associated with Autopilot, and just resume, right? So, this is another way by which customers, when they’re not using the cluster for some duration of time, can pause it, and for the duration they pause it, they’re not being charged.
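
A hedged sketch of what pause and resume looks like operationally, again via the OCI Python SDK. Stop and start HeatWave cluster operations exist in the MySQL service API; whether the announced pause-and-resume feature maps exactly onto these calls is an assumption:

    # Pause a HeatWave cluster during idle hours, resume it on demand.
    # Method names follow the OCI Python SDK's MySQL client; verify before use.
    import oci

    client = oci.mysql.DbSystemClient(oci.config.from_file())
    db_system_id = "ocid1.mysqldbsystem.oc1..example"  # hypothetical OCID

    client.stop_heatwave_cluster(db_system_id)   # paused: no cluster charges
    # ... nights, weekends, idle test environments ...
    client.start_heatwave_cluster(db_system_id)  # resumed: data and metadata reloaded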

Corey: I am a big believer that the number one step of cloud economics is not, “Oh, should I buy some reservations or lock into a long-term contract?” No. You should turn things off when you’re not using them. And people look at you strange and say, “What? You can turn things off?” And yes, you absolutely can, which makes people feel better about generally not doing it.

But again, customer behaviors are usually ones that make sense in their context. I just look at it from a billing perspective, and it seems a little weird. I like the option, particularly for things that are either non-production or only going to be relevant to production during certain time windows; there are a number of areas where that begins to make an awful lot of sense, and people would do it if it didn’t require backing up the database, destroying the cluster, then re-provisioning the cluster and restoring the database. And, yeah, people don’t generally have weeks to spend on spin-up and spin-down.

Nipun: Yes, in fact, that’s a very, very good observation, Corey. I want to say that many of our customers who are running their production workloads on HeatWave also have a test environment. And exactly along the lines of what you said, they want to have a copy of the data in the test environment, should something bad happen, but they don’t want the cluster on all the time. They just want it for some duration of time, and for them, this pause and resume will be a very good idea, and it will also save them money. So, it’s something which we have seen with many of our customers.

Corey: The last component of your announcement is one that I approach with a significant amount of skepticism because every time I start drifting in this direction, one thing is for certain: I’m going to get yelled at on the internet. I’m referring, of course, to benchmarking. Now, Oracle has historically been a company that prefers people not benchmark and publish results of those benchmarks, dating back into the mists of history. And the argument has always been that people don’t generally tend to benchmark database workloads appropriately, due to a series of misunderstandings, and let’s be clear, this stuff is complicated. And a number of companies in the space love to talk about how their benchmarks are great, and when you look into it, it’s, okay, those numbers are great.

And you sort of know that the benchmarks that didn’t perform so well are not the ones that they’re talking about. And then their competitor immediately winds up chiming in, where it’s, “Ah, they’re doing it wrong because when you do these other benchmarks, our solution winds up being better.” And it winds up in a nerd slap-fight that no one, even the participants, particularly enjoys. What makes your benchmarks interesting is that you talk through not just what the benchmark results are—because, of course, that’s the entire point—you’re also putting the benchmark methodology and tooling up on GitHub where people can grab it and run it themselves; “see for yourself” is the entire approach. That is—how do I put this politely—that is atypical of large companies in general and Oracle in particular. What changed?

Nipun: Right. So, there are three things over here, Corey. The first thing is, as we talked about, MySQL is the most popular open-source database in the world. Pretty much all cloud vendors have some version of MySQL which they’re offering as a managed service, and in many cases, they’re enhancing MySQL and then offering their service. So, in the context of MySQL, it becomes very important for us to give our customers the opportunity to compare which service is better for their needs.

So, it is all the more important in the context of MySQL, since everyone is offering it and some of them have derivatives, that we provide some mechanism for people to compare. That’s the case for having a benchmark. That’s the first point. The second thing is, when you want to compare the performance or the cost of these various flavors, instead of us coming up with our own, say, workloads which we see from customers, it’s good to have a well-published, well-understood benchmark, so that people can say, “Okay, based on TPC-H, what is the performance?” Or, “On TPC-DS, what is the performance?”

And in some cases, when a benchmark isn’t available, as with machine learning, what we have done is use a bunch of open datasets, and based on those open datasets, we are publishing the benchmarks to say, “Hey, we are so much faster or so much cheaper.”

And then the third aspect is why we are making them all available on GitHub, in open-source. These benchmarks are a starting point, but customers will have workloads which are different from these benchmarks, so we want to provide the opportunity for customers to, A, look at our methodology, what we have used to come up with these numbers, so they can reproduce them; but, B, if their workloads are different, they can enhance or augment these benchmarks in the way they would like, and then run them to see how they compare, right? So, we want to be fully transparent about what we have done and how we have done it, and let customers decide on their own which is going to be the best platform from a cost perspective and from a performance perspective. This is the reason why we have chosen to benchmark and to make all of our scripts available on GitHub, in open-source.

Corey: One of the things I think I admire the most about that is I’ve always viewed benchmarks as being borderline worthless, because I do not care in the slightest how your system performs on hand-selected ratings on sample data that you provide, whereas I care entirely about how the system performs with my workloads and my data sets. So, unless I am talking to someone who is effectively a neutral third-party benchmark source, in which case they are immediately attacked for being shills for one company or another, and sometimes both or neither at the same time because people are terrible, seeing how it runs on my workloads and with my constraints is the important and valuable thing. And this is the easiest I can ever see it being to get a good representative feel for exactly how different offerings are going to perform under the specific conditions that my production environment lives within. Because it’s me we’re talking about, the specific conditions of my production environment are, of course, terrifying.

Nipun: Right. So, I want to point out: yes, one part is the fact that we have made the benchmark methodology very transparent, but the second aspect is what we talked about last time, which is MySQL Autopilot, right? This is machine-learning-based, data-driven automation. So, we are very actively working on making it so that customers don’t have to do any configuration changes or optimizations; the system determines, based on the queries, based on the workloads, how to best tune the system, right?

So, we are working from both angles: one is to make the system more intelligent, so that based on the workload, the system can optimize for the user’s workload; and the second is making our approach very transparent, so that customers can compare for themselves. So, we are very, very aware of this, and again, for MySQL customers, for many of these open-source customers, simplicity is very important, and we are working hard to make it simpler and more transparent for our users.

Corey: I really want to thank you for taking me on a tour of what you’re announcing today. Now, let me ask one of the forbidden questions: What’s on the roadmap? What’s coming that customers can look forward to?

Nipun: So, one of the things we are working on: there has been a very good reception of the HeatWave capabilities we have introduced, and MySQL HeatWave is one of the fastest-growing services in the Oracle Cloud. But there has been a lot of interest from customers who have been asking us to provide similar capabilities on AWS. So, this is something which we are working on; it’s on the roadmap. And please stay tuned for more news on this.

Corey: You can bet that I will. I really want to thank you for taking the time out of your day to basically suffer my slings and arrows, and also spend time teaching what amounts to a remedial database course to a moron. But thank you once again for being as generous with your time as you always are.

Nipun: Well, thank you, Corey. It’s always a pleasure to come and talk on the show. Thank you again for the opportunity.

Corey: Always. Nipun Agarwal, SVP at Oracle in charge of MySQL, YouSQL, and HeatWave. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and explain how databases always fail your personal benchmark of doing a SELECT on a terabyte of data at once.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
