Open Source, AI, and Business Insights with AB Periasamy

Episode Summary

Join Corey Quinn and MinIO's co-founder and CEO, AB Periasamy, for a look into MinIO's strategic approach to integrating open-source contributions with its business objectives amidst the AI evolution. They discuss the effect of AI on data management, highlight the critical role of data replication in efficient data management and storage, and advocate for the adoption of cloud-native architecture. A recurring theme throughout the episode is the importance of simplifying technology so that it remains accessible and beneficial to all.

Episode Show Notes & Transcript

Show Highlights:
(00:00) - Intro
(03:40) - MinIO's evolution and commitment to simplicity and scalability.
(07:25) - The significance of data replication and object storage's versatility.
(12:12) - Challenges and innovations in data backup and disaster recovery.
(15:21) - Launch of MinIO's Enterprise Object Store and its comprehensive features.
(20:50) - Balancing open-source contributions and commercial objectives.
(30:32) - AI's growing influence on data storage strategies and MinIO's role.
(34:33) - The shift towards software-defined data infrastructure driven by AI and cloud technologies.
(39:40) - Resources and the future of tech 
(43:31) - Closing thoughts 


About AB Periasamy:

AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS, which was acquired by Red Hat in 2011. Following the acquisition, he served in the office of the CTO at Red Hat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India. He earned his BE in Computer Science and Engineering from Annamalai University.


Transcript

AB Periasamy: All you need is replication. And replication, copying the versions, the historic versions of the data, you actually solve the problem.

Corey Quinn: Welcome to Screaming in the Cloud. I'm Corey Quinn. Back again after a little over a year to talk about what's new and exciting over in his world, we have AB Periasamy, who is the CEO and co-founder of MinIO. Thank you for joining me again. How's your year been?

AB Periasamy: It's uh, it has been a wonderful year.

Last year was kind of, um, both ways, right? It saw the extremes. Customers had a difficult time, uh, dealing with budgets. They even had to shrink existing deployments, right? On one side, we saw extreme pressure on the budget. At the same time, within the same year, some of these customers actually expanded to a scale they never thought about, right?

That was because of AI. We actually saw both extremes, but net net, towards the end of the year, it turned out to be our best year ever. I would say it was the worst and best year, but net net, we delivered.

Corey Quinn: This episode's brought to us in part by our friends at MinIO. With more than 1.1 billion Docker pulls, most of which were not due to an unfortunate loop mistake, like the kind I like to make, and more than 37,000 GitHub stars, which are admittedly harder to get wrong, MinIO has become the industry standard alternative to S3.

It runs everywhere: public clouds, private clouds, Kubernetes distributions, bare metal, Raspberries Pi, colocations, even in AWS Local Zones. The reason people like it comes down to its simplicity, scalability, enterprise features, and best-in-class throughput. Software defined, capable of running on almost any hardware you can imagine, and some you probably can't, MinIO can handle everything you can throw at it.

And AWS has imagined a lot of things from data lakes to databases. Don't take their word for it though. Check it out at www.min.io and see for yourself. That's www.min.io.

The timing of this conversation is apt. From this recording, in about a week and a half, I'm giving a talk in Pasadena called Terrible Ideas in Kubernetes.

And the week after that, I'm giving a talk here in San Francisco at SREcon about the economics of on-prem versus cloud. And this does tie directly into what you're talking about, because for the last couple of months, I've built a Kubernetes of my own in this spare room, and installed MinIO as I was going down that particular path.

And that was quite the learning experience. The first time I did it, the entire cluster exploded, and more or less ate itself. And what I learned at the time was, okay, it had nothing to do with MinIO itself, and everything to do with the underlying storage subsystem, which did not work if there was a symlink involved in its path.

Instead of throwing an error, like a sensible system might, it just tore the cluster down around its own ears. So that was exciting and fun. The second time went much more smoothly once I got those problems taken care of, and now I have an object store in the spare room, just like, you know, mother intended.

So that was exciting. And there's also the economic story of talking about this at larger scale with companies that are doing this on more than a shoestring budget with a bunch of Raspberries Pi, like I am. So it's been an interesting month just from my perspective, thinking about the confluence of those two things.

AB Periasamy: Yeah, actually that's pretty much the pattern across the industry, right? When we started MinIO, our idea was just this. At that time, it did not make sense to a lot of people, because we were saying, when you outgrow AWS, you will come to MinIO. It almost was laughable, right?

And, uh, if you see most of our large deployments, actually, when customers reached a certain scale, cloud was pretty expensive when it comes to data infrastructure. Data infrastructure never shrinks. It actually accelerates: new data comes in at an even larger pace than all of the past data put together.

At that point, they actually look at colo type infrastructure, and what they experience with MinIO is the simplicity. We made it so ridiculously simple that it can even run on Raspberry Pi and laptops and home NAS systems, not because that was the intended use case. It has to be very simple if you want to build at exascale, and that simplicity naturally led MinIO to a larger install base.

But our focus has always been that in the enterprise, at large scale, data is going to be the heart of the business. And if you build something simple enough, that combination of, let's say, Kubernetes for compute and MinIO for data infrastructure will give anybody a cloud infrastructure. The simple message was, in the long run, we always knew the world will look like AWS or it is AWS.

And this is where cloud was built on open source, and open source further accelerated, with Kubernetes replacing Docker, Mesosphere, a bunch of other technologies, even Cloud Foundry. When we started, there were many alternatives and it was chaotic, but it's the nature of open source. Eventually, when the dust settled, Kubernetes took the compute side and object store, uh, took the data side, and within object store, MinIO actually became the leading player.

And, uh, credit to its simplicity.

Corey Quinn: There's a use case for everything. Uh, for example, everything that I've installed will speak to arbitrary object stores. In fact, in many of the examples I find, Oh, here's how to use MinIO. And I do that for some things. And for others, it would be a wildly inappropriate fit.

For example, oh, where do I want to back up the storage volumes to? Well, maybe not the cluster itself. That's like the snake eating its own tail. And you kind of want these things somewhere else that you could use to rehydrate it. But then I started getting cost alerts because, huh, that's starting to cost a fair bit of money for a bunch of metrics that you don't actually need.

And, okay, so now I wanna be judicious as far as what are the volumes, I really wanna make sure that I can recover not that many, but I also don't want to back it up to itself because if there's a small to mid-sized fire, given my propensity to be lousy with a soldering iron, which is always a possibility, I wanna have the ability to, to go forward.

But having something locally that isn't, I don't know, charging per API call or winding up metering what I do in strange ways in a test account is liberating in a very strange way. It means I don't have the same, I guess, economic sword of Damocles hanging over my head. But there are other problems with it. I have to think about disk capacity.

I have to think about nodes dropping out of a cluster. Whereas with cloud, I don't have to worry about those specific things. I have other problems that I get to worry about. So it's very much a trade off, and there's no one path that's going to work for absolutely everyone. But I like the ability to wind up smoothing over some of the key differences.

Because historically, if you wanted an object store, you were either going to be paying a cloud provider, or you were going to be buying something relatively janky and hoping it worked. Once you wind up, you know, making sure that your storage volumes don't crumble under the load of actually having data pass through them, MinIO is pretty great at that.

AB Periasamy: Yeah, and this is actually a much deeper topic, but I'll touch upon some of the problems that, uh, the industry is facing today. The backup itself has many terms. In fact, if you even see the traditional backup vendors who backed up, uh, VMs and databases, they moved on to calling it copy data management, and from copy data management to cloud data management.

Now they're into cybersecurity and AI, right? So, uh, that market is one side. On the other side, we are finding the volume of data is so large. If you actually look at the data lakes, the analytics data, the data that's feeding AI, that data is manyfold larger. It's not like just VM backups and, uh, some small scale database backups.

How do you backup object store is one question. Then how do you backup into object store is another. Replication, right? Yeah, actually. Interestingly, that this is the, here's the thing that, uh, so why do people backup a very simple thing? I, if some, if, if any disaster strikes, I want to be able to go back in time.

So the real reason why they back up is: intentionally or unintentionally, I deleted, overwrote, something went wrong. I want to go back. And it's point in time recovery. And traditionally, the industry did this with snapshots. SAN and NAS did not give continuous data protection. You basically did snapshots within a snapshot window.

Anything in between, you could lose the data. And because the snapshots are read-only clones of the data at that particular point in time, you then take a copy of the data to a remote site, but that would only work at small scale. When you have even 10 petabytes of data, the rate at which data changes in object store, the flowing in of data, you cannot possibly sync it up.

When you take snapshots, it's not good enough. What customers want is, even file by file (in object store, it's called an object), every change you can actually capture. That's something that's unique about object store that SAN and NAS cannot do. When you have billions, sometimes hundreds of billions or trillions of objects, you can say exactly, at this point in time, how this object looked. You can say that with object store because every mutation is an independent version.
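To make the "every mutation is an independent version" idea concrete, here is a minimal Python sketch of point-in-time reads over a version history. It is a toy in-memory model under stated assumptions, not how MinIO or S3 implement versioning: the names `VersionedBucket`, `put`, `delete`, and `get_at` are invented for this illustration, and timestamps are plain numbers supplied by the caller.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    timestamp: float        # when this mutation happened
    data: Optional[bytes]   # None stands in for a delete marker


@dataclass
class VersionedBucket:
    """Toy bucket (hypothetical): every write or delete appends a new version."""
    _versions: Dict[str, List[Version]] = field(default_factory=dict)

    def put(self, key: str, data: bytes, timestamp: float) -> None:
        self._append(key, Version(timestamp, data))

    def delete(self, key: str, timestamp: float) -> None:
        # A delete only adds a marker; older versions stay recoverable.
        self._append(key, Version(timestamp, None))

    def _append(self, key: str, version: Version) -> None:
        history = self._versions.setdefault(key, [])
        if history and version.timestamp < history[-1].timestamp:
            raise ValueError("this sketch assumes non-decreasing timestamps")
        history.append(version)

    def get_at(self, key: str, timestamp: float) -> Optional[bytes]:
        """Return the object as it looked at `timestamp`, or None if absent."""
        history = self._versions.get(key, [])
        times = [v.timestamp for v in history]
        idx = bisect_right(times, timestamp)  # versions at or before timestamp
        return history[idx - 1].data if idx else None
```

With this model, an accidental overwrite or delete never destroys history: asking for the object "as of" a moment before the bad write returns the earlier payload, which is the point-in-time recovery described above.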

So you got point in time recovery by default built into object store when you enable object level versioning. Now, I can go back in time, but what if, uh, the new update I rolled out somehow corrupted stuff, or the application accidentally deleted it, or some kind of tampering happened? Then all you need is replication, and with replication, copying the versions, the historic versions of the data, you actually solve the problem.

Uh, DR, point in time recovery, with active-active synchronous replication. And you can even go multi-site replication. The enterprise still would feel comfortable if they are able to, uh, take a backup software and take a copy of the data and then put it in some third party system.
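The replication point can be sketched the same way: if each site keeps the full version history and the sites exchange any versions the other is missing, a bad write on one site does not erase the good versions held on the other. The Python below is a simplified batch sync under loud assumptions: it is not MinIO's replication protocol and not synchronous, version IDs are assumed globally unique, and `Site`, `replicate`, and `sync_active_active` are names made up for this example.

```python
from typing import Dict, List, Tuple

# A site maps object key -> list of (version_id, payload).
# Assumption for this sketch: version IDs are globally unique.
Site = Dict[str, List[Tuple[str, bytes]]]


def replicate(source: Site, target: Site) -> int:
    """Copy every version `target` is missing; return the number copied."""
    copied = 0
    for key, versions in source.items():
        known = {version_id for version_id, _ in target.get(key, [])}
        for version_id, payload in versions:
            if version_id not in known:
                target.setdefault(key, []).append((version_id, payload))
                copied += 1
    return copied


def sync_active_active(site_a: Site, site_b: Site) -> None:
    """Replicate in both directions so each site ends up with all versions."""
    replicate(site_a, site_b)
    replicate(site_b, site_a)
```

Because only missing versions are copied, running the sync again is a no-op, and a corrupted newer version arrives on the other site as one more entry in the history rather than as a replacement for the good one.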

Unfortunately, those systems... For a small data set, if it is so critical, some parts of the bucket or some smaller buckets, if it's small enough and you want to do it, you can do it. When you want to back up a 100 petabyte volume, you also ask yourself: now, if this is the primary store... In the cloud, they already answered this, right?

The objects, there is only object store and there are different tiers of object store. Now, anything you want to back up object store into, that should not cost more than object store, right? At least equal, I can understand it can be cheap, it should be cheaper because you are not doing primary active IO on that system.

Today, any of the systems, whether it is tape or any, uh, I don't want to name these vendors, but any of these secondary data management systems, if you take them, they are actually more expensive than object store itself. The industry is going through a chaotic phase because they are caught with a scale that they've never seen, and the traditional backup software is falling apart.

Corey Quinn: It's one of those areas that I've found where there's a lot of nuance and a lot of variance. I'm building something else unrelated to most of the stuff I talk about here, where, okay, I actually had to build a DR plan that would pass the sniff test. And the honest answer was that, okay, there's not that much data we have that can't be reconstructed relatively easily, but we're taking that and then we're snapshotting it over to an object store.

And then we're replicating those, not just to another region, but to another account. And the rule is, if you have access to the production environment, you do not have write access to, uh, the failover story, an overwrite or delete story, and vice versa. Because, at least for my use case, it's about: what if you wind up getting compromised and someone acting as you logs in?

I don't want that to be a blow-out-the-company style of story, but I don't need that to be another provider at this point. If this ever were to grow to a point where you have to start explaining to auditors why you're not, then okay, switching that over is not the hardest problem to solve for.

But you do want to at least have a responsible, "Oops, how do I revert that dumb thing that I just did by mistake?" People care about that a lot the second time.

AB Periasamy: Definitely, right? And this has nothing to do with whether you do erasure code or replication, you have lots of parity, and you have very reliable storage system, NVMe drives, all that.

It does not matter, right? You rolled out an update to your application and it just started overwriting with corrupted data. There are many reasons this can happen. Now, the versioning of object store automatically gives you this for free: when every new change happens, you can actually go back in time. But that's not the problem.

What if someone intentionally went and cleaned up that bucket? Sometimes it could be a malware attack, right? So some kind of a third party system. Sorry, there was a data breach and they went and explicitly deleted old versions. This is why you have object locking. Object locking again is something that compared to SAN or NAS.

In SAN or NAS, you never had file level versioning. In object store, you have object level versioning and each and every version, every mutation that happened ever on your object store can actually be locked. It can be locked in two ways like compliant.

Corey Quinn: That can get expensive applied to the wrong things.

AB Periasamy: Yeah, the thing is, this gets, you get this for free with object store. And it's not just specific to MinIO, right? Amazon has it too. And these are actually vetted out by third party experts called like Cohasset Associates. They understand this deeply. They do the assessment on Amazon. They did for MinIO and they gave the assessment.

This has to be compliance grade. And there are modes where you can say, Okay, I want it to be default lock upon creation of the object or any mutation. Nobody can change this. And I would say up to six years. After that, you can do whatever you want with it. Or sometimes you can say that as an admin, I want to be able to unlock.

Now I want to be able to delete this because I know for sure this has been abandoned. This project is like decommissioned and I got approval from InfoSec team to actually delete the data, free up the space. Sure, admin can unlock it. There is even a mode where, uh, the, uh, the admin cannot unlock until that time, uh, comes and, uh, that level of protection you have, and, uh, this is about every single object that every single change you did, that not just you can go back in time, you can be guaranteed that nobody can tamper with any such changes until the measured time elapsed.
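The two locking behaviors described here (one where an admin may unlock early, one where nobody can until the retention time has passed) can be sketched as a small decision function. This is an illustrative Python model only, not MinIO's or Amazon's implementation; the mode names `GOVERNANCE` and `COMPLIANCE` are borrowed from S3 Object Lock terminology, while `LockedVersion`, `can_delete`, and the `is_admin` flag are hypothetical names for this example.

```python
from dataclasses import dataclass
from enum import Enum


class LockMode(Enum):
    GOVERNANCE = "governance"   # a privileged admin may unlock early
    COMPLIANCE = "compliance"   # nobody may delete before the retention date


@dataclass(frozen=True)
class LockedVersion:
    version_id: str
    mode: LockMode
    retain_until: float  # epoch seconds; the end of the retention period


def can_delete(version: LockedVersion, now: float, is_admin: bool = False) -> bool:
    """Decide whether a delete of this object version should be allowed."""
    if now >= version.retain_until:
        return True  # retention elapsed: anything goes
    # Before expiry, only governance mode allows an admin override.
    return version.mode is LockMode.GOVERNANCE and is_admin
```

In this sketch a six-year retention would simply be a `retain_until` six years after the write; the "decommissioned project, approved by InfoSec" case maps to an admin override in governance mode, which compliance mode refuses.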

Corey Quinn: It's one of those areas where it just becomes a concern that, if you're not in that use case market, you don't understand or really appreciate exists. Uh, all it takes when I think I've seen it all is to talk to one more customer and suddenly my perspective on things tends to be turned on its head.

Uh, so, relatedly, you have some news about a change coming out. Specifically, you're calling it the Enterprise Object Store. Which, silly me, I sort of assumed that was the paid offering of what you already had, but apparently I am misled on that. What are you releasing? What's new and exciting?

AB Periasamy: So it is, it is an upgrade, a significant upgrade, I would say, to the paid offering.

And what we saw with the enterprise was that, uh, over the last couple of years, we have been seeing the scale growing manyfold. And last year, particularly, we noticed that customers are reaching exascale. When you run at small scale, all you need is object store and Kubernetes. But when it comes to scale, you look at it not as an object store.

You look at it as a data infrastructure. And when you have a data infrastructure, it's not just about IO alone. You actually need observability, a catalog, some global console where you can link multiple deployments, multiple tenants across multiple sites, sometimes even multiple clouds, into one single console.

Uh, from that to, let's say, key management server. You turn on encryption, obviously, and then you need a key management server that can handle billions and billions of keys and also handle very high key creation and look up per second, right? So then I have all my data, but then how do I protect my data at the network level?

You don't want the API to ever hit the server. Now you need a data firewall. All the firewalls out there are designed for application security. The closest thing that you have is a web application firewall that understands HTTP traffic, but you still need to write specific rules, and that requires deeper understanding of data traffic, primarily the S3 API, right?

And then there are no data firewalls. So very soon we realized, watching customers, that it's not just object store; they ended up buying a collection of products around it to actually complete the object store. We always had a definition: "min" in MinIO means minimalism. And minimalism means the right quantity of something.

If it is less, it's incomplete. If it's excess, it's more. It doesn't qualify as minimalism. So I wanted to keep it light. And every feature, we actually even tell customers, if you ask us for a new feature, you have to give me a hundred reasons why this is going to be useful for everyone.

And if you ask me to remove, I would gladly do it. But then when we looked at the object store, it was kind of incomplete when customers were deploying at scale, because we were telling them to go bring these third party components. You can take MinIO to production, but for operating MinIO in production, you needed these capabilities.

And only then can you call this a complete object store. So I saw those capabilities were an essential part of the data infrastructure stack. So while we retain the capability for MinIO to talk to any key management server, any log monitoring system, any metrics monitoring system, any firewall out there, any load balancer out there (MinIO has the widest integration support for all these third party services), expecting customers to go do all the integration, and supporting and integrating those, also fell on our lap.

And that was time consuming for us. So we clearly saw: what if they were purpose-built just for MinIO and built into MinIO? You don't need to buy these custom batteries that were not designed for MinIO specifically. Uh, that's what makes this an enterprise object store. So the paying customers are getting all of these bundled in an upgrade; that is, uh, the enterprise customers get all of these capabilities at no additional cost.

Corey Quinn: There's a lot to be said for the approach. I find that very often in the land of Kubernetes, it's a, you can choose anything to do all of these parts. And what that means is that there's no real golden path, like being told which one of these things should I use and being met with it. There are so many different options.

Great. I don't want to be set out to sea on an ice floe by myself if I'm having trouble with this stuff down the road. I want to have a commonly deployed approach to it. And, like, that's one of the challenges you see with open ended systems. Like, looking at all of the things that MinIO can do, that's great.

That is fantastic for a number of use cases. I don't need 90 percent of it. Versioning. That's awesome. I don't actually need it for my use case of my test lab here. Oh, the ability to wind up doing, uh, balancing at different tiers of storage. Great. These are Raspberries Pi. There's not a whole lot of high grade hardware versus low grade hardware in here.

It's stuff that I do not need, but it's nice to know that it's there. That's, let's be honest, a mouse click away, because I am a strong proponent of ClickOps whenever I can be. It really does a decent job of meeting people where they are. It is open source, and it has a remarkably high degree of polish for what I would consider to be a typical open source user experience.

Did you find when you were building this out that there was a bit of a balancing act, though, to get there? Because it's a, on the one hand, you don't want to give away the thing that actually drives your revenue for free to the entire world and just hope that they'll be, they'll do the right thing and pay you because at a certain point of scale, it's hard to get companies to be philanthropic, but you also don't want to make the free experience so crappy that no one adopts it unless they're paying you.

How do you strike the balance?

AB Periasamy: It sounds difficult, but it's actually not that hard, right? End of the day, I always look at everything as a matter of trust and love, right? Brand stands for trust and love. I personally, for me, I call it free software, not open source, right? But it's about software itself.

Software freedom, and it's a philosophy. Either you believe in it or you don't. And then, at the same time, MinIO is also a business, like you said, right? It's not a non profit. I always believe that even a non profit requires funding to operate. A product like MinIO, it's not easy to pull off as a hobby grade project, because if you are operating a giant data infrastructure, and these are mission critical financial information, you would not put it on a hobby grade product, right?

But then how do you build that level of resiliency? It's not like non profits cannot do it. I think you can clearly see from FSF to Apache Foundation to Linux Foundation, they've delivered incredible products. But there's the time crunch, how fast we need to move as a company. I always found that it was much easier if you have unlimited time.

Sure. Right. Then organically, with several mistakes here and there, you can get there. But I found that getting funded and running as a company also gets you to the same path. Capitalist way of getting things done, right? But if I believe in software freedom, then I can always be honest to the users, the community, and the customers.

That is how you strike the balance. You never take, even if it's proprietary software, if I just gave something away for free for a long time, and then suddenly one day I caught you, now you are stuck with me, now you have to pay me. If I force them, I would lose them. So the core principles are always the same.

Community or paying customers, you have a contract with them and it's built on trust. You never let them down. You can give more, but never take back what is given. And the community for us is the heart of it. Our paying customers and community, the difference is that community is not hackers and hobbyists.

Our community are our future customers. They are not playing with petabytes of data because they have some hardware lying around and they want to have fun. These are serious users. And when they reach a point in time where it is a critical part of their business and they are in production, these two points, when they click, right, they match, they feel good that there is a commercial entity behind it, so that when it reaches that maturity point, they have someone whose door they can knock on, and, uh, become a paying customer.

Red Hat showed that they could build a lot. At the time Red Hat got bought by IBM, they were the largest software exit ever. Red Hat showed that open source or free software is not orthogonal, it doesn't compete with the business, it's complementary, and the same model works for us too.

As we make improvements, the enterprise, the paying customers, on the other hand, they have paid us and I have an obligation to give them more value. My end goal is to keep both of them happy. And the way to do it is to be honest. And the community wants source code. They want all the latest and greatest.

And then there's a huge wide community. Enterprise customers, they actually wanted infrequent releases. They don't want to see the latest source code. But at the same time, they want mission critical reliability and they want some neck to choke. And, uh, also they were paying for all these clunky proprietary products around it that they need to buy to operate MinIO.

So we were replacing all of that as well. So net net, the balance is built on trust.

Corey Quinn: I think it leads to a hard series of decisions, but you're right. The things that I use in my experimentation and other capacities tend to be what I'm most familiar with. What I recommend to people when they have questions is what has worked for me, the things I am inherently most familiar with.

And if, okay, you gate it behind a "you must first pay us at least this much money to use the product," I am extremely unlikely to have much firsthand experience with that in any of my fun experimentation style stuff. I'm not saying I'll never encounter it, but the odds of me getting to kick the tires on it are less.

And as a result, it definitely highlights that there is value to the open source distribution mechanism. But then you have the other side of it, where companies seem to be taking the approach of, okay, now we're going to make sure that no one else can implement it themselves, with source available licenses.

Which, especially in your case, seems a little on the silly side. I, I don't think that AWS is going to try to use MinIO code to implement an object store. They've already got one. It works in its own specific ways and they are not grabbing stuff off the shelf to build S3. But that's just the nature of that particular beast.

So it's, it's a delicate balancing act, but I think you've done a great job of mostly striking the right notes.

AB Periasamy: Yeah, the thing is, right, like the SSPL, Commons Clause, and a bunch of licenses, they are not open source licenses, right? And they are not free software licenses either. If I choose that, I am 100 percent proprietary.

I will never mislead our community saying we are open source and, uh, but when it comes to actually trying it out, oh, wait a minute, you have these restrictions. Those are proprietary licenses. There won't be any confusion. No one will be mad at us if you say, MinIO is proprietary, right? But if I, if I cause this ambiguity that MinIO is open source, but they look into the details of what license I have, and that is actually a proprietary license, that's where the friction is.

And also, if you start as open source, and you then one day suddenly change the license and go proprietary, that's where the friction is. There also, you're taking back something you promised them, right? They put their trust in you and you misled them. You let them down. That also causes friction. So you never take back what you gave them.

And that is where the free software and open source approved licenses guarantee that. And maybe in the future, say there are companies that adopt these open source and free software licenses, they can add new capabilities, not give back, or in the future, they may even make it completely proprietary, but the community has the ability to fork and keep maintaining it, right?

The thing is that you never take back something that you granted. If you change the license like that and make it proprietary, that's a point of friction. And the other point of friction is ambiguity, that you call a proprietary license an open source license. I actually find that the Apache license and AGPL license, they all have their own, uh, strengths and weaknesses.

Uh, AGPL license for the server, and for the SDKs and everything, Apache license; that works out great. And for the paying customers, some of these enhanced capabilities are under a proprietary license, and I have no problem saying that it is proprietary. The community never gets upset about that. The other factor that the industry also tends to play with is what you give to the community.

They want community as a, uh, they come in, it's not like baiting them, right? And, uh, you would classify these as baitware, if you gave the community something just enough that they came in and you got them hooked. Now, essential features like even Active Directory integration: it's very easy for them, like one engineer in two or three weeks can actually pull off an Active Directory integration.

In the past, it was kind of hard to do because no one had this kind of code in an open source product, right? And MinIO, and now other projects, also have similar integration. You can understand how this works and contribute that capability. Now, what should I do? I should take those features, right?

In MinIO's case, we only implemented all of these nice enterprise capabilities. The ones that you were talking about, from multi-site active-active replication, which is not even there on AWS, like instant active-active synchronous replication, to high performance erasure code, to all the enterprise grade encryption capabilities, even the object locking that we were talking about, everything is available to the community.

And when they go to production, they need, they were anyway buying these bunch of proprietary software to complete the infrastructure stack. And they, now I'm giving them so much more value replacing those proprietary components with the enterprise object store. So there was a nice balance. If I, if I treated my community, uh, as if the, I just want to bait them, trick them into using MinIO, I would lose them.

I'm better off going completely proprietary, educating the industry: hey, this is what I am. If I never mislead them, there are proprietary software vendors doing just fine in the market. They never misled the community, right? So it's the same thing here. And I take some of these, uh, uh, inspirations from the consumer products.

Like if you look at, whether it's YouTube or many of these products, they entirely disrupted the media industry, the cable industry. And if they treated the user base as a way to just bait them, like the model as a bait, nobody would be using YouTube like systems, right? They understood the delicate balance.

They are an essential part of your system and what you give and what you expect in return, as long as you are on the giving side, it's all good to go.

Corey Quinn: One last topic I want to get into is, it seems like I can't have a podcast episode these days without touching on the, on the zeitgeist here, but AI is absolutely something that is sucking up an awful lot of hype energy.

And there's value here. It's not like crypto. This is, it is clearly something that is useful. But whether you're doing your own model training, or whether you're enriching what you have with existing access to data, it's hard to disagree that this is a scale problem.

And that access to a whole bunch of data is necessary in almost every case. And I'm seeing that increasingly be object store. What are you seeing as far as what your customers are doing?

AB Periasamy: Yeah, so it's now trendy for everybody to simply get a .ai domain name, right? And everything has become AI. But in our case, it's more true because there is a direct impact on our revenue, right?

The reason why some of these GPU vendors, their valuation is skyrocketing is because they are not able to make enough GPUs that customers can buy, right? And as long as that, uh, that, uh, that supply problem is there, the demand is there. They will be valued like that. And, uh, the first wave hit the GPU, the hardware vendors who are directly serving the AI customers.

Customers couldn't buy enough. The second wave, it's already hitting us. That's the data side. Customers are now realizing that now I have the GPU fabric, but then I need to, I need to build a data fabric. In the end, they actually look at data as their asset. Hardware, even GPUs, every one of them, they are fully aware that in just a matter of two, three years, GPUs will become a commodity.

Intel and AMD, all of them are coming into the race, and even ASIC players, there will be multiple GPUs. Every cloud vendor will make their own cheaper GPUs. GPUs are bound to become a commodity. Customers are now understanding the value of the data has grown manyfold.

Previously, just the big data and analytics, traditional data science and machine learning alone showed the value of data is manyfold. Whatever you spend on infrastructure is minuscule, right? And that has gone to a whole new level because we finally could understand real unstructured data, not the big data, semi-structured data.

This one is complex text, human language, source code, anything that is quite long. Previously, machines had no idea. This was left to creative jobs. Only humans could do it. This has unfolded a whole new level of value. And for the first time, we are seeing enterprises have more than just snapshots and VM images.

This time around, we are seeing they are having everything from audio and video to all kinds of structured data documents that they would not previously store for a long time. Now everything is value. Every drive-thru audio clip, every conversation, every Zoom meeting, every document you can have, they are now capturing, because you can easily make an LLM that, within hours, can actually read billions of documents and become an expert on everything.

It's like Data in Star Trek can read a book in a second. It's that experience enterprises are having. A direct result of that is customers now see that data is the core asset of their business. And the scale, so why, why we are finding it exciting is it has impacted our revenue already, and we are now talking about 10 million plus dollar deals.

And these are exascale deployments, and exascale was only in the realm of national labs because they could afford, they had the talent to run that kind of infrastructure. Now it's hitting enterprise, and for us, we are now seeing that this is not an anomaly. We are seeing a repeat pattern here. An object store is well suited to take on that market.

This is where SAN, NAS, the days of SAN, NAS and the traditional enterprise store, they are on their way out.

Corey Quinn: Yeah, it's a, it's an interesting world. You're still going to have the underlying idea of some volume these things live on. But increasingly what I'm noticing is that instead of getting the super expensive SAN with the head ends and then the drive shelves, people are using commodity hardware and using distributed file systems, Ceph, or in my particular case, Longhorn.

But there are ways to wind up having those volumes shared, like OpenEBS or a bunch of different things in that direction, which, okay, well, is that going to be reliable enough? Well, it depends. If you can constrain blast radius and then layer something like MinIO on top of it that understands erasure coding or the fact you don't want every constituent volume living on the same virtual machine, you can start getting an awful lot of flexibility without massively spiking your price on top of it.

And let me be very clear, you're talking about 10 million dollar deals compared to what it costs for SAN licensing at exabyte scale. There's no contest.

AB Periasamy: That is absolutely right. In fact, when I was building Gluster, a distributed file system, my previous, uh, startup, right?

Uh, and, uh, customers were, uh, customers got the idea, even investors would say, the investors would actually argue that, oh, but that's all, uh, only Yahoo and Google type, uh, applications. In fact, Facebook was also part of our community board. Uh, they were all using, uh, GlusterFS. But it was dismissed.

Corey Quinn: Sure, there will never be other large scale companies like that again.

AB Periasamy: Yeah, right. Yeah, right. And that was dismissed. The software defined data store, right, was dismissed as those are exceptions only. And they would even call these Yahoo and Facebook type deployments as that's not enterprise. And you ask these Facebook engineers, they will tell you we are not enterprise, nobody else is enterprise.

You have no idea what, like, the level of security and scale that we are dealing with. And they understood very early on that modern infrastructure is software and you need software engineers to build and operate the infrastructure, not hardware and IT people. They understood that. And what accelerated that was the same application vendors. Amazon did this for themselves, but Amazon also saw the value in giving it to others.

Cloud was born, and these were all built on open source and homegrown, the software engineers were building this infrastructure. Imagine these vendors, if AWS S3 or EBS or anything was built on top of SAN or NAS, there wouldn't be cloud today, right? That's why cloud fundamentally differentiated itself from the traditional hosting providers like Rackspace and everybody else.

Even Rackspace tried to build their own system, but the application vendors, the SaaS providers, they had the scale. They understood the importance of it. They started hiring engineers to write their own database, their own object store, everything. Every one of our large scale customers tried writing their own object store.

In the end, they saw that. They are the kind of team they can partner with and save. There's also some business risk, but nevertheless, right? Software defined data store. I, I always found the term, the term software defined, uh, object store was itself funny because for me, it was so obvious to our engineers, the customers who were in the cloud native space, it was so obvious.

You don't talk about software defined MongoDB, software defined Elasticsearch, software defined database, right? Databases are a far more complicated product than object store. They run as just a piece of software on commodity hardware. And why would you, why would you call object store software defined?

It is because that is the only way we could explain to the traditional enterprise IT buyers who are used to buying storage in the appliance form factor. They used to even buy compute, like SPARC, Sun, Solaris, right? They used to buy even compute as appliance and they are now getting to that idea.

The cloud, they understood the need and scale and it was all software defined from the beginning. They never talked about these terms, they didn't care about it. And now finally what we are seeing is that cloud and open source completely rewrote the enterprise landscape. All the new emerging deployments are going containers and software defined and on commodity hardware.

What is accelerating this adoption is basically AI workloads. Both cloud and AI is pushing the software defined trend. They don't even call it software defined, they call it cloud native, if at all. And the appliance based business models are on their way out.

Corey Quinn: I think that is probably right. It's a, people are expecting a different substrate that they can build on top of with modern software engineering.

And don't get me wrong, there's a very long tail of enterprises that exist that are still skeptical of virtualization, let alone cloud. But that's drying up. It's a, at this point, if you haven't migrated to cloud or something cloud-y at an enterprise, there's usually a reason for it other than, wait, what's this cloud thing I keep hearing about?

That worked in 2014, it doesn't work in 2024. So I'm firmly convinced that we're seeing a seismic shift here. I'm excited to see what this empowers people to build going forward, when suddenly having deeply reliable storage is no longer something that either is in the cloud or is something that you have to build yourself from scratch and baling wire.

This is, this is going to turn into something interesting and I'm curious to see where it goes. If people want to learn more, where's the best place for them to go to find you?

AB Periasamy: Obviously the community and we are always open to having any of these engagements, even if it's the, it's not the community, just write, write to us.

There are multiple ways to reach out to us. We are here to educate, right? And this is where even we do not look at marketing as fluffy advertisements, commercials, right? It's about educating the industry. We want to educate them the right way. Modern buyers today are sophisticated. They know what they are, what they want and they can differentiate the right kind of information and this and trying to be salesy and trying to get their money right.

I think if we, if you are altruistic here, educate, if you educated them the right, gave them the right information, you build a long term trust with them. That's what businesses are about, that they become a customer not because one time they want to just buy and walk away, at least not in the data infrastructure business.

They want to build a long term relationship with us, and it starts with having these kind of conversations and educating them, even if you have no immediate need to buy or anything, right? You have, you're not planning to engage with us commercially, or you might actually end up with our competition.

That's quite all right. Any opportunity for us to educate the industry, honestly, right? And it's going to help them, help us. We are always open. And the part about how the industry is now shifting towards software defined, right? The responsibility is also on us. While the customers have the need to shift to modern software defined cloud native type infrastructure, they also understand that if they don't do this, they will become, their business will become irrelevant.

The change that AI is bringing is happening so fast. In the next three to five years, the entire industry landscape can look different. And if they don't adopt it, they're going to be in trouble. But at the same time, the cultural shift has to happen within the organization.

This is where the public cloud, despite being expensive, played an enormous role in educating the industry. This is how your infrastructure should be. And it prepared them already. The last five years has been so crucial. It prepared them how you look at modern infrastructure run by software engineering team.

And that change came at the right time. And then now the AI is starting this journey. At the same time, vendors like us, right, we have a responsibility. If I told them that you replace your appliance with software and if software is not giving them the same benefit or better, they would not shift. It goes both ways.

They want to change. I think customers are in the mood to change. They understand the urge to change. If we make it harder for them to change, this change won't happen. This will result in a bubble burst, right? Like we saw this happen in the past. Anytime new big technologies come, there is a gold rush, and then eventually leads to a collapse.

AI is real, but I think if we don't get this right, the wheels will come off. This is an irreversible trend. It will happen. But I think the trend around that, if everything is software, and if customers are unprepared for it, and if the infrastructure is complicated, they just wouldn't adopt it. There will be a few giant winners consolidating the market, which is bad for the whole industry.

The, this is where Kubernetes, even the open source community, and all these projects have an important role. Work towards simplicity. Someone with average technician grade skill can adopt and operate an infrastructure. If you don't do it, it won't, it won't transition.

Corey Quinn: Yeah. And I think that's a, that's a good perspective to have on it.

I really want to thank you for taking the time to speak with me today. I really do appreciate it. And I look forward to hearing next year, what you come up with by then.

AB Periasamy: Yep. I enjoy these conversations and it's great to be here. Thank you for having me. Of course.

Corey Quinn: Uh, AB Periasamy is the co-founder and CEO of MinIO.

This promoted guest episode has been brought to us by MinIO, and I'm Cloud Economist Corey Quinn. If you've enjoyed this podcast, please leave a five star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five star review on your podcast platform of choice, along with an angry, insulting comment that one day will get expunged because that platform isn't using object store to store the comments.
