Episode Show Notes & Transcript
AB Periasamy is the co-founder and CEO of MinIO, an open source provider of high performance, object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu where he serves on the board to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).
AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling of the commodity cluster computing to supercomputing class performance. His work there resulted in the development of Lawrence Livermore Laboratory’s “Thunder” code, which, at the time was the second fastest in the world.
AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.
AB is one of the leading proponents and thinkers on the subject of open source software - articulating the difference between the philosophy and business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation.
- MinIO: https://min.io/
- Twitter: https://twitter.com/abperiasamy
- LinkedIn: https://www.linkedin.com/in/abperiasamy/
- Email: mailto:[email protected]
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Chronosphere. When it costs more money and time to observe your environment than it does to build it, there’s a problem. With Chronosphere, you can shape and transform observability data based on need, context and utility. Learn how to only store the useful data you need to see in order to reduce costs and improve performance at chronosphere.io/corey-quinn. That’s chronosphere.io/corey-quinn. And my thanks to them for sponsoring my ridiculous nonsense.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn, and I have taken a somewhat strong stance over the years on the relative merits of multi-cloud, and when it makes sense and when it doesn’t. And it’s time for me to start modifying some of those. To have that conversation and several others as well, with me today on this promoted guest episode is AB Periasamy, CEO and co-founder of MinIO. AB, it’s great to have you back.
AB: Yes, it’s wonderful to be here again, Corey.
Corey: So, one thing that I want to start with is defining terms. Because when we talk about multi-cloud, there are—to my mind at least—smart ways to do it and ways that are frankly ignorant. The thing that I’ve never quite seen is, it’s greenfield, day one. Time to build something. Let’s make sure we can build and deploy it to every cloud provider we might ever want to use.
And that is usually not the right path. Whereas different workloads in different providers, that starts to make a lot more sense. When you do mergers and acquisitions, as big companies tend to do in lieu of doing anything interesting, it seems like they find, “Oh, we’re suddenly in multiple cloud providers. Should we move this acquisition to a new cloud?” No. No, you should not.
One of the challenges, of course, is that there’s a lot of differentiation between the baseline offerings that cloud providers have. MinIO is interesting in that it starts and stops with an object store that is mostly S3 API compatible. Have I nailed the basic premise of what it is you folks do?
AB: Yeah, it’s basically an object store. Amazon S3 versus us, it’s actually—that’s the comparable, right? Amazon S3 is a hosted cloud storage as a service, but underneath the underlying technology is called object-store. MinIO is a software and it’s also open-source and it’s the software that you can deploy on the cloud, deploy on the edge, deploy anywhere, and both Amazon S3 and MinIO are exactly S3 API compatible. It’s a drop-in replacement. You can write applications on MinIO and take it to AWS S3, and do the reverse. Amazon made S3 API a standard inside AWS, we made S3 API standard across the whole cloud, all the cloud edge, everywhere, rest of the world.
Corey: I want to clarify two points because otherwise I know I’m going to get nibbled to death by ducks on the internet. When you say open-source, it is actually open-source; you’re AGPL, not source available, or, “We’ve decided now we’re going to change our model for licensing because oh, some people are using this without paying us money,” as so many companies seem to fall into that trap. You are actually open-source and no one reasonable is going to be able to disagree with that definition.
The other pedantic part of it is when something says that it’s S3 compatible on an API basis, like, the question is always does that include the weird bugs that we wish it wouldn’t have, or some of the more esoteric stuff that seems to be a constant source of innovation? To be clear, I don’t think that you need to be particularly compatible with those very corner and vertex cases. For me, it’s always been the basic CRUD operations: can you store an object? Can you give it back to me? Can you delete the thing? And maybe an update, although generally object stores tend to be atomic. How far do you go down that path of being, I guess, a faithful implementation of what the S3 API does, and at which point you decide that something is just, honestly, lunacy and you feel no need to wind up supporting that?
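The "basic CRUD" bar Corey describes can be written down as a tiny conformance check. The sketch below is a toy, not the S3 API: `InMemoryStore`, its method names, and `check_basic_crud` are all hypothetical names invented for illustration, and a real check would run the same steps against an actual S3-compatible endpoint through an SDK.

```python
class InMemoryStore:
    """Toy stand-in for an object store: whole-object put, get, delete."""

    def __init__(self):
        self._objects = {}

    def put(self, bucket, key, data):
        # A put replaces the whole object; there is no partial update.
        self._objects[(bucket, key)] = bytes(data)

    def get(self, bucket, key):
        # Raises KeyError when the object does not exist.
        return self._objects[(bucket, key)]

    def delete(self, bucket, key):
        # Deleting a missing key is treated as a no-op in this sketch.
        self._objects.pop((bucket, key), None)


def check_basic_crud(store, bucket="demo-bucket", key="hello.txt"):
    """Return True if the store round-trips, overwrites, and deletes an object."""
    store.put(bucket, key, b"v1")
    if store.get(bucket, key) != b"v1":
        return False
    store.put(bucket, key, b"v2")  # an "update" is a full overwrite
    if store.get(bucket, key) != b"v2":
        return False
    store.delete(bucket, key)
    try:
        store.get(bucket, key)
    except KeyError:
        return True
    return False


print(check_basic_crud(InMemoryStore()))
```

Passing a check like this is the easy part; the compatibility problems AB goes on to describe live in the edge cases that a four-step test never touches.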
AB: Yeah, the unfortunate part of it is we have to be very, very deep. It only takes one API to break. And it’s not even, like, one API we did not implement; one API under a particular circumstance, right? Like even if you see, like, AWS SDK is, right, Java SDK, different versions of Java SDK will interpret the same API differently. And AWS S3 is an API, it’s not a standard.
And Amazon has published the REST specifications, API specs, but they are more like religious text. You can interpret it in many ways. Amazon’s own SDK has interpreted, like, this in several ways, right? The only way to get it right is, like, you have to have a massive ecosystem around your application. And if one thing breaks—today, if I commit a code and it introduced a regression, I will immediately hear from a whole bunch of community what I broke.
There’s no certification process here. There is no industry consortium to control the standard, but then there is an accepted standard. Like, if the application works, it works. And one way to get it right is, like, Amazon SDKs, all of those language SDKs, to be cleaner, simpler, but applications can even use MinIO SDK to talk to Amazon and Amazon SDK to talk to MinIO. Now, there is a clear, cooperative model.
And I actually have tremendous respect for Amazon engineers. They have only been kind and meaningful, like, reasonable partnership. Like, if our community reports a bug that Amazon rolled out a new update in one of the regions and the S3 API broke, they will actually go fix it. They will never argue, “Why are you using MinIO SDK?” Their engineers, they do everything by reason. That’s the reason why they gained credibility.
Corey: I think, on some level, that we can trust that the API is not going to meaningfully shift, just because so much has been built on top of it over the last 15, almost 16 years now that even slight changes require massive coordination. I remember there was a little bit of a kerfuffle when they announced that they were going to be disabling the BitTorrent endpoint in S3 and it was no longer going to be supported in new regions, and eventually they were turning it off. There were still people pushing back on that. I’m still annoyed by some of the documentation around the API that says that it may not return a legitimate error code when it errors with certain XML interpretations. It’s… it’s kind of become very much its own thing.
AB: [unintelligible 00:06:22] a problem, like, we have seen, like, even stupid errors similar to that, right? Like, HTTP headers are supposed to be case insensitive, but then there are some language SDKs will send us in certain type of casing and they expect the case to be—the response to be same way. And that’s not HTTP standard. If we have to accept that bug and respond in the same way, then we are asking a whole bunch of community to go fix that application. And Amazon’s problem are our problems too. We have to carry that baggage.
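The header-casing issue AB mentions is easy to see in miniature. What follows is a minimal sketch, assuming a hypothetical `CaseInsensitiveHeaders` helper: lookups ignore case, as HTTP requires, while the original spelling is remembered so a server could echo it back to a client that wrongly depends on it.

```python
class CaseInsensitiveHeaders:
    """Store headers with their original casing, look them up ignoring case."""

    def __init__(self, pairs=()):
        self._items = {}
        for name, value in pairs:
            self[name] = value

    def __setitem__(self, name, value):
        # Key on the lowercased name, but remember the casing we were given.
        self._items[name.lower()] = (name, value)

    def __getitem__(self, name):
        return self._items[name.lower()][1]

    def __contains__(self, name):
        return name.lower() in self._items

    def original_names(self):
        # Header names exactly as the sender spelled them.
        return [name for name, _ in self._items.values()]


headers = CaseInsensitiveHeaders([("ETag", '"abc123"'), ("content-length", "42")])
print(headers["etag"], headers["Content-Length"])
print(headers.original_names())
```

A client written this way never notices how the server cases its response headers, which is the behaviour the standard expects and the behaviour some SDKs apparently do not have.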
But some places where we actually take a hard stance is, like, Amazon introduced that initially, the bucket policies, like access control list, then finally came IAM, then we actually, for us, like, the best way to teach the community is make best practices the standard. The only way to do it. We have been, like, educating them that we actually implemented ACLs, but we removed it. So, the customers will no longer use it. The scale at which we are growing, if I keep it, then I can never force them to remove.
So, we have been pedantic about, like, how, like, certain things that if it’s a good advice, force them to do it. That approach has paid off, but the problem is still quite real. Amazon also admits that S3 API is no longer simple, but at least it’s not like POSIX, right? POSIX is a rich set of API, but doesn’t do useful things that we need to do. So, Amazon’s APIs are built on top of simple primitive foundations that got the storage architecture correct, and then doing sophisticated functionalities on top of the simple primitives, these atomic RESTful APIs, you can finally do it right and you can take it to great lengths and still not break the storage system.
So, I’m not so concerned. I think it’s time for both of us to slow down and then make sure that the ease of operation and adoption is the goal, rather than trying to create an API Bible.
Corey: Well, one differentiation that you have that frankly I wish S3 would wind up implementing is this idea of bucket quotas. I would give a lot in certain circumstances to be able to say that this S3 bucket should be able to hold five gigabytes of storage and no more. Like, you could fix a lot of free tier problems, for example, by doing something like that. But there’s also the problem that you’ll see in data centers where, okay, we’ve now filled up whatever storage system we’re using. We need to either expand it at significant cost and it’s going to take a while or it’s time to go and maybe delete some of the stuff we don’t necessarily need to keep in perpetuity.
There is no moment of reckoning in traditional S3 in that sense because, oh, you can just always add one more gigabyte at 2.3 or however many cents it happens to be, and you wind up with an unbounded growth problem that you’re never really forced to wrestle with. Because it’s infinite storage. They can add drives faster than you can fill them in most cases. So, it just feels like there’s an economic story, if nothing else, just from a governance control and make sure this doesn’t run away from me, and alert me before we get into the multi-petabyte style of storage for my Hello World WordPress website.
AB: Mm-hm. Yeah, so I always thought that Amazon did not do this—it’s not just Amazon, the cloud players, right—they did not do this because they want—is good for their business; they want all the customers’ data, like unrestricted growth of data. Certainly it is beneficial for their business, but there is an operational challenge. When you set quota—this is why we grudgingly introduced this feature. We did not have quotas and we didn’t want to because Amazon S3 API doesn’t talk about quota, but the enterprise community wanted this so badly.
And eventually we [unintelligible 00:09:54] it and we gave. But there is one issue to be aware of, right? The problem with quota is that you as an object storage administrator, you set a quota, let’s say this bucket, this application, I don’t see more than 20TB; I’m going to set 100TB quota. And then you forget it. And then you think in six months, they will reach 20TB. The reality is, in six months they reach 100TB.
And then when nobody expected it—everybody has forgotten that there was a quota in a certain place—suddenly applications start failing. And when it fails, it doesn’t—even though the S3 API responds back saying that there is insufficient space, the application doesn’t really pass that error all the way up. When applications fail, they fail in unpredictable ways. By the time the application developer realizes that it’s actually object storage that ran out of space, they’ve lost time and it’s downtime. So, as long as they have proper observability—because, I mean, we also have observability that can alert you that you are going to run out of space soon—if you have those systems in place, then go for quota. If not, I would agree with the S3 API standard. It is not about cost; it’s about operational, unexpected accidents.
Corey: Yeah, on some level, we wound up having to deal with the exact same problem with disk volumes, where my default for most things was, at 70%, I want to start getting pings on it and at 90%, I want to be woken up for it. So, for small volumes, you wind up with a runaway log or whatnot, you have a chance to catch it and whatnot, and for the giant multi-petabyte things, okay, well, why would you alert at 70% on that? Well, because procurement takes a while when we’re talking about buying that much disk for that much money. It was a roughly good baseline for these things. The problem, of course, is when you have none of that, and well it got full so oops-a-doozy.
On some level, I wonder if there’s a story around soft quotas that just scream at you, but let you keep adding to it. But that turns into implementation details, and you can build something like that on top of any existing object store if you don’t need the hard limit aspect.
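A soft quota of the kind Corey describes, layered on top of usage numbers from any object store, might look like the sketch below. The `SoftQuota` class and its alert level names are hypothetical; the 70% and 90% defaults come from the disk-volume thresholds mentioned above, and nothing here rejects a write.

```python
from dataclasses import dataclass


@dataclass
class SoftQuota:
    limit_bytes: int
    warn_fraction: float = 0.70   # start sending notifications
    page_fraction: float = 0.90   # wake someone up

    def evaluate(self, used_bytes):
        """Return an alert level for the current usage; never blocks a write."""
        fraction = used_bytes / self.limit_bytes
        if fraction >= 1.0:
            return "over-quota"
        if fraction >= self.page_fraction:
            return "page"
        if fraction >= self.warn_fraction:
            return "warn"
        return "ok"


TB = 10**12
quota = SoftQuota(limit_bytes=100 * TB)
for used in (20 * TB, 75 * TB, 95 * TB, 120 * TB):
    print(used // TB, "TB ->", quota.evaluate(used))
```

The difference from a hard quota is only in what the caller does with the result: here the 120 TB case produces a loud alert, whereas a hard limit would have failed the application's writes at 100 TB.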
AB: Actually, that is the right way to do. That’s what I would recommend customers to do. Even though there is hard quota, I will tell, don’t use it, but use soft quota. And the soft quota, instead of even soft quota, you monitor them. On the cloud, at least you have some kind of restriction that the more you use, the more you pay; eventually the month end bills, it shows up.
On MinIO, when it’s deployed on these large data centers, that it’s unrestricted access, quickly you can use a lot of space, no one knows what data to delete, and no one will tell you what data to delete. The way to do this is there has to be some kind of accountability. The way to do it is—actually [unintelligible 00:12:27] have some chargeback mechanism based on the bucket growth. And the business units have to pay for it, right? That IT doesn’t run for free, right? IT has to have a budget and it has to be sponsored by the applications team.
And you measure, instead of setting a hard limit, you actually charge them that based on the usage of your bucket, you’re going to pay for it. And this is an observability problem. And you can call it soft quotas, but it has to trigger an alert in observability. It’s an observability problem. But it actually is interesting to hear that as soft quotas, which makes a lot of sense.
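The chargeback idea reduces to simple arithmetic over per-bucket usage. Here is a minimal sketch: the `chargeback` function, the bucket and team names, and the rate of 20 currency units per terabyte-month are all made up for illustration, not figures from the conversation.

```python
def chargeback(usage_bytes_by_bucket, owner_by_bucket, rate_per_tb_month):
    """Sum monthly storage charges per business unit from bucket usage."""
    tb = 10**12
    totals = {}
    for bucket, used in usage_bytes_by_bucket.items():
        # Buckets nobody has claimed are grouped so they stay visible.
        owner = owner_by_bucket.get(bucket, "unassigned")
        totals[owner] = totals.get(owner, 0.0) + (used / tb) * rate_per_tb_month
    return totals


TB = 10**12
usage = {"logs": 40 * TB, "ml-training": 250 * TB, "scratch": 10 * TB}
owners = {"logs": "platform", "ml-training": "data-science"}
bill = chargeback(usage, owners, rate_per_tb_month=20.0)
for team, amount in sorted(bill.items()):
    print(f"{team}: {amount:.2f}")
```

Keeping an "unassigned" line on the bill is a deliberate choice in this sketch: storage with no owner is exactly the data that nobody will volunteer to delete.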
Corey: It’s one of those problems that I think people only figure out after they’ve experienced it once. And then they look like wizards from the future who, “Oh, yeah, you’re going to run into a quota storage problem.” Yeah, we all find that out because the first time we smack into something and live to regret it. Now, we can talk a lot about the nuances and implementation and low level detail of this stuff, but let’s zoom out of it. What are you folks up to these days? What is the bigger picture that you’re seeing of object storage and the ecosystem?
AB: Yeah. So, when we started, right, our idea was that world is going to produce incredible amount of data. In ten years from now, we are going to drown in data. We’ve been saying that today and it will be true. Every year, you say ten years from now and it will still be valid, right?
That was the reason for us to play this game. And we saw that every one of these cloud players were incompatible with each other. It’s like early Unix days, right? Like a bunch of operating systems, everything was incompatible and applications were beginning to adopt this new standard, but they were stuck. And then the cloud storage players, whatever they had, like, GCS can only run inside Google Cloud, S3 can only run inside AWS, and the cloud player’s game was bring all the world’s data into the cloud.
And that actually requires enormous amount of bandwidth. And moving data into the cloud at that scale, if you look at the amount of data the world is producing, if the data is produced inside the cloud, it’s a different game, but the data is produced everywhere else. MinIO’s idea was that instead of introducing yet another API standard, Amazon got the architecture right and that’s the right way to build large-scale infrastructure. If we stick to Amazon S3 API instead of introducing yet another standard, [unintelligible 00:14:40] API, and then go after the world’s data. When we started in 2014 November—it’s really 2015, we started, it was laughable. People thought that there won’t be a need for MinIO because the whole world will basically go to AWS S3 and they will be the world’s data store. Amazon is capable of doing that; the race is not over, right?
Corey: And it still couldn’t be done now. The thing is that they would need to fundamentally rethink their, frankly, usurious data egress charges. The problem is not that it’s expensive to store data in AWS; it’s that it’s expensive to store data and then move it anywhere else for analysis or use on something else. So, there are entire classes of workload that people should not consider the big three cloud providers as the place where that data should live because you’re never getting it back.
AB: Spot on, right? Even if network is free, right, Amazon makes, like, okay, zero egress-ingress charge, the data we’re talking about, like, most of MinIO deployments, they start at petabytes. Like, one to ten petabyte, feels like 100 terabyte. For even if network is free, try moving a ten-petabyte infrastructure into the cloud. How are you going to move it?
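The ten-petabyte question can be answered with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a link that is fully dedicated to the transfer; the link speeds are illustrative choices, not figures from the conversation.

```python
def transfer_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to push size_bytes over a link at the given utilization."""
    seconds = (size_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / 86_400


PB = 10**15
for gbps in (1, 10, 100):
    days = transfer_days(10 * PB, gbps * 10**9)
    print(f"10 PB over {gbps} Gbit/s: {days:.1f} days")
```

Under those assumptions a saturated 10 Gbit/s link needs about 93 days for ten petabytes, and any realistic `efficiency` below 1.0 only stretches that further, all while new data keeps being produced at the source.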
Even with FedEx and UPS giving you a lot of bandwidth in their trucks, it is not possible, right? I think the data will continue to be produced everywhere else. So, our bet was there we will be [unintelligible 00:15:56]—instead of you moving the data, you can run MinIO where there is data, and then the whole world will look like AWS’s S3 compatible object store. We took a very different path. But now, when I say the same story that when what we started with day one, it is no longer laughable, right?
People believe that yes, MinIO is there because our market footprint is now larger than Amazon S3. And as it goes to production, customers are now realizing it’s basically growing inside a shadow IT and eventually businesses realize the bulk of their business-critical data is sitting on MinIO and that’s how it’s surfacing up. So now, what we are seeing, this year particularly, all of these customers are hugely concerned about cost optimization. And as part of the journey, there is also multi-cloud and hybrid-cloud initiatives. They want to make sure that their application can run on any cloud or the same software can run on their colos like Equinix, or like bunch of, like, Digital Realty, anywhere.
And MinIO’s software, this is what we set out to do. MinIO can run anywhere inside the cloud, all the way to the edge, even on Raspberry Pi. It’s now—whatever we started with has now become reality; the timing is perfect for us.
Corey: One of the challenges I’ve always had with the idea of building an application with the idea to run it anywhere is you can make explicit technology choices around that, and for example, object store is a great example because most places you go now will or can have an object store available for your use. But there seem to be implementation details that get lost. And for example, even load balancers wind up being implemented in different ways with different scaling times and whatnot in various environments. And past a certain point, it’s okay, we’re just going to have to run it ourselves on top of HAProxy or Nginx, or something like it, running in containers themselves; you’re reinventing the wheel. Where is that boundary between, we’re going to build this in a way that we can run anywhere and the reality that I keep running into, which is we tried to do that but we implicitly without realizing it built in a lot of assumptions that everything would look just like this environment that we started off in.
AB: That’s actually not the problem. The problem comes when you have multiple clouds. Different teams, like, part M&A, the part—like they—even if you don’t do M&A, different teams, no two data engineers would agree on the same software stack. Then they will all end up with different cloud players and some are still running on old legacy environments.
When you combine them, the problem is, like, let’s take just the cloud, right? How do I even apply a policy, that access control policy, how do I establish unified identity? Because I want to know this application is the only one who is allowed to access this bucket. Can I have that same policy on Google Cloud or Azure, even though they are different teams? Like if that employer, that project, or that admin, if he or she leaves the job, how do I make sure that that’s all protected?
You want unified identity, you want unified access control policies. Where are the encryption key store? And then the load balancer itself, the load, its—load balancer is not the problem. But then unless you adopt S3 API as your standard, the definition of what a bucket is is different from Microsoft to Google to Amazon.
Corey: Yeah, the idea of the PUTs and retrieving of actual data is one thing, but then you have how do you manage it at the control plane layer of the object store and how do you rationalize that? What are the naming conventions? How do you address it? I even ran into something similar somewhat recently when I was doing an experiment with one of the Amazon Snowball edge devices to move some data into S3 on a lark. And the thing shows up and presents itself on the local network as an S3 endpoint, but none of their tooling can accept a different endpoint built into the configuration files; you have to explicitly use it as an environment variable or as a parameter on every invocation of something that talks to it, which is incredibly annoying.
I would give a lot for just to be able to say, oh, when you’re talking in this profile, that’s always going to be your S3 endpoint. Go. But no, of course not. Because that would make it easier to use something that wasn’t them, so why would they ever be incentivized to bake that in?
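The per-profile endpoint Corey is asking for is a small piece of configuration logic. This is a sketch of one possible precedence order and does not describe how any existing tool behaves: the `endpoint_url` key, the `OBJECT_STORE_ENDPOINT` variable, the default URL, and the profile names are all hypothetical.

```python
import configparser

DEFAULT_ENDPOINT = "https://object-store.example.com"  # hypothetical default


def resolve_endpoint(config_text, profile, env, explicit=None):
    """Pick an endpoint: explicit argument, then env var, then profile, then default."""
    if explicit:
        return explicit
    if env.get("OBJECT_STORE_ENDPOINT"):  # hypothetical variable name
        return env["OBJECT_STORE_ENDPOINT"]
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if parser.has_section(profile) and parser.has_option(profile, "endpoint_url"):
        return parser.get(profile, "endpoint_url")
    return DEFAULT_ENDPOINT


CONFIG = """
[snowball]
endpoint_url = http://192.0.2.10:8080

[cloud]
region = us-east-1
"""

print(resolve_endpoint(CONFIG, "snowball", env={}))
print(resolve_endpoint(CONFIG, "cloud", env={}))
```

With the endpoint stored on the profile, switching between the local appliance and the hosted service becomes a matter of naming the profile once, instead of repeating a parameter on every invocation.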
AB: Yeah. Snowball is an important element to move data, right? That’s the UPS and FedEx way of moving data, but what I find customers doing is they actually use the tools that we built for MinIO because the Snowball appliance also looks like S3 API-compatible object store. And in fact, like, I’ve been told that, like, when you want to ship multiple Snowball appliances, they actually put MinIO to make it look like one unit because MinIO can erasure code objects across multiple Snowball appliances. And the MC tool, unlike AWS CLI, which is really meant for developers, like low-level calls, MC gives you unique [scoring 00:21:08] tools, like ls, cp, rsync-like tools, and it’s easy to move and copy and migrate data. Actually, that’s how people deal with it.
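Spreading one object across several appliances so that losing one of them loses no data is the idea behind erasure coding. The sketch below uses single XOR parity as a deliberately simple stand-in; production systems use stronger Reed-Solomon-style codes that survive multiple losses, and the function names here are hypothetical.

```python
def split_with_parity(data, k):
    """Split data into k equal-size shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]


def recover(shards, missing_index):
    """Rebuild the one missing shard by XOR-ing all the surviving shards."""
    survivors = [s for i, s in enumerate(shards) if i != missing_index]
    rebuilt = bytearray(len(survivors[0]))
    for shard in survivors:
        for i, byte in enumerate(shard):
            rebuilt[i] ^= byte
    return bytes(rebuilt)


payload = b"object data spread across appliances"
shards = split_with_parity(payload, k=3)  # imagine one shard per appliance
lost = 1
damaged = [s if i != lost else None for i, s in enumerate(shards)]
print(recover(damaged, lost) == shards[lost])
```

With three data shards and one parity shard, any single appliance can go missing in transit and its shard can still be rebuilt from the other three, at the cost of one extra shard of storage.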
Corey: Oh, God. I hadn’t even considered the problem of having a fleet of Snowball edges here that you’re trying to do a mass data migration on, which is basically how you move petabyte-scale data, is a whole bunch of parallelism. But having to figure that out on a case-by-case basis would be nightmarish. That’s right, there is no good way to wind up doing that natively.
AB: Yeah. In fact, Western Digital and a few other players, too. Western Digital created a Snowball-like appliance and they put MinIO on it. And they are actually working with some system integrators to help customers move lots of data. But Snowball-like functionality is important and more and more customers need it.
Corey: This episode is sponsored in part by Honeycomb. I’m not going to dance around the problem. Your. Engineers. Are. Burned. Out. They’re tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring Ring, Who’s There? It’s Nagios, the original call of duty! They’re fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there’s a better way. Observability tools like Honeycomb (and very little else because they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It’s great for your business, great for your engineers, and, most importantly, great for your customers. Try FREE today at honeycomb.io/screaminginthecloud. That’s honeycomb.io/screaminginthecloud.
Corey: Increasingly, it felt like, back in the on-prem days, that you’d have a file server somewhere that was either a SAN or it was going to be a NAS. The question was only whether it presented it to various things as a volume or as a file share. And then in cloud, the default storage mechanism, unquestionably, was object store. And now we’re starting to see it come back again. So, it started to increasingly feel, in a lot of ways, like Cloud is no longer so much a place that is somewhere else, but instead much more of an operating model for how you wind up addressing things.
I’m wondering when the generation of prosumer networking equipment, for example, is going to say, “Oh, and send these logs over to what object store?” Because right now, it’s still write a file and SFTP it somewhere else, at least the good ones; some of the crap ones still want old unencrypted FTP, which is neither here nor there. But I feel like it’s coming back around again. Like, when do even home users wind up instead of where do you save this file to having the cloud abstraction, which hopefully, you’ll never have to deal with an S3-style endpoint, but that can underpin an awful lot of things. It feels like it’s coming back and that’s cloud is the de facto way of thinking about things. Is that what you’re seeing? Does that align with your belief on this?
AB: I actually, fundamentally believe in the long run, right, applications will go SaaS, right? Like, if you remember the days that you used to install QuickBooks and ACT and stuff, like, on your data center, you used to run your own Exchange servers, like, those days are gone. I think these applications will become SaaS. But then the infrastructure building blocks for these SaaS, whether they are cloud or their own colo, I think that in the long run, it will be multi-cloud and colo all combined and all of them will look alike.
But what I find from the customer’s journey, the Old World and the New World is incompatible. When they shifted from bare metal to virtualization, they didn’t have to rewrite their application. But this time, you have—it’s a tectonic shift. Every single application, you have to rewrite. If you retrofit your application into the cloud, bad idea, right? It’s going to cost you more and I would rather not do it.
Even though cloud players are trying to make, like, the file and block, like, file system services [unintelligible 00:24:01] and stuff, they make it available ten times more expensive than object, but it’s just to [integrate 00:24:07] some legacy applications, but it’s still a bad idea to just move legacy applications there. But what I’m finding is that the cost, if you still run your infrastructure with enterprise IT mindset, you’re out of luck. It’s going to be super expensive and you’re going to be left out. Modern infrastructure, because of the scale, has to be treated as code. You have to run infrastructure with software engineers. And this cultural shift has to happen.
And that’s why cloud, in the long run, everyone will look like AWS and we always said that and it’s now becoming true. Like, Kubernetes and MinIO basically is leveling the ground everywhere. It’s giving ECS and S3-like infrastructure inside AWS or outside AWS, everywhere. But what I find the challenging part is the cultural mindset. If they still have the old cultural mindset and if they want to adopt cloud, it’s not going to work.
You have to change the DNA, the culture, the mindset, everything. The best way to do it is go to the cloud-first. Adopt it, modernize your application, learn how to run and manage infrastructure, then ask economics question, the unit economics. Then you will find the answers yourself.
Corey: On some level, that is the path forward. I feel like there’s just a very long tail of systems that have been working and have been meeting the business objective. And well, we should go and refactor this because, I don’t know, a couple of folks on a podcast said we should isn’t the most compelling business case for doing a lot of it. It feels like these things sort of sit there until there is more upside than just cost-cutting to changing the way these things are built and run. That’s the reason that people have been talking about getting off of mainframe since the ’90s in some companies, and the mainframe is very much still there. It is so ingrained in the way that they do business, they have to rethink a lot of the architectural things that have sprung up around it.
I’m not trying to shame anyone for the [laugh] state that their environment is in. I’ve never yet met a company that was super proud of its internal infrastructure. Everyone’s always apologizing because it’s a fire. But they think someone else has figured this out somewhere and it all runs perfectly. I don’t think it exists.
AB: What I am finding is that if you are running it the enterprise IT style, you are the one telling the application developers, here you go, you have this many VMs and then you have, like, a VMware license and, like, JBoss, like WebLogic, and like a SQL Server license, now you go build your application, you won’t be able to do it. Because application developers talk about Kafka and Redis and like Kubernetes, they don’t speak the same language. And that’s when these developers go to the cloud and then finish their application, take it live from zero lines of code before IT can procure infrastructure and provision it to these guys. The change that has to happen is how can you give what the developers want. Now that reverse journey is also starting. In the long run, everything will look alike, but what I’m finding is if you’re running enterprise IT infrastructure, traditional infrastructure, they are ashamed of talking about it.
But then you go to the cloud and then at scale, some parts of it, you want to move for—now you really know why you want to move. For economic reasons, like, particularly the data-intensive workloads becomes very expensive. And at that part, they go to a colo, but leave the applications on the cloud. So, it’s the multi-cloud model, I think, is inevitable. The expensive pieces that where you can—if you are looking at yourself as hyperscaler and if your data is growing, if your business focus is data-centric business, parts of the data and data analytics, ML workloads will actually go out, if you’re looking at unit economics. If all you are focused on productivity, stick to the cloud and you’re still better off.
Corey: I think that’s a divide that gets lost sometimes. When people say, “Oh, we’re going to move to the cloud to save money.” It’s, “No you’re not.” At a five-year time horizon, I would be astonished if that juice were worth the squeeze in almost any scenario. The reason you go there, therefore, is for a capability story when it’s right for you.
That also means that steady-state workloads that are well understood can often be run more economically in a place that is not the cloud. Everyone thinks for some reason that I tend to be “it’s cloud or it’s trash.” No, I’m a big fan of doing things that are sensible and cloud is not the right answer for every workload under the sun. Conversely, when someone says, “Oh, I’m building a new e-commerce store,” or whatnot, “And I’ve decided cloud is not for me.” It’s, “Ehh, you sure about that?”
That sounds like you are smack-dab in the middle of the cloud use case. But all these things wind up acting as constraints and strategic objectives. And technology and single-vendor answers are rarely going to be a panacea the way that their sales teams say that they will.
AB: Yeah. And I find, like, organizations that have SREs, DevOps, and software engineers running the infrastructure, they actually are ready to go multi-cloud or go to colo because they have the—exactly know. They have the containers and Kubernetes microservices expertise. If you are still on a traditional SAN, NAS, and VM architecture, go to cloud, rewrite your application.
Corey: I think there’s a misunderstanding in the ecosystem around what cloud repatriation actually looks like. Everyone claims it doesn’t exist because there’s basically no companies out there worth mentioning that are, “Yep, we’ve decided the cloud is terrible, we’re taking everything out and we are going to data centers. The end.” In practice, it’s individual workloads that do not make sense in the cloud. Sometimes just the back-of-the-envelope analysis means it’s not going to work out, other times during proof of concepts, and other times, as things have hit a certain point of scale, where an individual workload being pulled back makes an awful lot of sense. But everything else is probably going to stay in the cloud and these companies don’t want to wind up antagonizing the cloud providers by talking about it in public. But that model is very real.
AB: Absolutely. Actually, what we are finding with the application side, like, parts of their overall ecosystem, right, within the company, they run on the cloud, but the data side, some of the examples, like, these are in the range of 100 to 500 petabytes. The 500-petabyte customer actually started at 500 petabytes and their plan is to go at exascale. And they are actually doing repatriation because for them, their customers, it’s consumer-facing and it’s extremely price sensitive, but when you’re a consumer-facing, every dollar you spend counts. And if you don’t do it at scale, it matters a lot, right? It will kill the business.
Particularly last two years, the cost part became an important element in their infrastructure, they knew exactly what they want. They are thinking of themselves as hyperscalers. They get commodity—the same hardware, right, just a server with a bunch of [unintelligible 00:30:35] and network and put it on colo or even lease these boxes, they know what their demand is. Even at ten petabytes, the economics starts impacting. If you’re processing it, the data side, we have several customers now moving to colo from cloud and this is the range we are talking about.
They don’t talk about it publicly because sometimes, like, you don’t want to be anti-cloud, but I think for them, they’re also not anti-cloud. They don’t want to leave the cloud. The completely leaving the cloud, it’s a different story. That’s not the case. Applications stay there. Data lakes, data infrastructure, object store, particularly if it goes to a colo.
Now, your applications from all the clouds can access this centralized—centralized, meaning that one object store you run on colo and the colos themselves have worldwide data centers. So, you can keep the data infrastructure in a colo, but applications can run on any cloud, some of them, surprisingly, that they have global customer base. And not all of them are cloud. Sometimes like some applications itself, if you ask what type of edge devices they are running, edge data centers, they said, it’s a mix of everything. What really matters is not the infrastructure. Infrastructure in the end is CPU, network, and drive. It’s a commodity. It’s really the software stack, you want to make sure that it’s containerized and easy to deploy, roll out updates, you have to learn the Facebook-Google style running SaaS business. That change is coming.
Corey: It’s a matter of time and it’s a matter of inevitability. Now, nothing ever stays the same. Everything always inherently changes in the full sweep of things, but I’m pretty happy with where I see the industry going these days. I want to start seeing a little bit less centralization around one or two big companies, but I am confident that we’re starting to see an awareness of doing these things for the right reason more broadly permeating.
AB: Right. Like, the competition is always great for customers. They get to benefit from it. So, the decentralization is a path to bringing—like, commoditizing the infrastructure. I think the bigger picture for me, what I’m particularly happy is, for a long time we carried industry baggage in the infrastructure space.
If no one wants to change, no one wants to rewrite application. As part of the equation, we carried the, like, POSIX baggage, like SAN and NAS. You can’t even do [unintelligible 00:32:48] as a Service, NFS as a Service. It’s too much of a baggage. All of that is getting thrown out. Like, the cloud players helped the customers start with a clean slate. I think to me, that’s the biggest advantage. And now that we have a clean slate, we can now go on a whole new evolution of the stack, keeping it simpler and everyone can benefit from this change.
Corey: Before we wind up calling this an episode, I do have one last question for you. As I mentioned at the start, you’re very much open-source, as in legitimate open-source, which means that anyone who wants to can grab an implementation and start running it. How do you, I guess make peace with the fact that the majority of your user base is not paying you? And I guess how do you get people to decide, “You know what? We like the cut of his jib. Let’s give him some money.”
AB: Mm-hm. Yeah, if I looked at it that way, right, I have both the [unintelligible 00:33:38], right, on the open-source side as well as the business. But I don’t see them to be conflicting. If I run as a charity, right, like, I take donation. If you love the product, here is the donation box, then that doesn’t work at all, right?
I shouldn’t take investor money and I shouldn’t have a team because I have a job to pay their bills, too. But I actually find open-source to be incredibly beneficial. For me, it’s about delivering value to the customer. If you pay me $5, I ought to make you feel $50 worth of value. The same software you would buy from a proprietary vendor, why would—if I’m a customer, same software equal in functionality, if it’s proprietary, I would actually prefer open-source and pay even more.
But why are, really, customers paying me now and what’s our view on open-source? I’m actually the free software guy. Free software and open-source are actually not exactly equal, right? We are the purists of the open-source community and we have strong views on what open-source means, right. That’s why we call it free software. And free here means freedom, right? Free does not mean gratis, that is, free of cost. It’s actually about freedom and I deeply care about it.
For me it’s a philosophy and it’s a way of life. That’s why I don’t believe in open core and other models that holding—giving crippleware is not open-source, right? I give you some freedom but not all, right, like, it breaks the spirit. So, MinIO is a hundred percent open-source, but it’s open-source for the open-source community. We did not take some community-developed code and then add commercial support on top.
We built the product, we believed in open-source, we still believe and we will always believe. Because of that, we open-sourced our work. And it’s open-source for the open-source community. And as you build applications that—like the AGPL license on the derivative works, they have to be compatible with AGPL because we are the creator. If you cannot open-source your application and derivative works, you can buy a commercial license from us. We are the creator, we can give you a dual license. That’s how the business model works.
That way, the open-source community completely benefits. And it’s about the software freedom. There are customers, for them, open-source is good thing and they want to pay because it’s open-source. There are some customers that they want to pay because they can’t open-source their application and derivative works, so they pay. It’s a happy medium; that way I actually find open-source to be incredibly beneficial.
Open-source gave us that trust, like, more than adoption rate. It’s not like free to download and use. More than that, the customers that matter, the community that matters because they can see the code and they can see everything we did, it’s not because I said so, marketing and sales, you believe them, whatever they say. You download the product, experience it and fall in love with it, and then when it becomes an important part of your business, that’s when they engage with us because they talk about license compatibility and data loss or a data breach, all that becomes important. Open-source isn’t—I don’t see that to be conflicting for business. It actually is incredibly helpful. And customers see that value in the end.
Corey: I really want to thank you for being so generous with your time. If people want to learn more, where should they go?
AB: I was on Twitter and now I think I’m spending more time on, maybe, LinkedIn. I think if they—they can send me a request and then we can chat. And I’m always, like, spending time with other entrepreneurs, architects, and engineers, sharing what I learned, what I know, and learning from them. There is also a [community open channel 00:37:04]. And just send me a mail at [email protected] and I’m always interested in talking to our user base.
Corey: And we will, of course, put links to that in the [show notes 00:37:12]. Thank you so much for your time. I appreciate it.
AB: It’s wonderful to be here.
Corey: AB Periasamy, CEO and co-founder of MinIO. I’m Cloud Economist Corey Quinn and this has been a promoted guest episode of Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice that presumably will also include an angry, loud comment that we can access from anywhere because of shared APIs.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.