- InfluxData: https://www.influxdata.com
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by my friends at Thinkst Canary. Most companies find out way too late that they’ve been breached. Thinkst Canary changes this and I love how they do it. Deploy canaries and canary tokens in minutes and then forget about them. What’s great is the attackers tip their hand by touching them, giving you one alert, when it matters. I use it myself and I only remember this when I get the weekly update with a “we’re still here, so you’re aware” from them. It’s glorious! There is zero admin overhead to this, there are effectively no false positives unless I do something foolish. Canaries are deployed and loved on all seven continents. You can check out what people are saying at canary.love. And, their Kubeconfig canary token is new and completely free as well. You can do an awful lot without paying them a dime, which is one of the things I love about them. It is useful stuff and not an, “ohh, I wish I had money.” It is spectacular! Take a look; that’s canary.love because it’s genuinely rare to find a security product that people talk about in terms of love. It really is a unique thing to see. Canary.love. Thank you to Thinkst Canary for their support of my ridiculous, ridiculous nonsense.
Brian: [laugh]. Well, thanks, Corey, very excited to be here. And yes, dealmaker; I guess that would be apropos. How did I get into marketing? Well, a lot of my career is spent in business development, and so I think that’s where the dealmaker part comes from.
Several different roles, including my first role at Influx—when I joined Influx—was in business development and partnerships. And so, prior to coming to Influx, I spent many years building out the business development team at Twilio, growing that up, and we did a lot of deals with carriers, with cloud partners, with all kinds of different partners; you name it, we worked with them. And then moving into Influx, joined in a BD capacity here and had a couple different roles that eventually evolved to Chief Marketing Officer. But that’s where the dealmaker comes from. I like to do deals, it’s always nice to have one on the side in whatever capacity you’re working in, it’s nice to have a deal or two working on the side. It kind of keeps you fresh.
Corey: It’s fun because people think, “Oh, a deal. You’re thinking of mergers and acquisitions, and how hard could that be? You just show up with a bag of money and give it to people and then you have a deal closed.” And oh, if only it were that simple. Every client engagement we have on the consulting side has been a negotiation back and forth, and the idea is to ideally get everyone to the point where they’re happy, but honestly, if everyone’s slightly unhappy but can live with the result, we’ll take that too.
Brian: That’s a good point. And actually that wording that you described of finding a win for everybody, that’s how I always thought about it. I think about it as first of all, you’re trying to understand what the other party—and it could be an individual, it could be a company, it could be a group of companies, sometimes—you’re trying to understand what their goals are, what their agenda is and see how that matches with your own; sometimes they’re opposing, sometimes they’re overlapping. And then everyone has to have some perceived win in a deal. And it’s not competitive; it’s more like you just have to have value, and that is kind of what the win is: having value in that deal.
And so that’s the way I always approached it. And doing deals, whether you’re in BD or sales, or if you’re working with vendors and you’re in a different functional role, sometimes it’s not even commercial, it’s just about aligning resources, perhaps. Our deal might be that you and I are both going to put a collective effort into building something or taking something to market. In another scenario might be like, I’m going to pay for this service that you’re delivering, or vice versa. Or we’re going to go and bring two revenue-generating products together and take them to market. Whatever it might be, it doesn’t matter so much what the mechanics are of the deal, but it’s usually about aligning those agendas and in having someone get utility, get value on the other side.
Corey: I think that people lose sight of the fact as well, that when you’re talking about a service provider—and let’s be clear, InfluxData has launched a cloud platform that we’ll talk about in a minute—this is not the one-off transactional relationship; once the deal is signed, you’ve got to work with these people. When they host parts of your production infrastructure, whether you want to admit it or not they’re your partner more so than they are your vendor. It has to be an ongoing relationship that people are, if they at least aren’t thrilled with it, can at least be happy enough to live with, otherwise it just winds up with this growing sense of resentment and it just sort of leads nowhere.
Brian: Yeah, there really is no deal moment. Yes, people sign agreements with companies, but that’s just the very beginning. Your relationship evolves from there. We’re delivering a product, we’re delivering this platform that handles time-series data to our customers, and we’re asking them to trust us with their product that they’re taking out to market. They’re asking us to handle their data and to deliver service to them that they’re turning into their production applications. And so it’s a big responsibility. And so we care about the relationship with our customers to continue that.
Corey: So, I first really became aware of time-series data a few years back during a re:Invent keynote when they pre-announced Timestream, which took entirely too long to come to market. Okay, great. So, you’re talking about time-series data. Can you explain what that means in simple terms? And I learned over the next eight minutes that they were talking about it, that no, no, they couldn’t. I wound up more confused by the end of the announcement than I was at the beginning.
So, assuming that I have the same respect for databases as you would expect for someone whose favorite data store is Route 53—because you can misuse it as a beautiful database—what is time-series data and why does it matter in 2021?
Brian: Sure, it’s a good question. And I was there in that audience as well that day. So, we think of time-series data as really any type of data that’s stamped in time, in some way. It could be every hour, every minute, every second, every half second, whatever. But more specifically, it’s any type of data that is generated by some source—and that could be a sensor, sources within systems, or an actual application—and these things change over time and are therefore stamped in time in some way.
They can come at different frequencies, like I said, from nanoseconds to seconds, or minutes and hours, but the most important thing is that they usually trigger a workflow, trigger some sort of action. And so that’s really what our platform is about. It allows people to handle this type of data and then work with it from there in their applications, trigger new workflows, et cetera. Because the historical context of what happens is super important.
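To make that definition concrete, here is a minimal Python sketch of time-stamped readings from one source and a check that could trigger an action. It is an illustration only, not InfluxDB's data model or API: the `Point` class, the `readings_over` helper, the `sensor-1` name, and the sample temperatures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Point:
    """One time-stamped reading from a single source."""
    source: str          # hypothetical source name, e.g. a sensor ID
    timestamp: datetime  # when the reading was taken
    value: float         # the measured quantity


def readings_over(points, threshold):
    """Return the points whose value exceeds the threshold, oldest first."""
    return sorted(
        (p for p in points if p.value > threshold),
        key=lambda p: p.timestamp,
    )


# Hypothetical readings, one per second, from a single sensor.
start = datetime(2021, 11, 30, 12, 0, 0, tzinfo=timezone.utc)
temperatures = [71.5, 72.0, 88.3, 72.4]
points = [
    Point("sensor-1", start + timedelta(seconds=i), v)
    for i, v in enumerate(temperatures)
]

# A real system would kick off a workflow here; this sketch just prints.
for p in readings_over(points, threshold=80.0):
    print(f"{p.timestamp.isoformat()} {p.source} read {p.value}")
```

The point of the sketch is that the timestamp travels with every value, so the history around a reading stays available when deciding what to do next.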
And when we talk about sources, it could be really many things. It could be in physical spaces, and we have a lot of IoT types of customers and use cases. And those are things like devices and sensors on the factory floor, out in the field, it’s on a vehicle. It’s even in space, believe it or not. There are customers that are using us on satellites.
And then it can also be sources from within software, applications, and infrastructure, things like VMs, and containers, and microservices, all emitting time-series data. And it could be applications like crypto, or financial, or stock market, agricultural type of applications that are themselves as applications emitting data. So, you think about all these sources that are out there from the physical world to the virtual world, and they’re all generating time-series data, and our platform is really specially designed to handle that kind of data. And we can get into some details of what exactly that means, but that’s really why we’re here. That’s what time-series is all about.
Corey: And this is the inherent challenge I think we’re seeing across the entire industry slash ecosystem. I mean, this is airing during re:Invent week, but at the time we are recording this, we have not yet seen the Tuesday keynote where Adam Selipsky will take to the stage, and no doubt, render the stat I’m about to throw at you completely obsolete. But depending on how you count them, there’s somewhere between 13 and 15 managed database or database-like services today that AWS offers. And they never turn things off and they’re always releasing new things, supposedly on behalf of customers; in practice because someone somewhere wants to get promoted by launching a new service; good for them. Godspeed.
If we look into the uncertain future, at some point, someone’s job is going to be disambiguating between the 40 different managed database services that AWS offers and picking the one that works. What differentiates time-series from—let’s just start with an easy one—something like MySQL or Postgres—or ‘Postgres-squeal’ is how I insist on pronouncing that one. Let’s stay away from things like Neptune because no one knows what a social graph database is and I assure you, you almost certainly don’t need one. Where does something like Influx work in a way that, “Huh. Running this on MySQL is really starting to suck.”
Brian: When and why is it time to consider a specialized tool? And in fact, that’s actually what we see a lot with our customers: they come to us around the time when time-series, as a problem to solve, is reaching the point where they really need a specialized tool that’s kind of built for that. And so one way to look at that is really just to think about time-series in general as a type of data. It’s rapidly rising. It’s the fastest growing data category out there right now.
And the reason for that is it’s being driven by two big macro trends. One is the explosion of all these applications and services running in the cloud. They’re expanding horizontally, they’re running in more regions, they’re in many cases running on multiple clouds, and so it’s just getting big—the workloads are getting bigger and bigger. And those are emitting time-series data. And then simultaneously, you have this growth of all these devices and sensors that are coming online out in the real world: batteries, and temperature gauges, and all kinds of stuff, both new and old, that is coming online, and those sources are generating a lot of time-series data.
So typically, we’re in a moment now, where a lot of developers are faced with this massive growth of time-series data. And if you think about some data set that you have, that you’re putting into some kind of traditional database, now add the component of time as a multiplier by all the data you have. Instead of that one data, that one metric, you’re now looking at doing that every one second in perpetuity. And so it’s just an order of magnitude more data that you’re dealing with. And then you also have this notion of—when you have that magnitude of data, you have fidelity, you’re taking a lot of it in at the same time, I mean, very quickly, so you have batch or stream data coming in at super high volume, and you may need that for a few minutes or a few hours or days, but maybe you don’t need it for months and years.
And so you’d maybe drop down to kind of a lower fidelity for the longer-term. But you really have this toggling back and forth of the high fidelity and low fidelity, all coming at you at pretty high volume. And so typically what happens is, when the workloads get big enough, the legacy tools, they’re just not equipped to do it. And a developer—if they have a small set of time-series they’re dealing with, what is the first thing they’re going to do? They’re going to look around and be like, “Hey, what do I have here? Oh, I’ve got Mongo over here. I’ve got Splunk, or I’ve got this old relational database, I can put it in.”
And that’s typically what they’ll do, and that works fine until it doesn’t. And then that’s when they come around looking for a specialized tool. So, we really sit in Influx and, frankly, other time-series products really do sit at that point where people are considering a specialized tool just because the workload has gotten such that it requires that.
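The drop from high fidelity to low fidelity described above is often called downsampling. A minimal Python sketch of the idea, assuming samples arrive as `(epoch_seconds, value)` pairs and that a plain average per bucket is an acceptable summary; the `downsample` function and the sample data are hypothetical and say nothing about how InfluxDB does this internally:

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (epoch_seconds, value) samples into fixed-width time buckets.

    Returns a sorted list of (bucket_start_seconds, mean_value) pairs.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Round the timestamp down to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Two minutes of hypothetical one-per-second readings: 0.0, 1.0, ..., 119.0
raw = [(t, float(t)) for t in range(120)]

per_minute = downsample(raw, bucket_seconds=60)
print(len(raw), "raw points ->", len(per_minute), "downsampled points")
```

Keeping the raw points for hours and only the per-minute averages for months is one way to get the trade-off Brian describes.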
Corey: Yeah. Taking a look at most of the offerings in the space; anything that winds up charging anything more than a very tiny fraction of a penny—from what you’re describing—is going to quickly become non-economical, where it’s, “Oh, we’re going to charge you”—like using S3: every, I think, 1000 writes cost a penny—“Oh, we’re just going to use S3 for this.” Well, at some of these data volumes, that means that your request charge on S3 is very quickly going to become the largest single line item in your bill, which is nothing short of impressive in a lot of cases, but it also probably means that you’ve taken a very specific tool—like an iPad—and tried to use it as something else—like a hammer—and no one’s particularly happy with that outcome.
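A back-of-the-envelope version of that arithmetic, as a Python sketch. Every number is an assumption: the per-request price is the rough "I think" figure from the conversation rather than a quoted AWS price, and the fleet size and write rate are invented for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_request_cost(sources, writes_per_second, dollars_per_1000_writes):
    """Daily cost in dollars if every data point is its own billed write."""
    writes_per_day = sources * writes_per_second * SECONDS_PER_DAY
    return writes_per_day / 1000 * dollars_per_1000_writes


# Assumed figures: 1,000 sources, one write per second each, and the
# episode's rough guess of one cent per 1,000 writes (not a quoted price).
cost = daily_request_cost(
    sources=1_000, writes_per_second=1, dollars_per_1000_writes=0.01
)
print(f"${cost:,.2f} per day")
```

Under those assumptions the request charges alone come to roughly $864 per day, before any storage is counted, which is the sense in which per-write pricing stops being economical at these volumes.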
Brian: Yeah. First of all, having usage-based pricing is really important. We think about it as allowing people to have the full version of the product without a major commitment, and be using it in test scenarios and then later in the very early production scenarios. But as a principle, it’s important that people who signed up two hours ago are basically using the same full product that your biggest customers are using, the ones paying many, many thousands or tens of thousands per month. And so the way to do that is to offer usage-based pricing and not force people to commit to something before they’re ready to do it.
And so there’s ways to unlock lower pricing, and we, like a lot of companies, offer annual pricing and we have a sales team that works with folks to basically draw down their unit costs on the use of the platform once they kind of get comfortable with their workload. So, there’s definitely avenues to get lower price, and we’re believers in that. And we also want to, from a product development perspective, try to make the product more efficient. And so we basically are trying to drive down the costs through efficiencies in the product: make it run faster, make queries take less time, and also ship products on top of it that require developers to write less code themselves, kind of, do more of the work for them.
Corey: One of the things I find particularly compelling about what you’ve done is it is an open-source project. If I want to go ahead and run some time-series experiments myself, I can spin it up anywhere I want and run it however I see fit. Now, at some point, if I’m doing this for anything more than, “Oh, let’s see how I can misuse this today,” I probably want to at least consider letting someone who’s better at running these things than I am take it over. And as I’m looking through your customer list, the thing that strikes me is how none of these things are quite like the other. We’re talking about companies like Hulu is probably not using it the same way as Capital One is, at least I certainly hope not. You have Texas Instruments; you also have Adobe. And it sort of runs an entire gamut of none of these companies quite look alike; I have to imagine their use cases are also somewhat varied, too.
Brian: Yeah, that’s right. And we really do see as a platform, and with time-series being the common problem that people are looking to solve, we see this pretty broad set of use cases and customer types. And we have some more traditional customers like the Ciscos and the IBMs of the world, and then some relatively new folks like Tesla and Hulu and others that are a little bit more recent. But they’re all trying to solve the same fundamental problem with time-series, which is “How can I handle it in an efficient way and make use of it meaningfully in my applications and services?”
And we were talking earlier about having some sources of time-series data being in, kind of a virtual space, like in infrastructure and software, and then some being in physical space, like in devices and sensors out in the real world. So, we have breadth in that way, too. We have folks who are building big software observability infrastructure solutions on us, and we also have people that are pulling data off of the devices on a solar panel that’s sitting on a house in the emerging world, right? So, you have basically these two far ends of the spectrum, but all using this specialized tool to handle the time-series data that they’re generating.
Corey: It seems to me that for most of these use cases and the way you describe it, it’s more about the overall shape of the data when we’re talking about time-series more so than it is any particular data point in isolation. Is that accurate, or are there cases where that is very much not the case?
Brian: I think that’s accurate. What people are mostly trying to understand is context for what’s happening. And so it’s not necessarily—to your point—not searching for one specific data point or moment, but it’s really understanding context for some general state that has changed or some trend that has emerged, whatever that might be, and then making sense of that, and then taking action on that. And taking an action could mean a couple of different things, too. It could be in an observability sense, where somebody is in an operator type of mode, looking at dashboards and paying attention to infrastructure that’s running, and then needs to take some sort of action based on that. It also, in many cases, is automated in some way: it’s either some series of automated responses to some state that is reached that is visible in the data, or is actually kicking off some new series of tasks or actions inside of an application based on what is occurring and shown by the time-series data.
Corey: You know what doesn’t add to your AWS bill? Free developer security from Snyk. Snyk is a frictionless security platform that meets developers where they are, finding and fixing vulnerabilities right from the CLI, IDEs, repos, and pipelines. And Snyk integrates seamlessly with AWS offerings like CodePipeline, EKS, ECR, and oh so much more.
Corey: So, we’ve talked about, you have an open-source product, which is the sort of thing that most people listening to this should have a vague idea of, “Oh, that means I can go on GitHub and download it and start using it, if it’s not already in my package manager.” Great. You also have the enterprise offering, which is more or less, I presume, a supported distribution of this—for lack of a better term—that you then wind up providing blessed configurations thereof and helping run support for that—for companies that want to run it on-prem. Is that directionally accurate, or am I grossly mischaracterizing [laugh] what your enterprise offering is?
Brian: Yes, we are trying to bring the transparency back. But yes, you’re correct. We have open-source and we have—it’s very popular—we have over 500,000-plus instances of that deployed globally today in the community. And that’s typically very common for developers to get started using the open-source, easily recognizable, it’s been out for a long time, and so many people start the journey there.
And then we have InfluxDB Enterprise, which it’s actually a clustered version of InfluxDB open-source. So, it allows you to basically handle in an environment that you want to manage yourself, you manage a cluster and scale it out and handle ever-increasing workloads and have things like redundancy and replication, et cetera. But that’s really specifically for people who want to deploy and operate the software themselves, which is a good set of people; we have a lot of folks who have done that. But one of the areas that’s a little bit more recent is InfluxDB Cloud, which is really, for folks who don’t want to have anything to do with the management; they really just want to use it as a service, send their data in—
Brian: Exactly. That’s our job. And increasingly, we’ve seen folks gravitate to that. We’ve got a lot of folks who have signed up on this product since it launched in 2019, and it’s really increasingly where they begin their journey, maybe not even going to the open-source, just going directly to this because it’s relatively simple to get started.
It’s priced based on usage. People pay for three vectors: they have the amount of data in; they have number of queries made against the platform; and then storage, how much data you have and for how long. And depending on the use case, some people keep it around for relatively short time, like a few days or a couple of weeks. Other folks have it for many, many months and potentially years in some places. So, you really have that option.
But I would say the three products are really about how you want to run it. Do you care about running the, kind of, underlying infrastructure and managing it or do you just want to hit an endpoint, as you said?
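The three pricing vectors Brian lists (data in, queries, and storage over time) can be sketched as a simple sum. This Python example is illustrative only: the `Rates` values are invented placeholders, not InfluxData's price list, and the units (megabytes written, query count, gigabyte-hours stored) are assumptions made for the sake of the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; not InfluxData's actual price list."""
    per_mb_written: float
    per_query: float
    per_gb_hour_stored: float


def monthly_bill(rates, mb_written, queries, gb_stored, hours_retained):
    """Sum the three usage charges: data in, queries, and storage over time."""
    data_in = mb_written * rates.per_mb_written
    query = queries * rates.per_query
    storage = gb_stored * hours_retained * rates.per_gb_hour_stored
    return data_in + query + storage


# Placeholder rates chosen only to make the arithmetic easy to follow.
rates = Rates(per_mb_written=0.002, per_query=0.0001, per_gb_hour_stored=0.002)

# Same workload, kept for 3 days versus 30 days.
short_retention = monthly_bill(rates, 10_000, 50_000, gb_stored=5, hours_retained=72)
long_retention = monthly_bill(rates, 10_000, 50_000, gb_stored=5, hours_retained=720)
print(f"3 days: ${short_retention:.2f}, 30 days: ${long_retention:.2f}")
```

With identical writes and queries, only the storage term moves, which is why the retention choice Brian mentions (days versus months or years) shows up directly in the bill.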
Corey: You launched this, I want to say in 2019, which feels about directionally right. And I know it was after Timestream was announced, so I just want to say first, how kind and selfless it was of you to validate AWS’s market, which is, you know how they always like to clarify and define what they’re doing when they decide to enter every single market anywhere to compete with everyone. It turns out, I don’t get the sense that they like it quite [laugh] as much being on the other side of that particular divide, but that’s the best kind of problem, too: again, someone else’s.
Brian: Yeah, I think that’s really true.
Corey: The challenge that I have is that it seems like a weird direction to go in as a company, though it is clearly based upon a number of press releases you have made about the success and market traction that you found, it feels, on some level, like it is falling into an older version of an open-source trap of assuming that, “Well, we wrote the software therefore we are the best people you could pick to run it.” That was what a lot of companies did; it turns out that AWS has this operational excellence, as they call it, and what the rest of us call burning through people and making them wake up in the middle of the night to fix things before it becomes customer-visible. But from the outside, there’s no difference. It seems, however, that you have built something that is clearly resonating, and in a big way, in a way that—I’ve got to be direct with you—the AWS time-series service that they are offering has not been finding success.
Brian: Thank you for saying that, and we feel pretty excited about the success we’ve had even being in the same market as Amazon. And Amazon does a phenomenal job at running products at scale, and the breadth that they have in their product lineup is pretty impressive, especially when they roll out new stuff at AWS re:Invent every year. But we’ve been able to find some pretty good success with our approach, and it’s based on a couple of things. So, one is being the company that actually develops and still deploys the open-source is really important. People gravitate to that.
Our roots as a company are open-source, we’ve been a part of and fostered this community over many, many years, and there’s a certain trust in the direction that we’re taking the company. And Paul, our founder who you mentioned, he’s been front and center with that community, pretty deeply engaged for many, many years. I think that carries a lot of weight. At least that’s the way we think about it. But then as far as commercial products go, we really think about it as going to where our customers are, going to where developers are. And that could mean the language that they prefer, the language of preference for them. And that could [crosstalk 00:22:25]—
Corey: Oh, and it’s very clear; it seems that most database companies that I talk to—again, without naming names—tend to focus on the top-down sale, but I’ve never worked in an environment where the database that will be used was dictated by anyone other than the application developers who are the closest to the technical requirements for the workload. I’ve never understood this model of, “Oh, we’re going to talk to the C suite because we believe that they’re going to pick a database vendor based upon who has box seats this season.” I’ve never gotten that and that probably means I’m a terrible enterprise marketer, on some level. But unlike almost every other player in the database space, I’ve never struggled to understand what the hell your messaging has meant, other than the technical bits that I just don’t have quite enough neurons to bang together to create sparks to fully understand. It is very clearly targeted at a builder rather than someone who’s more or less spending their entire life in meetings. Which, oh, God, that’s me.
Brian: And so we care about going to where those developers are, and that could mean going and making your product easily used in the language and tool that customer cares about. So, if you’re a Python developer, it’s important for us to have tools and make it easy for Python developers. We have client libraries for Python, for example. It also means going to the cloud where your customers are. And this is something that differentiates us as well, when you start looking at what the other cloud providers are offering, in that data—like it or not—has gravity. And so somebody that has built their whole stack on AWS, sure, they care about using a service that is going to receive their data, and that also being in AWS, but—
Corey: It has to live where the customers are, especially with data egress charges being what they are, too.
Corey: And data gravity is real. The cloud provider people pick is the one where their data lives because of that particular inflection in the market.
Brian: Absolutely true. And so that’s great if you’re only going after people who are on AWS, but what about Google Cloud and what about Microsoft Azure? There are a lot of developers that are building on those platforms as well, and that’s one of the reasons we want to go there as well. So, InfluxDB Cloud is a multi-cloud offering, and it’s equal experience and capability and pricing on each of the three major clouds. You can buy directly from us; you can put it on any of your cloud bills in one of those marketplaces, and to us that’s like a really, really fundamental point is to bring your product and make it as easy to use on those platforms and in those languages, and in those realms and use cases where people are already working.
Corey: I’m a big believer in multi-cloud for the use case you just defined. Because I know I’m going to get letters if I don’t say this based upon my public “multi-cloud is a dumb default worst practice for most folks” stance—because it is, on a workload-by-workload basis—but you’re building a service that has to be close to where your customers are and for that specific thing, yeah, it makes an awful lot of sense for you to have a presence across all the different providers. Now, here’s the $64,000 question for you: is the experience as an InfluxDB Cloud customer meaningfully different between different providers?
Brian: It’s not. We actually pride ourselves on it being the same. Using InfluxDB, you sign up for InfluxDB Cloud, you come in, you set up your account, create your organization, and then you choose which underlying cloud provider you want your account to be provisioned in. And so it actually comes as a secondary choice; it’s not something that is gated in the beginning, and that allows us to deliver a uniform experience across the board. And in a future use case, maybe somebody wants to have part of the data for what they’re building living in AWS and maybe part of it living in Azure, I mean, that could be a scenario as well.
However, typically what we’ve seen—and you’ve probably seen this as well—is most developers are—and organizations—are building mostly on one cloud. I don’t see a lot of multi-cloud in that organization. But we ourselves need to be multi-cloud in order to go to where those people are working. And so that’s the distinction. It’s for us as a company that delivers product to those people, it’s important for us to go where they are, whereas they themselves are not necessarily running on all three cloud products; they’re probably running on one platform.
Corey: Yeah. On a workload-by-workload basis, that’s what generally makes sense. Anytime you have someone who has a particular workload that needs to be in multiple providers, okay, great, you’re going to put that out there, but their backend systems, their billing, their marketing, all the rest, is not going to go down that path for a variety of excellent reasons, mostly that it is a colossal pain, and a bunch of, more or less, solving the same problems over and over, rather than the whole point of cloud being to make it someone else’s. I want to thank you for taking so much time to speak to me about how you’re viewing the evolution of the market, how you’re seeing your move into cloud, and how you’re effectively targeting folks who can actually care about the implementation details of a database rather than, honestly, suits. If people want to learn more, where can they find you?
Brian: They can go to our website; it’s the easiest place to go. So, influxdata.com. You can read all about InfluxDB, it’s a pretty easy sign up to get underway. So, I recommend that people get their hands dirty with the product. That’s the easiest way to understand what it’s all about.
Corey: And if you do end up doing that, please tell them I sent you because the involuntary flinch whenever people mention my name to vendors is one of my favorite parts of being me. Brian, thank you so much for being so generous with your time. I appreciate it.
Brian: Thanks so much for having us on. It was great.
Corey: Brian Mullen, Chief Marketing Officer—and dealmaker—at InfluxData. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a long, angry comment telling me that you work on the Timestream service team, and your product is the best. It’s found huge success, but I’ve just never met any of your customers and I can’t because they all live in Canada.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.