Cribl Sharpens the Security Edge with Clint Sharp

Episode Summary

Clint Sharp, CEO and co-founder at Cribl, is back for a return appearance on “Screaming!” This time Clint is here with some news! But it isn’t to buy a vowel. Instead, it is a juicy new product announcement. And in the adroit words of Clint, product announcements “actually matter” to Corey’s audience. Clint starts off with a light refresher on what exactly Cribl, fundamentally an observability company, does. Recently, Cribl has made dovetailing observability and security a priority. With the announcement of their product Cribl Edge, for which they did buy a couple of vowels, they are “taking our existing best-in-class management technology, and we’re turning it into an agent.” Tune in to Clint’s conversation for an extensive look at Cribl Edge, Clint’s perspectives, and more!

Episode Show Notes & Transcript

About Clint
Clint is the CEO and a co-founder at Cribl, a company focused on making observability viable for any organization, giving customers visibility and control over their data while maximizing value from existing tools.

Prior to co-founding Cribl, Clint spent two decades leading product management and IT operations at technology and software companies, including Splunk and Cricket Communications. As a former practitioner, he has deep expertise in network issues, database administration, and security operations.

Links:
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Today’s episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes native object store that’s built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you’re defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that’s exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn’t eat all the data you’ve gotten on the system, it’s exactly what you’ve been looking for. Check it out today at min.io/download, and see for yourself. That’s min.io/download, and be sure to tell them that I sent you.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They’ve also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That’s S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I have a repeat guest joining me on this promoted episode. Clint Sharp is the CEO and co-founder of Cribl. Clint, thanks for joining me.

Clint: Hey, Corey, nice to be back.

Corey: I was super excited when you gave me the premise for this recording because you said you had some news to talk about, and I was really excited that oh, great, they’re finally going to buy a vowel so that people look at their name and understand how to pronounce it. And no, that’s nowhere near forward-looking enough. Instead it’s some, I guess, I don’t know, some product announcement or something. But you know, hope springs eternal. What have you got for us today?

Clint: Well, one of the reasons I love talking to your audience is because product announcements actually matter to this audience. It’s super interesting, as you get into starting a company, you’re such, like, a product person, you’re like, “Oh, I have this new set of things that’s really going to make your life better.” And then you go out to, like, the general media, and you’re like, “Hey, I have this product.” And they’re like, “I don’t care. What product? Do you have a funding announcement? Do you have something big in the market that—you know, do you have a new executive? Do you”—it’s like, “No, but, like, these features, like these things, that we—the way we make our lives better for our customers. Isn’t that interesting?” “No.”

Corey: Real depressing once you—“Do you have a security breach to announce?” It’s, “No. God no. Why would I wind up being that excited about it?” “Well, I don’t know. I’d be that excited about it.” And yeah, the stuff that mainstream media wants to write about in the context of tech companies is exactly the sort of thing that tech companies absolutely do not want to be written about for. But fortunately, that is neither here nor there.

Clint: Yeah, they want the thing that gets the clicks.

Corey: Exactly. You built a product that absolutely resonates in its target market and outside of that market. It’s one of those, what is that thing, again? If you could give us a light refresher on what Cribl is and does, you’ll probably do a better job of it than I will. We hope.

Clint: We’d love to. Yeah, so we are an observability company, fundamentally. I think one of the interesting things to talk about when it comes to observability is that observability and security are merging. And so I like to say observability and include security people. If you’re a security person, and you don’t feel included by the word observability, sorry.

We also include you; you’re under our tent here. So, we sell to technology professionals, we help make their lives better. And we do that today through a flagship product called LogStream—which, as part of this announcement, we’re actually renaming to Stream. In some ways, we’re dropping logs—and we are a pipeline company. So, we help you take all of your existing agents, all of your existing data that’s moving, and we help you process that data in the stream to control costs and to send it multiple places.

And it sounds kind of silly, but one of the biggest problems that we end up solving for a lot of our enterprises is, “Hey, I’ve got, like, this old Syslog feed coming off of my firewalls”—like, you remember those things, right? Palo Alto firewalls, ASA firewalls—“I actually need to get that thing to multiple places because, hey, I want to get that data into another security solution. I want to get that data into a data lake. How do I do that?” Well, in today’s world, that actually turns out to be sort of a neglected set of features; like, for the vendors who provide you logging solutions, being able to reshape that data, filter that data, control costs, wasn’t necessarily at the top of their priority list.

It wasn’t nefarious. It wasn’t like people are like, “Oh, I’m going to make sure that they can’t process this data before it comes into my solution.” It’s more just, like, “I’ll get around to it eventually.” And the eventually never actually comes. And so our streaming product helps people do that today.

And the big announcement that we’re making this week is that we’re extending that same processing technology down to the endpoint with a new product we’re calling Cribl Edge. And so we’re taking our existing best-in-class management technology, and we’re turning it into an agent. And that seems kind of interesting because… I think everybody sort of assumed that the agent is dead. Okay, well, we’ve been building agents for a decade or two decades. Isn’t everything exactly the same as it was before?

But we really saw kind of a dearth of innovation in that area in terms of being able to manage your agents, being able to understand what data is available to be collected, being able to auto-discover the data that needs to be able to be collected, turning those agents into interactive troubleshooting experiences so that we can, kind of, replicate the ability to zoom into a remote endpoint and replicate that Linux command line experience that we’re not supposed to be getting anymore because we’re not supposed to SSH into boxes anymore. Well, how do I replicate that? How do I see how much disk is on this given endpoint if I can’t SSH into that box? And so Cribl Edge is a rethink about making this rich, interactive experience on top of all of these agents that become this really massive distributed system that we can process data all the way out at where the data is being emitted.

And so that means that now we don’t nec—if you want to process that data in the stream, okay, great, but if you want to process that data at its origination point, we can actually provide you cheaper cost because now you’re using a lot of that capacity that’s sitting out there on your endpoints that isn’t really being used today anyway—the average utilization of a Kubernetes cluster is like 30%—

Corey: It’s that high. I’m sort of surprised.

Clint: Right? I know. So, Datadog puts out the survey every year, which I think is really interesting, and that’s a number that always surprised me is just that people are already paying for this capacity, right? It’s sitting there, it’s on their AWS bill already, and with that average utilization, a lot of the stuff that we’re doing in other clusters, or while we’re moving that data can actually just be done right there where the data is being emitted. And also, if we’re doing things like filtering, we can lower egress charges, there’s lots of really, really good goodness that we can do by pushing that processing further closer to its origination point.

Corey: You know, the timing of this episode is somewhat apt because as of the time that we’re recording this, I spent most of yesterday troubleshooting and fixing my home wireless network, which is a whole Ubiquiti-managed thing. And the controller was one of their all-in-one box things that kept more or less power cycling for no apparent reason. How do I figure out why it’s doing that? Well, I’m used to, these days, doing everything in a cloud environment where you can instrument things pretty easily, where things start and where things stop is well understood. Finally, I just gave up and used a controller that’s sitting on an EC2 instance somewhere, and now great, now I can get useful telemetry out of it because now it’s stuff I know how to deal with.

It also turns out that, surprise, my EC2 instance is not magically restarting itself due to heat issues. What a concept. So, I have a newfound appreciation for the fact that oh, yeah, not everything lives in a cloud provider’s regions. Who knew? This is a revelation that I think is going to be somewhat surprising for folks who’ve been building startups and believe that anything that’s older than 18 months doesn’t exist.

But there’s a lot of data centers out there, there are a lot of agents living all kinds of different places. And workloads continue to surprise me even now, just looking at my own client base. It’s a very diverse world when we’re talking about whether things are on-prem or whether they’re in cloud environments.

Clint: Well, also, there’s a lot of agents on every endpoint, period, just due to the fact that security guys want an agent, the observability guys want an agent, the logging people want an agent. And then suddenly, I’m, you know, I’m looking at every endpoint—cloud, on-prem, whatever—and there’s 8, 10 agents sitting there. And so I think a lot of the opportunity that we saw was, we can unify the data collection for metric type of data. So, we have some really cool defaults. [unintelligible 00:07:30] this is one of the things where I think people don’t focus much on, kind of, the end-user experience. Like, let’s have reasonable defaults.

Let’s have the thing turn on, and actually, most people’s needs are met without tweaking any knobs or buttons, and no diving into YAML files and looking at documentation and trying to figure out exactly the way I need to configure this thing. Let’s collect metric data, let’s collect log data, let’s do it all from one central place with one agent that can send that data to multiple places. And I can send it to Grafana Cloud, if I want to; I can send it to Logz.io, I can send it to Splunk, I can send it to Elasticsearch, I can send it to AWS’s new Elasticsearch-y thing that we don’t know what they’re going to call yet after the lawsuit. Any of those can be done right from the endpoint from, like, a rich graphical experience, where I think that there’s not really a desire now for people to kind of jump into these configuration files, where really, for a lot of these users, this is a part-time job, and so hey, if I need to go set up data collection, do I want to learn about this detailed YAML file configuration that I’m only going to do once or twice, or should I be able to do it in an easy, intuitive way, where I can just sit down in front of the product, get my job done and move on without having to go learn some sort of new configuration language?

Corey: Once upon a time, I saw an early circa 2012, 2013 talk from Jordan Sissel, who is the creator of Logstash, and he talked a lot about how challenging it was to wind up parsing all of the variety of log files out there. Even something as relatively straightforward—wink, wink, nudge, nudge—as timestamps was an absolute monstrosity. And a lot of people have been talking in recent years about OpenTelemetry being the lingua franca that everything speaks so that is the wave of the future, but I’ve got to level with you, looking around, it feels like these people are living in a very different reality than the one that I appear to have stumbled into because the conversations people are having about how great it is sound amazing, but nothing that I’m looking at—granted from a very particular point of view—seems to be embracing it or supporting it. Is that just because I’m hanging out in the wrong places, or is it still a great idea whose time has yet to come, or something else?

Clint: So, I think a couple things. One is every conversation I have about OpenTelemetry is always, “Will be.” It’s always in the future. And there’s certainly a lot of interest. We see this from customer after customer, they’re very interested in OpenTelemetry and what the OpenTelemetry strategy is, but as an example, OpenTelemetry logging is not yet a finalized specification; they believe that they’re still six months to a year out. It seems to be perpetually six months to a year out there.

They are finalized for metrics and they are finalized for tracing. Where we see OpenTelemetry tends to be with companies like Honeycomb, companies like Datadog with their tracing product, or Lightstep. So, for tracing, we see OpenTelemetry adoption. But tracing adoption is also not that high either, relative to just general metrics or logs.

Corey: Yeah, the tracing implementations that I’ve seen, for example, Epsagon did this super well, where it would take a look at your Lambda functions built into an application, and ah, we’re going to go ahead and instrument this automatically using layers or extensions for you. And life was good because suddenly you got very detailed breakdowns of exactly how data was flowing in the course of a transaction through 15 Lambda functions. Great. With everything else I’ve seen, it’s, “Oh, you have to instrument all these things by hand.” Let me shortcut that for you: That means no one’s going to do it. They never are.

It’s anytime you have to do that undifferentiated heavy lifting of making sure that you put the finicky code just so into your application’s logic, it’s a shorthand for it’s only going to happen when you have no other choice. And I think that trying to surface that burden to the developer, instead of building it into the platform so they don’t have to think about it is inherently the wrong move.

Clint: I think there’s a strong belief in Silicon Valley that—similar to, like, Hollywood—that the biggest export Silicon Valley is going to have is culture. And so that’s going to be this culture of, like, developers supporting their stuff in production. I’m telling you, I sell to banks and governments and telcos and I don’t see that culture prevailing. I see an application developed by Accenture that’s operated by Tata. That’s a lot of inertia to overcome and a lot of regulation to overcome as well, and so, like, we can say that, hey, separation of duties isn’t really a thing and developers should be able to support all their own stuff in production.

I don’t see that happening. It may happen. It’ll certainly happen more than zero. And tracing is predicated on the whole idea that the developer is scratching their own itch. Like that I am in production and troubleshooting this and so I need this high-fidelity trace-level information to understand what’s going on with this one user’s experience, but that doesn’t tend to be in the enterprise, how things are actually troubleshot.

And so I think that more than anything is the headwind that’s slowing down distributed tracing adoption. It’s because you’re putting the onus on solving the problem on a developer who never ends up using the distributed tracing solution to begin with because there’s another operations department over there that’s actually operating the thing on a day-to-day basis.

Corey: Having come from one of those operations departments myself, the way that I would always fix things was—you know, in the era that I was operating it made sense—you’d SSH into a box and kick the tires, poke around, see what’s going on, look at the logs locally, look at the behaviors, the way you’d expect. These days, that is considered a screamingly bad anti-pattern and it’s something that companies try their damnedest to avoid doing at all. When did that change? And what is the replacement for that? Because every time I ask people for the sorts of data that I would get from that sort of exploration when they’re trying to track something down, I’m more or less met with blank stares.

Clint: Yeah. Well, I think that’s a huge hole and one of the things that we’re actually trying to do with our new product. And I think the… how do I replicate that Linux command line experience? So, for example, something as simple, like, we’d like to think that these nodes are all ephemeral, but there’s still a disk, whether it’s virtual or not; that thing sometimes fills up, so how do I even do the simple thing like df -kh and see how much disk is there if I don’t already have all the metrics collected that I needed, or I need to go dive deep into an application and understand what that application is doing or seeing, what files it’s opening, or what log files it’s writing even?

Let’s give some good examples. Like, how do I even know what files an application is running? Actually, all that information is all there; we can go discover that. And so some of the things that we’re doing with Edge is trying to make this rich, interactive experience where you can actually teleport into the end node and see all the processes that are running and get a view that looks like top and be able to see how much disk is there and how much disk is being consumed. And really kind of replicating that whole troubleshooting experience that we used to get from the Linux command line, but now instead, it’s a tightly controlled experience where you’re not actually getting an arbitrary shell, where I could do anything that could give me root level access, or exploit holes in various pieces of software, but really trying to replicate getting you that high fidelity information because you don’t need any of that information until you need it.

And I think that’s part of the problem that’s hard with shipping all this data to some centralized platform and getting every metric and every log and moving all that data is the data is worthless until it isn’t worthless anymore. And so why do we even move it? Why don’t we provide a better experience for getting at the data at the time that we need to be able to get at the data. Or the other thing that we get to change fundamentally is if we have the edge available to us, we have way more capacity. I can store a lot of information in a few kilobytes of RAM on every node, but if I bring thousands of nodes into one central place, now I need a massive amount of RAM and a massive amount of cardinality when really what I need is the ability to actually go interrogate what’s running out there.
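As a rough illustration of the `df`-style check Clint describes, here is a minimal Python sketch that answers "how much disk is there?" through one fixed, read-only function instead of an arbitrary shell. It relies only on the standard library's `shutil.disk_usage`. The function name `disk_report` and the field names it returns are hypothetical, chosen for this example; nothing here is Cribl Edge's actual API.

```python
import shutil


def disk_report(path: str = "/") -> dict:
    """Return total, used, and free bytes plus percent used for `path`.

    Hypothetical helper for illustration only; not part of any Cribl product.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Guard against a zero-sized filesystem before dividing.
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "percent_used": round(percent_used, 1),
    }


if __name__ == "__main__":
    print(disk_report("/"))
```

Because the caller can only request this one report, the endpoint exposes the information an operator would have pulled from the command line without handing out a shell.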

Corey: The thing that frustrates me the most is the way that I go back and find my old debug statements, which is, you know, I print out whatever it is that the current status is and so I can figure out where something’s breaking.

Clint: [Got here 00:15:08].

Corey: Yeah. I do it within AWS Lambda functions, and that’s great. And I go back and I remove them later when I notice how expensive CloudWatch logs are getting because at 50 cents per gigabyte of ingest on those things, and you have that Lambda function firing off a fair bit, that starts to add up when you’ve been excessively wordy with your print statements. It sounds ridiculous, but okay, then you’re storing it somewhere. If I want to take that log data and have something else consume it, that’s nine cents a gigabyte to get it out of AWS and then you’re going to want to move it again from wherever it is over there—potentially to a third system, because why not?—and it seems like the entire purpose of this log data is to sit there and be moved around because every time it gets moved, it winds up somehow costing me yet more money. Why do we do this?
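To put numbers on Corey's complaint, here is a small back-of-the-envelope sketch in Python. The two rates are the figures quoted in the conversation (50 cents per gigabyte of ingest, nine cents per gigabyte to move the data out); actual pricing varies by region and over time, so treat them as assumptions. The function name `monthly_log_cost` and the `keep_fraction` parameter are hypothetical, and the model ignores storage and any further hops to a third system.

```python
INGEST_USD_PER_GB = 0.50  # ingest figure quoted in the episode (assumption)
EGRESS_USD_PER_GB = 0.09  # data-transfer-out figure quoted in the episode (assumption)


def monthly_log_cost(gb_emitted: float, keep_fraction: float = 1.0) -> float:
    """Estimate the monthly cost of ingesting logs and moving them out once.

    keep_fraction is the share of log volume that survives filtering at the
    source; 1.0 means everything is shipped, 0.5 means half is dropped.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    gb_shipped = gb_emitted * keep_fraction
    ingest = gb_shipped * INGEST_USD_PER_GB
    egress = gb_shipped * EGRESS_USD_PER_GB
    return round(ingest + egress, 2)


if __name__ == "__main__":
    print(monthly_log_cost(1000))        # ship everything
    print(monthly_log_cost(1000, 0.5))   # drop half at the source
```

Under these assumed rates, 1,000 GB a month costs about $590 to ingest and move out once, and dropping half of it at the source brings that to about $295, which is the kind of saving filtering closer to the origination point is meant to deliver.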

Clint: I mean, it’s a great question because one of the things that I think we decided 15 years ago was that the reason to move this data was because that data may go poof. So, it was on a, you know, back in my day, it was an HP DL360 1U rackmount server that I threw in there, and it had RAID 0 disks and so if that thing went dead, well, we didn’t care, we’d replace it with another one. But if we wanted to find out why it went dead, we wanted to make sure that the data had moved before the thing went dead. But now that DL360 is a VM.

Corey: Yeah, or a container that is going to be gone in 20 minutes. So yeah, you don’t want to store it locally on that container. But disks are also a fair bit more durable than they once were, as well. And S3 talks about its 11 nines of durability. That’s great and all but most of my application logs don’t need that. So, I’m still trying to figure out where we went wrong.

Clint: Well, I think it was right for the time. And I think now that we have durable storage at the edge where that block storage is already replicated three times and we can reattach—if that box crashes, we can reattach new compute to that same block storage. Actually, AWS has some cool features now, you can actually attach multiple VMs to the same block store. So, we could actually even have logs being written by one VM, but processed by another VM. And so there are new primitives available to us in the cloud, which means we should be going back and re-questioning all of the things that we did 10 to 15 years ago and all the practices that we had because they may not be relevant anymore, but we just never stopped to ask why.

Corey: Yeah, multi-attach was rolled out with their IO2 volumes, which are spendy but great. And they do warn you that you need a file system that actively supports that and applications that are aware of it. But cool, they have specific use cases that they’re clearly imagining this for. But ten years ago, we were building things out, and, “Ooh, EBS, how do I wind up attaching that from multiple instances?” The answer was, “Ohh, don’t do that.”

And that shaped all of our perspectives on these things. Now suddenly, you can. Is that, “Ohh don’t do that,” gut visceral reaction still valid? People don’t tend to go back and re-examine the why behind certain best practices until long after those best practices are now actively harmful.

Clint: And that’s really what we’re trying to do is to say, hey, should we move log data anymore if it’s at a durable place at the edge? Should we move metric data at all? Like, hey, we have these big TSDBs that have huge cardinality challenges, but if I just had all that information sitting in RAM at the original endpoint, I can store a lot of information and barely even touch the free RAM that’s already sitting out there at that endpoint. So, how to get out that data? Like, how to make that a rich user experience so that we can query it?

We have to build some software to do this, but we can start to question from first principles, hey, things are different now. Maybe we can actually revisit a lot of these architectural assumptions, drive cost down, give more capability than we actually had before for fundamentally cheaper. And that’s kind of what Cribl does is we’re looking at software is to say, “Man, like, let’s question everything and let’s go back to first principles.” “Why do we want this information?” “Well, I need to troubleshoot stuff.” “Okay, well, if I need to troubleshoot stuff, well, how do I do that?” “Well, today we move it, but do we have to? Do we have to move that data?” “No, we could probably give you an experience where you can dive right into that endpoint and get really, really high fidelity data without having to pay to move that and store it forever.” Because also, like, telemetry information, it’s basically worthless after 24 hours, like, if I’m moving that and paying to store it, then now I’m paying for something I’m never going to read back.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it’s better than AWS pricing—and when they say that they mean it is less money. Sure, I don’t dispute that but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you’ll receive $100 in credit. That’s V-U-L-T-R.com slash screaming.


Corey: And worse, you wind up figuring out, okay, I’m going to store all that data going back to 2012, and it’s petabytes upon petabytes. And great, how do I actually search for a thing? Well, I have to use some other expensive thing of compute that’s going to start diving through all of that because the way I set up my partitioning, it isn’t aligned with anything looking at, like, recency or based upon time period, so every time I want to look at what happened 20 minutes ago, I’m looking at what happened 20 years ago. And that just gets incredibly expensive, not just to maintain but to query and the rest. Now, to be clear, yes, this is an anti-pattern. It isn’t how things should be set up. But how should they be set up? And is the collective answer to that right now actually what’s best, or is it still harkening back to old patterns that no longer apply?
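One common way to avoid scanning 20 years of data for a 20-minute question is to lay log objects out by time so a query only touches the partitions that overlap its window. The Python sketch below shows the idea under stated assumptions: the `logs/year=/month=/day=/hour=` layout is a hypothetical Hive-style convention, and `hourly_prefixes` is a made-up helper name, not something taken from the episode or from any particular product.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def hourly_prefixes(start: datetime, end: datetime, root: str = "logs") -> list[str]:
    """List the hourly partition prefixes that overlap the window [start, end].

    Assumes a hypothetical layout: <root>/year=YYYY/month=MM/day=DD/hour=HH/
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    # Snap back to the top of the hour so the first partial hour is included.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while cursor <= end:
        prefixes.append(
            f"{root}/year={cursor:%Y}/month={cursor:%m}/day={cursor:%d}/hour={cursor:%H}/"
        )
        cursor += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for prefix in hourly_prefixes(now - timedelta(minutes=20), now):
        print(prefix)
```

With a layout like this, a look back over the last 20 minutes resolves to one or two hourly prefixes, so the query engine never has to open the older partitions at all.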

Clint: Well, the future is here, it’s just unevenly distributed. So there’s, you know, I think an important point about us or how we think about building software is with this customer-first attitude and fundamentally bringing them choice. Because the reality is that doing things the old way may be the right decision for you. You may have compliance requirements to say—there’s a lot of financial services institutions, for example, like, they have to keep every byte of data written on any endpoint for seven years. And so we have to accommodate their requirements.

Like, is that the right requirement? Well, I don’t know. The regulator wrote it that way, so therefore, I have to do it. Whether it’s the right thing or the wrong thing for the business, I have no choice. And their decisions are just as right as the person who says this data is worthless and should all just be thrown away.

We really want to be able to go and say, like, hey, what decision is right? We’re going to give you the option to do it this way, we’re going to give you the option to do it this way. Now, the hard part—and that when it comes down to, like, marketing, it’s like you want to have this really simple message, like, “This is the one true path.” And a lot of vendors are this way, “There’s this new wonderful, right, true path that we are going to take you on, and follow along behind me.” But the reality is, enterprise worlds are gritty and ugly, and they’re full of old technology and new technology.

And they need to be able to support getting data off the mainframe the same way as they’re doing a brand new containerized microservices application. In fact, that brand new containerized microservices application is probably talking to the mainframe through some API. And so all of that has to work at once.

Corey: Oh, yeah. And it’s all of our payment data is in our PCI environment that PCI needs to have every byte logged. Great. Why is three-quarters of your infrastructure considered the PCI environment? Maybe you can constrain that at some point and suddenly save a whole bunch of effort, time, money, and regulatory drag on this.

But as you go through that journey, you need to not only have a tool that will work when you get there but a tool that will work where you are today. And a lot of companies miss that mark, too. It’s, “Oh, once you modernize and become the serverless success story of the decade, then our product is going to be right for you.” “Great. We’ll send you a postcard if we ever get there and then you can follow up with us.”

Alternately, it’s, “Well, yeah, this is how we are today, but we have visions of a brighter tomorrow.” You’ve got to be able to meet people where they are at any point of that journey. One of the things I’ve always respected about Cribl has been the way that you very fluidly tell both sides of that story.

Clint: And it’s not their fault.

Corey: Yeah.

Clint: Most of the people who pick a job, they pick the job because, like—look, I live in Kansas City, Missouri, and there’s this data processing company that works primarily on mainframes, it’s right down the road. And they gave me a job and it pays me $150,000 a year, and I got a big house and things are great. And I’m a sysadmin sitting there. I don’t get to play with the new technology. Like, that customer is just as applicable a customer; we want to help them exactly the same as the new Silicon Valley hip kid who’s working at, you know, a venture-backed startup, they’re doing everything natively in the cloud. Those are all right decisions, depending on where you happen to find yourself, and we want to support you with our products, no matter where you find yourself on the technology spectrum.

Corey: Speaking of old and new, and the trends of the industry, when you first set up this recording, you mentioned, “Oh, yeah, we should make it a point to maybe talk about the acquisition,” at which point I sprayed coffee across my iMac. Thanks for that. Turns out it wasn’t your acquisition we were talking about so much as it is the—at the time we record this—yet-to-close rumored acquisition of Splunk by Cisco.

Clint: I think it’s both interesting and positive for some people, and sad for others. I think Cisco is obviously a phenomenal company. They run the networking world. The fact that they’ve been moving into observability—they bought companies like AppDynamics, and we were talking about Epsagon before the show, they bought—ServiceNow just bought Lightstep recently. There’s a lot of acquisitions in this space.

I think that when it comes to something like Splunk, Splunk is a fast-growing company by comparison to Cisco. And so for them, this is something that they think that they can put into their distribution channel, and what Cisco knows how to do is to sell things; like, they’re very good at putting things through their existing sales force and really amplifying the sales of that particular thing that they have just acquired. That being said, I think for a company that was as innovative as Splunk, I do find it a bit sad with the idea that it’s going to become part of this much larger behemoth and not really probably driving the observability and security industry forward anymore because I don’t think anybody really looks at Cisco as a company that’s driving things—not to slam them or anything, but I don’t really see them as driving the industry forward.

Corey: Somewhere along the way, they got stuck and I don’t know how to reconcile that because they were a phenomenally fast-paced innovative company, briefly the most valuable company in the world during the dotcom bubble. And then they just sort of stalled out somewhere and, on some level, not to talk smack about it, but it feels like the level of innovation we’ve seen from Splunk has curtailed over the past half-decade or so. And selling to Cisco feels almost like a tacit admission that they are effectively out of ideas. And maybe that’s unfair.

Clint: I mean, we can look at the track record of what’s been shipped over the last five years from Splunk. And again, they’re a partner, their customers are great, I think they still have the best log indexing engine on the market. That was their core product and what has made them the majority of their money. But there’s not been a lot new. And I think objectively we can look at that without throwing stones and say like, “Well, what’s net-new? You bought SignalFX. Like, good for you guys; like, that seems to be going well. You’ve launched your observability suite based off of these acquisitions.” But organic product-wise, there’s not a lot coming out of the factory.

Corey: I’ll take it a bit further-slash-sadder, we take a look at some great companies that were acquired—OpenDNS, Duo Security, SignalFX, as you mentioned, Epsagon, ThousandEyes—and once they’ve gotten acquired by Cisco, they all more or less seem to be frozen in time, like they’re trapped in amber, which leads us up to the natural dinosaur analogy that I’ll probably make in a less formal setting. It just feels like once a company is bought by Cisco, their velocity peters out, a lot of their staff leaves, and what you see is what you get. And I don’t know if that’s accurate or I’m just not looking in the right places, but every time I talk to folks in the industry about this, I get a lot of knowing nods that are tied to it. So, whether or not that’s true, that is very clearly, at least in some corners of the market, the active perception.

Clint: There’s a very real fact that if you look even at very large companies, innovation is driven from a core set of a handful of people. And when those people start to leave, the innovation really stops. It’s those people who think about things back from first principles—like why are we doing things? What different can we do?—and they’re the type of drivers that drive change.

So, Frank Slootman wrote a book recently called Amp It Up that I’ve been reading over the last weekend, and he talks—has this article that was on LinkedIn a while back called “Drivers vs. Passengers” and he’s always looking for drivers. And those drivers tend to not find themselves as happy in bigger companies and they tend to head for the exits. And so then you end up with the people who are a lot of the passenger type of people, the people who are like—they’ll carry it forward, they’ll continue to scale it, the business will continue to grow at whatever rate it’s going to grow, but you’re probably not going to see a lot of the net-new stuff. And I’ll put it in comparison to a company like Datadog, who I have a vast amount of respect for. I think they’re an incredibly innovative company, and I think they continue to innovate.

Still driven by the founders, the people who created the original product are still there driving the vision, driving forward innovation. And that’s what tends to move the envelope is the people who have the moral authority inside of an even larger organization to say, “Get behind me. We’re going in this direction. We’re going to go take that hill. We’re going to go make things better for our customers.” And when you start to lose those handful of really critical contributors, that’s where you start to see the innovation dry up.

Corey: Where do you see the acquisitions coming from? Is it just at some point people shove money at these companies that got acquired that is beyond the wildest dreams of avarice? Is it that they believe that they’ll be able to execute better on their mission than they were independently? These are still smart, driven people who have built something and I don’t know that they necessarily see an acquisition as, “Well, time to give up and coast for a while and then I’ll leave.” But maybe it is. I’ve never found myself in that situation, so I can’t speak for sure.

Clint: You kind of, I think, have to look at the business and then whoever’s running the business at that time—and I sit in the CEO chair—so you have to look at the business and say, “What do we have inside the house here?” Like, “What more can we do?” If we think that there’s the next billion-dollar, multi-billion-dollar product sitting here, even just in our heads, but maybe in the factory and being worked on, then we should absolutely not sell because the value is still there and we’re going to grow the company much faster as an independent entity than we would, you know, inside of a larger organization. But if you’re the board of directors and you’re looking around and saying like, hey look, like, I don’t see another billion-dollar line of bus—at this scale, right, if you’re Splunk scale, right? I don’t see another billion-dollar line of business sitting here, we could probably go acquire it, we could try to add it in, but you know, in the case of something like a Splunk, I think part of—you know, they’re looking for a new CEO right now, so now they have to go find a new leader who’s going to come in, re-energize and, kind of, reboot that.

But that’s the options that they’re considering, right? They’re like, “Do I find a new CEO who’s going to reinvigorate things and be able to attract the type of talent that’s going to lead us to the next billion-dollar line of business that we can either build inside or we can acquire and bring in-house? Or is the right path for me just to say, ‘Okay, well, you know, somebody like Cisco’s interested?’” or the other path that you may see them go down is something like Silver Lake, so Silver Lake put a billion dollars into the company last year. And so they may be looking at it and saying, “Okay, well, we really need to do some restructuring here and we want to do it outside the eyes of the public market. We want to be able to change pricing model, we want to be able to really do this without having to worry about the stock price’s massive volatility because we’re making big changes.”

And so I would say there’s probably two big options they’re considering. Like, do we sell to Cisco, do we sell to Silver Lake, or do we really take another run at this? And those are difficult decisions for the stewards of the business and I think it’s a different decision if you’re the steward of the business that created the business versus the steward of the business for whom this is—the, “I’ve been here for five years and I may be here for five years more.” For somebody like me, a company like Cribl is literally the thing I plan to leave on this earth.

Corey: Yeah. Do you have that sense of personal attachment to it? On some level, The Duckbill Group, that’s exactly what I’m staring at where it’s great. Someone wants to buy the Last Week in AWS media side of the house.

Great. Okay. What is that really, beyond me? Because so much of it’s been shaped by my personality. There’s an audience, sure, but it’s a skeptical audience, one that doesn’t generally tend to respond well to mass market, generic advertisements, so monetizing that is not going to go super well.

“All right, we’re going to start doing data mining on people.” Well, that’s explicitly against the terms of service people signed up for, so good luck with that. So much starts becoming bizarre and strange when you start looking at building something with the idea of, oh, in three years, I’m going to unload this puppy and make it someone else’s problem. The argument is that by building something with an eye toward selling it, you build a better-structured business, but it also means you potentially make trade-offs that are best not made. I’m not sure there’s a right answer here.

Clint: In my spare time, I do some investments, angel investments, and that sort of thing, and that’s always a red flag for me when I meet a founder who’s like, “In three to five years, I plan to sell it to these people.” If you don’t have a vision for how you’re fundamentally going to alter the marketplace and our perception of everything else, you’re not dreaming big enough. And that to me doesn’t look like a great investment. It doesn’t look like the—how do you attract employees in that way? Like, “Okay, our goal is to work really hard for the next three years so that we will be attractive to this other bigger thing.” They may be thinking it on the inside as an available option, but if you think that’s your default option when starting a company, I don’t think you’re going to end up with the outcome that is truly what you’re hoping for.

Corey: Oh, yeah. In my case, the only acquisition story I see is some large company buying us just largely to shut me up. But—

Clint: [laugh].

Corey: —that turns out to be kind of expensive, so all right. I also don’t think it’d serve any of them nearly as well as they think it would.

Clint: Well, you’ll just become somebody else on Twitter. [laugh].

Corey: Yeah, “Time to change my name again. Here we go.” So, if people want to go and learn more about Cribl Edge, where can they do that?

Clint: Yeah, cribl.io. And then if you’re more of a technical person, and you’d like to understand the specifics, docs.cribl.io. That’s where I always go when I’m checking out a vendor; just skip past the main page and go straight to the docs. So, check that out.

And then also, if you’re wanting to play with the product, we make available online education called Sandboxes, at sandbox.cribl.io, where you can go spin up your own version of the product, walk through some interactive tutorials, and get a view on how it might work for you.

Corey: Such a great pattern, at least for the way that I think about these things. You can have flashy videos, you can have great screenshots, you can have documentation that is the finest thing on this earth, but let me play with it; let me kick the tires on it, even with a sample data set. Because until I can do that, I’m not really going to understand where the product starts and where it stops. That is the right answer from where I sit. Again, I understand that everyone’s different, not everyone thinks like I do—thankfully—but for me, that’s the best way I’ve ever learned something.

Clint: I love to get my hands on the product, and in fact, I’m always a little bit suspicious of any company when I go to their webpage and I can’t either sign up for the product or I can’t get to the documentation, and I have to talk to somebody in order to learn. That’s pretty much, I’m immediately going to the next person in that market to go look for somebody who will let me.

Corey: [laugh]. Thank you again for taking so much time to speak with me. I appreciate it. As always, it’s a pleasure.

Clint: Thanks, Corey. Always enjoy talking to you.

Corey: Clint Sharp, CEO and co-founder of Cribl. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment. And when you hit submit, be sure to follow it up with exactly how many distinct and disparate logging systems that obnoxious comment had to pass through on your end of things.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.


Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Today’s episode is brought to you in part by our friends at MinIO the high-performance Kubernetes native object store that’s built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you’re defining those as, which depends probably on where you work. It’s getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that’s exactly what MinIO offers. With superb read speeds in excess of 360 gigs and 100 megabyte binary that doesn’t eat all the data you’ve gotten on the system, it’s exactly what you’ve been looking for. Check it out today at min.io/download, and see for yourself. That’s min.io/download, and be sure to tell them that I sent you.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They’ve also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That’s S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I have a repeat guest joining me on this promoted episode. Clint Sharp is the CEO and co-founder of Cribl. Clint, thanks for joining me.

Clint: Hey, Corey, nice to be back.

Corey: I was super excited when you gave me the premise for this recording because you said you had some news to talk about, and I was really excited that oh, great, they’re finally going to buy a vowel so that people look at their name and understand how to pronounce it. And no, that’s nowhere near forward-looking enough. It’s instead it’s some, I guess, I don’t know, some product announcement or something. But you know, hope springs eternal. What have you got for us today?

Clint: Well, one of the reasons I love talking to your audiences because product announcements actually matter to this audience. It’s super interesting, as you get into starting a company, you’re such, like, a product person, you’re like, “Oh, I have this new set of things that’s really going to make your life better.” And then you go out to, like, the general media, and you’re like, “Hey, I have this product.” And they’re like, “I don’t care. What product? Do you have a funding announcement? Do you have something big in the market that—you know, do you have a new executive? Do you”—it’s like, “No, but, like, these features, like these things, that we—the way we make our lives better for our customers. Isn’t that interesting?” “No.”

Corey: Real depressing once you—“Do you have a security breach to announce?” It’s, “No. God no. Why would I wind up being that excited about it?” “Well, I don’t know. I’d be that excited about it.” And yeah, the stuff that mainstream media wants to write about in the context of tech companies is exactly the sort of thing that tech companies absolutely do not want to be written about for. But fortunately, that is neither here nor there.

Clint: Yeah, they want the thing that gets the clicks.

Corey: Exactly. You built a product that absolutely resonates in its target market and outside of that market. It’s one of those, what is that thing, again? If you could give us a light refresher on what Cribl is and does, you’ll probably do a better job of it than I will. We hope.

Clint: We’d love to. Yeah, so we are an observability company, fundamentally. I think one of the interesting things to talk about when it comes to observability is that observability and security are merging. And so I like to say observability and include security people. If you’re a security person, and you don’t feel included by the word observability, sorry.

We also include you; you’re under our tent here. So, we sell to technology professionals, we help make their lives better. And we do that today through a flagship product called LogStream—which is part of this announcement, we’re actually renaming to Stream. In some ways, we’re dropping logs—and we are a pipeline company. So, we help you take all of your existing agents, all of your existing data that’s moving, and we help you process that data in the stream to control costs and to send it multiple places.

And it sounds kind of silly, but one of the biggest problems that we end up solving for a lot of our enterprises is, “Hey, I’ve got, like, this old Syslog feed coming off of my firewalls”—like, you remember those things, right? Palo Alto firewalls, ASA firewalls—“I actually get that thing to multiple places because, hey, I want to get that data into another security solution. I want to get that data into a data lake. How do I do that?” Well, in today’s world, that actually turns out is sort of a neglected set of features, like, the vendors who provide you logging solutions, being able to reshape that data, filter that data, control costs, wasn’t necessarily at the top of their priority list.

It wasn’t nefarious. It wasn’t like people are like, “Oh, I’m going to make sure that they can’t process this data before it comes into my solution.” It’s more just, like, “I’ll get around to it eventually.” And the eventually never actually comes. And so our streaming product helps people do that today.

And the big announcement that we’re making this week is that we’re extending that same processing technology down to the endpoint with a new product we’re calling Cribl Edge. And so we’re taking our existing best-in-class management technology, and we’re turning it into an agent. And that seems kind of interesting because… I think everybody sort of assumed that the agent is dead. Okay, well, we’ve been building agents for a decade or two decades. Isn’t everything exactly the same as it was before?

But we really saw kind of a dearth of innovation in that area in terms of being able to manage your agents, being able to understand what data is available to be collected, being able to auto-discover the data that needs to be able to be collected, turning those agents into interactive troubleshooting experiences so that we can, kind of, replicate the ability to zoom into a remote endpoint and replicate that Linux command line experience that we’re not supposed to be getting anymore because we’re not supposed to SSH into boxes anymore. Well, how do I replicate that? How do I see how much disk is on this given endpoint if I can’t SSH into that box? And so Cribl Edge is a rethink about making this rich, interactive experience on top of all of these agents that become this really massive distributed system that we can process data all the way out at where the data is being emitted.

And so that means that now we don’t nec—if you want to process that data in the stream, okay, great, but if you want to process that data at its origination point, we can actually provide you cheaper cost because now you’re using a lot of that capacity that’s sitting out there on your endpoints that isn’t really being used today anyway—the average utilization of a Kubernetes cluster is like 30%—

Corey: It’s that high. I’m sort of surprised.

Clint: Right? I know. So, Datadog puts out the survey every year, which I think is really interesting, and that’s a number that always surprised me is just that people are already paying for this capacity, right? It’s sitting there, it’s on their AWS bill already, and with that average utilization, a lot of the stuff that we’re doing in other clusters, or while we’re moving that data can actually just be done right there where the data is being emitted. And also, if we’re doing things like filtering, we can lower egress charges, there’s lots of really, really good goodness that we can do by pushing that processing further closer to its origination point

Corey: You know, the timing of this episode is somewhat apt because as of the time that we’re recording this, I spent most of yesterday troubleshooting and fixing my home wireless network, which is a whole Ubiquity-managed thing. And the controller was one of their all-in-one box things that kept more or less power cycling for no apparent reason. How do I figure out why it’s doing that? Well, I’m used to, these days, doing everything in a cloud environment where you can instrument things pretty easily, where things start and where things stop is well understood. Finally, I just gave up and used a controller that’s sitting on an EC2 instance somewhere, and now great, now I can get useful telemetry out of it because now it’s stuff I know how to deal with.

It also, turns out that surprise, my EC2 instance is not magically restarting itself due to heat issues. What a concept. So, I have a newfound appreciation for the fact that oh, yeah, not everything lives in a cloud provider’s regions. Who knew? This is a revelation that I think is going to be somewhat surprising for folks who’ve been building startups and believe that anything that’s older than 18 months doesn’t exist.

But there’s a lot of data centers out there, there are a lot of agents living all kinds of different places. And workloads continue to surprise me even now, just looking at my own client base. It’s a very diverse world when we’re talking about whether things are on-prem or whether they’re in cloud environments.

Clint: Well, also, there’s a lot of agents on every endpoint period, just due to the fact that security guys want an agent, the observability guys want an agent, the logging people want an agent. And then suddenly, I’m, you know, I’m looking at every endpoint—cloud, on-prem, whatever—and there’s 8, 10 agents sitting there. And so I think a lot of the opportunity that we saw was, we can unify the data collection for metric type of data. So, we have some really cool defaults. [unintelligible 00:07:30] this is one of the things where I think people don’t focus much on, kind of, the end-user experience. Like, let’s have reasonable defaults.

Let’s have the thing turn on, and actually, most people’s needs are set without tweaking any knobs or buttons, and no diving into YAML files and looking at documentation and trying to figure out exactly the way I need to configure this thing. Let’s collect metric data, let’s collect log data, let’s do it all from one central place with one agent that can send that data to multiple places. And I can send it to Grafana Cloud, if I want to; I can send it to Logz.io, I can send it to Splunk, I can send it to Elasticsearch, I can send it to AWS’s new Elasticsearch-y the thing that we don’t know what they’re going to call it yet after the lawsuit. Any of those can be done right from the endpoint from, like, a rich graphical experience where I think that there’s a really a desire now for people to kind of jump into these configuration files where really a lot of these users, this is a part-time job, and so hey, if I need to go set up data collection, do I want to learn about this detailed YAML file configuration that I’m only going to do once or twice, or should I be able to do it in an easy, intuitive way, where I can just sit down in front of the product, get my job done and move on without having to go learn some sort of new configuration language?

Corey: Once upon a time, I saw an early circa 2012, 2013 talk from Jordan Sissel, who is the creator of Logstash, and he talked a lot about how challenging it was to wind up parsing all of the variety of log files out there. Even something is relatively straightforward—wink, wink, nudge, nudge—as timestamps was an absolute monstrosity. And a lot of people have been talking in recent years about OpenTelemetry being the lingua franca that everything speaks so that is the wave of the future, but I’ve got a level with you, looking around, it feels like these people are living in a very different reality than the one that I appear to have stumbled into because the conversations people are having about how great it is sound amazing, but nothing that I’m looking at—granted from a very particular point of view—seems to be embracing it or supporting it. Is that just because I’m hanging out in the wrong places, or is it still a great idea whose time has yet to come, or something else?

Clint: So, I think a couple things. One is every conversation I have about OpenTelemetry is always, “Will be.” It’s always in the future. And there’s certainly a lot of interest. We see this from customer after customer, they’re very interested in OpenTelemetry and what the OpenTelemetry strategy is, but as an example OpenTelemetry logging is not yet finalized specification; they believe that they’re still six months to a year out. It seems to be perpetually six months to a year out there.

They are finalized for metrics and they are finalized for tracing. Where we see OpenTelemetry tends to be with companies like Honeycomb, companies like Datadog with their tracing product, or Lightstep. So, for tracing, we see OpenTelemetry adoption. But tracing adoption is also not that high either, relative to just general metrics of logs.

Corey: Yeah, the tracing implementations that I’ve seen, for example, Epsagon did this super well, where it would take a look at your Lambdas Function built into an application, and ah, we’re going to go ahead and instrument this automatically using layers or extensions for you. And life was good because suddenly you got very detailed breakdowns of exactly how data was flowing in the course of a transaction through 15 Lambdas Function. Great. With everything else I’ve seen, it’s, “Oh, you have to instrument all these things by hand.” Let me shortcut that for you: That means no one’s going to do it. They never are.

It’s anytime you have to do that undifferentiated heavy lifting of making sure that you put the finicky code just so into your application’s logic, it’s a shorthand for it’s only going to happen when you have no other choice. And I think that trying to surface that burden to the developer, instead of building it into the platform so they don’t have to think about it is inherently the wrong move.

Clint: I think there’s a strong belief in Silicon Valley that—similar to, like, Hollywood—that the biggest export Silicon Valley is going to have is culture. And so that’s going to be this culture of, like, developer supporting their stuff in production. I’m telling you, I sell to banks and governments and telcos and I don’t see that culture prevailing. I see a application developed by Accenture that’s operated by Tata. That’s a lot of inertia to overcome and a lot of regulation to overcome as well, and so, like, we can say that, hey, separation of duties isn’t really a thing and developers should be able to support all their own stuff in production.

I don’t see that happening. It may happen. It’ll certainly happen more than zero. And tracing is predicated on the whole idea that the developer is scratching their own itch. Like that I am in production and troubleshooting this and so I need this high-fidelity trace-level information to understand what’s going on with this one user’s experience, but that doesn’t tend to be in the enterprise, how things are actually troubleshot.

And so I think that more than anything is the headwind that slowing down distributed tracing adoption. It’s because you’re putting the onus on solving the problem on a developer who never ends up using the distributed tracing solution to begin with because there’s another operations department over there that’s actually operating the thing on a day-to-day basis.

Corey: Having come from one of those operations departments myself, the way that I would always fix things was—you know, in the era that I was operating it made sense—you’d SSH into a box and kick the tires, poke around, see what’s going on, look at the logs locally, look at the behaviors, the way you’d expect it to these days, that is considered a screamingly bad anti-pattern and it’s something that companies try their damnedest to avoid doing at all. When did that change? And what is the replacement for that? Because every time I asked people for the sorts of data that I would get from that sort of exploration when they’re trying to track something down, I’m more or less met with blank stares.

Clint: Yeah. Well, I think that’s a huge hole and one of the things that we’re actually trying to do with our new product. And I think the… how do I replicate that Linux command line experience? So, for example, something as simple, like, we’d like to think that these nodes are all ephemeral, but there’s still a disk, whether it’s virtual or not; that thing sometimes fills up, so how do I even do the simple thing like df -kh and see how much disk is there if I don’t already have all the metrics collected that I needed, or I need to go dive deep into an application and understand what that application is doing or seeing, what files it’s opening, or what log files it’s writing even?

Let’s give some good examples. Like, how do I even know what files an application is running? Actually, all that information is all there; we can go discover that. And so some of the things that we’re doing with Edge is trying to make this rich, interactive experience where you can actually teleport into the end node and see all the processes that are running and get a view that looks like top and be able to see how much disk is there and how much disk is being consumed. And really kind of replicating that whole troubleshooting experience that we used to get from the Linux command line, but now instead, it’s a tightly controlled experience where you’re not actually getting an arbitrary shell, where I could do anything that could give me root level access, or exploit holes in various pieces of software, but really trying to replicate getting you that high fidelity information because you don’t need any of that information until you need it.

And I think that’s part of the problem that’s hard with shipping all this data to some centralized platform and getting every metric and every log and moving all that data is the data is worthless until it isn’t worthless anymore. And so why do we even move it? Why don’t we provide a better experience for getting at the data at the time that we need to be able to get at the data. Or the other thing that we get to change fundamentally is if we have the edge available to us, we have way more capacity. I can store a lot of information in a few kilobytes of RAM on every node, but if I bring thousands of nodes into one central place, now I need a massive amount of RAM and a massive amount of cardinality when really what I need is the ability to actually go interrogate what’s running out there.

Corey: The thing that frustrates me the most is the way that I go back and find my old debug statements, which is, you know, I print out whatever it is that the current status is and so I can figure out where something’s breaking.

Clint: [Got here 00:15:08].

Corey: Yeah. I do it within AWS Lambda functions, and that’s great. And I go back and I remove them later when I notice how expensive CloudWatch logs are getting because at 50 cents per gigabyte of ingest on those things, and you have that Lambda function firing off a fair bit, that starts to add up when you’ve been excessively wordy with your print statements. It sounds ridiculous, but okay, then you’re storing it somewhere. If I want to take that log data and have something else consume it, that’s nine cents a gigabyte to get it out of AWS and then you’re going to want to move it again from wherever it is over there—potentially to a third system, because why not?—and it seems like the entire purpose of this log data is to sit there and be moved around because every time it gets moved, it winds up somehow costing me yet more money. Why do we do this?
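To put numbers on Corey's complaint, here is a small Python sketch of the arithmetic. The two per-gigabyte prices are the figures Corey quotes in the conversation, not a current price list; the function name, the workload numbers, and the `egress_fraction` parameter are assumptions for illustration, and storage charges are left out entirely.

```python
INGEST_USD_PER_GB = 0.50  # CloudWatch Logs ingest figure quoted by Corey
EGRESS_USD_PER_GB = 0.09  # data-transfer-out figure quoted by Corey


def monthly_log_cost(invocations_per_month, bytes_logged_per_invocation,
                     egress_fraction=0.0):
    """Estimate monthly ingest and egress cost for chatty print statements.

    egress_fraction is the share of ingested data later moved out of AWS
    (0.0 = none, 1.0 = all of it). Storage cost is deliberately ignored.
    """
    gb = invocations_per_month * bytes_logged_per_invocation / 1e9
    ingest = gb * INGEST_USD_PER_GB
    egress = gb * egress_fraction * EGRESS_USD_PER_GB
    return {"gb": gb, "ingest_usd": ingest, "egress_usd": egress,
            "total_usd": ingest + egress}


if __name__ == "__main__":
    # Hypothetical workload: 100 million invocations, 2,000 bytes of debug
    # output each, with all of it later shipped to a second system.
    cost = monthly_log_cost(100_000_000, 2_000, egress_fraction=1.0)
    print(f"{cost['gb']:.0f} GB -> about ${cost['total_usd']:.2f} per month")
```

With those hypothetical inputs the workload produces 200 GB a month, so roughly $100 of ingest plus $18 of egress before the data has been read even once.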

Clint: I mean, it’s a great question because one of the things that I think we decided 15 years ago was that the reason to move this data was because that data may go poof. So, it was on a, you know, back in my day, it was an HP DL360 1U rackmount server that I threw in there, and it had RAID zero disks and so if that thing went dead, well, we didn’t care, we’d replace it with another one. But if we wanted to find out why it went dead, we wanted to make sure that the data had moved before the thing went dead. But now that DL360 is a VM.

Corey: Yeah, or a container that is going to be gone in 20 minutes. So yeah, you don’t want to store it locally on that container. But disks are also a fair bit more durable than they once were, as well. And S3 talks about its 11 nines of durability. That’s great and all but most of my application logs don’t need that. So, I’m still trying to figure out where we went wrong.

Clint: Well, I think it was right for the time. And I think now that we have durable storage at the edge where that blob storage has already replicated three times and we can reattach—if that box crashes, we can reattach new compute to that same block storage. Actually, AWS has some cool features now, you can actually attach multiple VMs to the same block store. So, we could actually even have logs being written by one VM, but processed by another VM. And so there are new primitives available to us in the cloud, which we should be going back and re-questioning all of the things that we did ten to 15 years ago and all the practices that we had because they may not be relevant anymore, but we just never stopped to ask why.

Corey: Yeah, multi-attach was rolled out with their IO2 volumes, which are spendy but great. And they do warn you that you need a file system that actively supports that and applications that are aware of it. But cool, they have specific use cases that they’re clearly imagining this for. But ten years ago, we were building things out, and, “Ooh, EBS, how do I wind up attaching that from multiple instances?” The answer was, “Ohh, don’t do that.”

And that shaped all of our perspectives on these things. Now suddenly, you can. Is that, “Ohh don’t do that,” gut visceral reaction still valid? People don’t tend to go back and re-examine the why behind certain best practices until long after those best practices are now actively harmful.

Clint: And that’s really what we’re trying to do is to say, hey, should we move log data anymore if it’s at a durable place at the edge? Should we move metric data at all? Like, hey, we have these big TSDBs that have huge cardinality challenges, but if I just had all that information sitting in RAM at the original endpoint, I can store a lot of information and barely even touch the free RAM that’s already sitting out there at that endpoint. So, how to get out that data? Like, how to make that a rich user experience so that we can query it?

We have to build some software to do this, but we can start to question from first principles, hey, things are different now. Maybe we can actually revisit a lot of these architectural assumptions, drive cost down, give more capability than we actually had before for fundamentally cheaper. And that’s kind of what Cribl does is we’re looking at software is to say, “Man, like, let’s question everything and let’s go back to first principles.” “Why do we want this information?” “Well, I need to troubleshoot stuff.” “Okay, well, if I need to troubleshoot stuff, well, how do I do that?” “Well, today we move it, but do we have to? Do we have to move that data?” “No, we could probably give you an experience where you can dive right into that endpoint and get really, really high fidelity data without having to pay to move that and store it forever.” Because also, like, telemetry information, it’s basically worthless after 24 hours, like, if I’m moving that and paying to store it, then now I’m paying for something I’m never going to read back.
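One way to picture keeping metrics in RAM at the endpoint is a bounded in-memory buffer that holds only recent samples and answers queries locally. The Python sketch below is an assumption-laden illustration, not Cribl's design: the class name `RecentMetrics`, the ten-second sampling interval, and the 24-hour window are all invented for the example.

```python
import time
from collections import deque


class RecentMetrics:
    """Keep only the most recent samples of one metric in local memory."""

    def __init__(self, max_samples=8640):
        # 8640 samples = 24 hours at one sample every 10 seconds.
        # deque(maxlen=...) silently drops the oldest sample when full.
        self._samples = deque(maxlen=max_samples)

    def record(self, value, ts=None):
        """Store a sample, timestamped now unless ts is supplied."""
        self._samples.append((time.time() if ts is None else ts, value))

    def query(self, since_ts):
        """Return all retained samples at or after since_ts."""
        return [(ts, v) for ts, v in self._samples if ts >= since_ts]

    def __len__(self):
        return len(self._samples)


if __name__ == "__main__":
    cpu = RecentMetrics(max_samples=3)
    for ts, value in [(1, 10.0), (2, 12.5), (3, 11.0), (4, 40.0)]:
        cpu.record(value, ts=ts)
    print(len(cpu), cpu.query(since_ts=3))
```

Because the buffer is bounded, old telemetry ages out on its own, which matches the argument that most of it is never read back. A central system would only need to ask a node for its recent window when someone is actually troubleshooting.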

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it’s better than AWS pricing—and when they say that they mean it is less money. Sure, I don’t dispute that but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you’ll receive $100 in credit. That’s V-U-L-T-R.com slash screaming.

Corey: And worse, you wind up figuring out, okay, I’m going to store all that data going back to 2012, and it’s petabytes upon petabytes. And great, how do I actually search for a thing? Well, I have to use some other expensive thing of compute that’s going to start diving through all of that because the way I set up my partitioning, it isn’t aligned with anything looking at, like, recency or based upon time period, so every time I want to look at what happened 20 minutes ago, I’m looking at what happened 20 years ago. And that just gets incredibly expensive, not just to maintain but to query and the rest. Now, to be clear, yes, this is an anti-pattern. It isn’t how things should be set up. But how should they be set up? And is the collective answer to that right now actually what’s best, or is it still harkening back to old patterns that no longer apply?
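The partitioning problem Corey describes can be made concrete with a small Python sketch. If stored logs are laid out under time-based prefixes, a query for the last 20 minutes only has to touch the one or two partitions that cover that window. The `year=/month=/day=/hour=` layout, the `logs` root, and the function name are assumptions for illustration, not something described in the episode.

```python
from datetime import datetime, timedelta, timezone


def hourly_prefixes(start, end, root="logs"):
    """List the hour-partition prefixes that cover the window [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    # Snap back to the top of the hour so the first partition is included.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while cursor <= end:
        prefixes.append(
            f"{root}/" + cursor.strftime("year=%Y/month=%m/day=%d/hour=%H/")
        )
        cursor += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # "What happened 20 minutes ago?" touches at most two hourly partitions,
    # no matter how many years of data sit under the same root.
    for prefix in hourly_prefixes(now - timedelta(minutes=20), now):
        print(prefix)
```

A query engine that is handed only these prefixes never scans the older data, which is where the cost in Corey's anti-pattern comes from.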

Clint: Well, the future is here, it’s just unevenly distributed. So there’s, you know, I think an important point about us or how we think about building software is with this customer-first attitude and fundamentally bringing them choice. Because the reality is that doing things the old way may be the right decision for you. You may have compliance requirements to say—there’s a lot of financial services institutions, for example, like, they have to keep every byte of data written on any endpoint for seven years. And so we have to accommodate their requirements.

Like, is that the right requirement? Well, I don’t know. The regulator wrote it that way, so therefore, I have to do it. Whether it’s the right thing or the wrong thing for the business, I have no choice. And their decisions are just as right as the person who says this data is worthless and should all just be thrown away.

We really want to be able to go and say, like, hey, what decision is right? We’re going to give you the option to do it this way, we’re going to give you the option to do it this way. Now, the hard part—and that when it comes down to, like, marketing, it’s like you want to have this really simple message, like, “This is the one true path.” And a lot of vendors are this way, “There’s this new wonderful, right, true path that we are going to take you on, and follow along behind me.” But the reality is, enterprise worlds are gritty and ugly, and they’re full of old technology and new technology.

And they need to be able to support getting data off the mainframe the same way as they’re doing a brand new containerized microservices application. In fact, that brand new containerized microservices application is probably talking to the mainframe through some API. And so all of that has to work at once.

Corey: Oh, yeah. And it’s all of our payment data is in our PCI environment that PCI needs to have every byte logged. Great. Why is three-quarters of your infrastructure considered the PCI environment? Maybe you can constrain that at some point and suddenly save a whole bunch of effort, time, money, and regulatory drag on this.

But as you go through that journey, you need to not only have a tool that will work when you get there but a tool that will work where you are today. And a lot of companies miss that mark, too. It’s, “Oh, once you modernize and become the serverless success story of the decade, then our product is going to be right for you.” “Great. We’ll send you a postcard if we ever get there and then you can follow up with us.”

Alternately, it’s, “Well, yeah, this is how we are today, but we have visions of a brighter tomorrow.” You’ve got to be able to meet people where they are at any point of that journey. One of the things I’ve always respected about Cribl has been the way that you very fluidly tell both sides of that story.

Clint: And it’s not their fault.

Corey: Yeah.

Clint: Most of the people who pick a job, they pick the job because, like—look, I live in Kansas City, Missouri, and there’s this data processing company that works primarily on mainframes, it’s right down the road. And they gave me a job and it pays me $150,000 a year, and I got a big house and things are great. And I’m a sysadmin sitting there. I don’t get to play with the new technology. Like, that customer is just as applicable a customer; we want to help them exactly the same as the new Silicon Valley hip kid who’s working at, you know, a venture-backed startup, where they’re doing everything natively in the cloud. Those are all right decisions, depending on where you happen to find yourself, and we want to support you with our products, no matter where you find yourself on the technology spectrum.

Corey: Speaking of old and new, and the trends of the industry, when you first set up this recording, you mentioned, “Oh, yeah, we should make it a point to maybe talk about the acquisition,” at which point I sprayed coffee across my iMac. Thanks for that. Turns out it wasn’t your acquisition we were talking about so much as it is the—at the time we record this—yet-to-close rumored acquisition of Splunk by Cisco.

Clint: I think it’s both interesting and positive for some people, and sad for others. I think Cisco is obviously a phenomenal company. They run the networking world. The fact that they’ve been moving into observability—they bought companies like AppDynamics, and we were talking about Epsagon before the show, they bought—ServiceNow, just bought Lightstep recently. There’s a lot of acquisitions in this space.

I think that when it comes to something like Splunk, Splunk is a fast-growing company compared to Cisco. And so for them, this is something that they think that they can put into their distribution channel, and what Cisco knows how to do is to sell things; they’re very good at putting things through their existing sales force and really amplifying the sales of that particular thing that they have just acquired. That being said, I think for a company that was as innovative as Splunk, I do find it a bit sad with the idea that it’s going to become part of this much larger behemoth and not really probably driving the observability and security industry forward anymore because I don’t think anybody really looks at Cisco as a company that’s driving things—not to slam them or anything, but I don’t really see them as driving the industry forward.

Corey: Somewhere along the way, they got stuck and I don’t know how to reconcile that because they were a phenomenally fast-paced innovative company, briefly the most valuable company in the world during the dotcom bubble. And then they just sort of stalled out somewhere and, on some level, not to talk smack about it, but it feels like the level of innovation we’ve seen from Splunk has curtailed over the past half-decade or so. And selling to Cisco feels almost like a tacit admission that they are effectively out of ideas. And maybe that’s unfair.

Clint: I mean, we can look at the track record of what’s been shipped over the last five years from Splunk. And again they’re a partner, their customers are great, I think they still have the best log indexing engine on the market. That was their core product and what has made them the majority of their money. But there’s not been a lot new. And I think objectively we can look at that without throwing stones and say like, “Well, what net-new? You bought SignalFX. Like, good for you guys like that seems to be going well. You’ve launched your observability suite based off of these acquisitions.” But organic product-wise, there’s not a lot coming out of the factory.

Corey: I’ll take it a bit further-slash-sadder, we take a look at some great companies that were acquired—OpenDNS, Duo Security, SignalFX, as you mentioned, Epsagon, ThousandEyes—and once they’ve gotten acquired by Cisco, they all more or less seem to be frozen in time, like they’re trapped in amber, which leads us up to the natural dinosaur analogy that I’ll probably make in a less formal setting. It just feels like once a company is bought by Cisco, their velocity peters out, a lot of their staff leaves, and what you see is what you get. And I don’t know if that’s accurate, I’m just not looking in the right places, but every time I talk to folks in the industry about this, I get a lot of knowing nods that are tied to it. So, whether or not that’s true, that is very clearly, at least in some corners of the market, the active perception.

Clint: There’s a very real fact that if you look even at very large companies, innovation is driven from a core set of a handful of people. And when those people start to leave, the innovation really stops. It’s those people who think about things back from first principles—like why are we doing things? What different can we do?—and they’re the type of drivers that drive change.

So, Frank Slootman wrote a book recently called Amp It Up that I’ve been reading over the last weekend, and he talks—has this article that was on LinkedIn a while back called “Drivers vs. Passengers” and he’s always looking for drivers. And those drivers tend to not find themselves as happy in bigger companies and they tend to head for the exits. And so then you end up with the people who are a lot of the passenger type of people, the people who are like—they’ll carry it forward, they’ll continue to scale it, the business will continue to grow at whatever rate it’s going to grow, but you’re probably not going to see a lot of the net-new stuff. And I’ll put it in comparison to a company like Datadog, who I have a vast amount of respect for. I think they’re an incredibly innovative company, and I think they continue to innovate.

Still driven by the founders, the people who created the original product are still there driving the vision, driving forward innovation. And that’s what tends to move the envelope is the people who have the moral authority inside of an even larger organization to say, “Get behind me. We’re going in this direction. We’re going to go take that hill. We’re going to go make things better for our customers.” And when you start to lose those handful of really critical contributors, that’s where you start to see the innovation dry up.

Corey: Where do you see the acquisitions coming from? Is it just at some point people shove money at these companies that got acquired that is beyond the wildest dreams of avarice? Is it that they believe that they’ll be able to execute better on their mission than they were independently? These are still smart, driven people who have built something and I don’t know that they necessarily see an acquisition as, “Well, time to give up and coast for a while and then I’ll leave.” But maybe it is. I’ve never found myself in that situation, so I can’t speak for sure.

Clint: You kind of, I think, have to look at the business and then whoever’s running the business at that time—and I sit in the CEO chair—so you have to look at the business and say, “What do we have inside the house here?” Like, “What more can we do?” If we think that there’s the next billion-dollar, multi-billion-dollar product sitting here, even just in our heads, but maybe in the factory and being worked on, then we should absolutely not sell because the value is still there and we’re going to grow the company much faster as an independent entity than we would, you know, inside of a larger organization. But if you’re the board of directors and you’re looking around and saying like, hey look, like, I don’t see another billion-dollar line of bus—at this scale, right, if you’re Splunk scale, right? I don’t see another billion-dollar line of business sitting here, we could probably go acquire it, we could try to add it in, but you know, in the case of something like a Splunk, I think part of—you know, they’re looking for a new CEO right now, so now they have to go find a new leader who’s going to come in, re-energize and, kind of, reboot that.

But those are the options that they’re considering, right? They’re like, “Do I find a new CEO who’s going to reinvigorate things and be able to attract the type of talent that’s going to lead us to the next billion-dollar line of business that we can either build inside or we can acquire and bring in-house? Or is the right path for me just to say, ‘Okay, well, you know, somebody like Cisco’s interested’?” Or the other path that you may see them go down is something like Silver Lake; Silver Lake put a billion dollars into the company last year. And so they may be looking at it and saying, “Okay, well, we really need to do some restructuring here and we want to do it outside the eyes of the public market. We want to be able to change pricing model, we want to be able to really do this without having to worry about the stock price’s massive volatility because we’re making big changes.”

And so I would say there’s probably two big options they’re considering. Like, do we sell to Cisco, do we sell to Silver Lake, or do we really take another run at this? And those are difficult decisions for the stewards of the business and I think it’s a different decision if you’re the steward of the business that created the business versus the steward of the business for whom this is—the I’ve been here for five years and I may be here for five years more. For somebody like me, a company like Cribl is literally the thing I plan to leave on this earth.

Corey: Yeah. Do you have that sense of personal attachment to it? On some level, The Duckbill Group, that’s exactly what I’m staring at where it’s great. Someone wants to buy the Last Week in AWS media side of the house.

Great. Okay. What is that really, beyond me? Because so much of it’s been shaped by my personality. There’s an audience, sure, but it’s a skeptical audience, one that doesn’t generally tend to respond well to mass market, generic advertisements, so monetizing that is not going to go super well.

“All right, we’re going to start doing data mining on people.” Well, that’s explicitly against the terms of service people signed up for, so good luck with that. So much starts becoming bizarre and strange when you start looking at building something with the idea of, oh, in three years, I’m going to unload this puppy and make it someone else’s problem. The argument is that by building something with an eye toward selling it, you build a better-structured business, but it also means you potentially make trade-offs that are best not made. I’m not sure there’s a right answer here.

Clint: In my spare time, I do some investments, angel investments, and that sort of thing, and that’s always a red flag for me when I meet a founder who’s like, “In three to five years, I plan to sell it to these people.” If you don’t have a vision for how you’re fundamentally going to alter the marketplace and our perception of everything else, you’re not dreaming big enough. And that to me doesn’t look like a great investment. It doesn’t look like the—how do you attract employees in that way? Like, “Okay, our goal is to work really hard for the next three years so that we will be attractive to this other bigger thing.” They may be thinking it on the inside as an available option, but if you think that’s your default option when starting a company, I don’t think you’re going to end up with the outcome you’re truly hoping for.

Corey: Oh, yeah. In my case, the only acquisition story I see is some large company buying us just largely to shut me up. But—

Clint: [laugh].

Corey: —that turns out to be kind of expensive, so all right. I also don’t think it would serve any of them nearly as well as they think it would.

Clint: Well, you’ll just become somebody else on Twitter. [laugh].

Corey: Yeah, “Time to change my name again. Here we go.” So, if people want to go and learn more about Cribl Edge, where can they do that?

Clint: Yeah, cribl.io. And then if you’re more of a technical person, and you’d like to understand the specifics, docs.cribl.io. That’s where I always go when I’m checking out a vendor; just skip past the main page and go straight to the docs. So, check that out.

And then also, if you’re wanting to play with the product, we make online education available called Sandboxes, at sandbox.cribl.io, where you can go spin up your own version of the product, walk through some interactive tutorials, and get a view on how it might work for you.

Corey: Such a great pattern, at least for the way that I think about these things. You can have flashy videos, you can have great screenshots, you can have documentation that is the finest thing on this earth, but let me play with it; let me kick the tires on it, even with a sample data set. Because until I can do that, I’m not really going to understand where the product starts and where it stops. That is the right answer from where I sit. Again, I understand that everyone’s different, not everyone thinks like I do—thankfully—but for me, that’s the best way I’ve ever learned something.

Clint: I love to get my hands on the product, and in fact, I’m always a little bit suspicious of any company when I go to their webpage and I can’t either sign up for the product or I can’t get to the documentation, and I have to talk to somebody in order to learn. That’s pretty much when I’m immediately going to the next person in that market to go look for somebody who will let me.

Corey: [laugh]. Thank you again for taking so much time to speak with me. I appreciate it. As always, it’s a pleasure.

Clint: Thanks, Corey. Always enjoy talking to you.

Corey: Clint Sharp, CEO and co-founder of Cribl. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment. And when you hit submit, be sure to follow it up with exactly how many distinct and disparate logging systems that obnoxious comment had to pass through on your end of things.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
