Fixing What’s Broken in Monitoring and Observability with Jean Yang

Episode Summary

Jean Yang, CEO of Akita Software, joins Corey on Screaming in the Cloud to discuss how she went from academia to tech founder, and what her company is doing to improve monitoring and observability. Jean explains why Akita is different from other observability & monitoring solutions, and how it bridges the gap between what people know they should be doing and what they actually do in practice. Corey and Jean explore why the monitoring and observability space has been so broken, and why it’s important for people to see monitoring as a chore and not a hobby. Jean also reveals how she took the leap from being a professor to founding a tech startup.

Episode Show Notes & Transcript



About Jean

Jean Yang is the founder and CEO of Akita Software, providing the fastest time-to-value for API monitoring. Jean was previously a tenure-track professor in Computer Science at Carnegie Mellon University.


Links Referenced:

Akita Software: https://www.akitasoftware.com
Jean Yang on Twitter: https://twitter.com/jeanqasaur
The Duckbill Group: https://www.duckbillgroup.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. My guest today is someone whose company has… well, let’s just say that it has piqued my interest. Jean Yang is the CEO of Akita Software and not only is it named after a breed of dog, which frankly, Amazon service namers could take a lot of lessons from, but it also tends to approach observability slash monitoring from a perspective of solving the problem rather than preaching a new orthodoxy. Jean, thank you for joining me.

Jean: Thank you for having me. Very excited.

Corey: In the world that we tend to operate in, there are so many different observability tools, and as best I can determine, observability is hipster monitoring. Well, if we call it monitoring, we can’t charge you quite as much money for it. And whenever you go into any environment of significant scale, we pretty quickly discover that, “What monitoring tool are you using?” The answer is, “Here are the 15 that we use.” Then you talk to other monitoring and observability companies and ask them which of those they’ve replaced, and the answer becomes, “We’re number 16.” Which is less compelling of a pitch than you might expect. What does Akita do? Where do you folks start and stop?

Jean: We want to be—at Akita—your first stop for monitoring, and we want to be all of the monitoring you need, up to a certain level. And here’s the motivation. So, we’ve talked with hundreds, if not thousands, of software teams over the last few years, and what we found is there is such a gap between best practice, what people think everybody else is doing, what people are talking about at conferences, and what’s actually happening in software teams. And so, what software teams have told me over and over again is, hey, we either don’t actually use very many tools at all, or we use 15 tools in name, but it’s, you know, one [laugh] one person on the team set this one up, it’s monitoring one of our endpoints, we don’t even know which one sometimes. Who knows what the thresholds are really supposed to be. We got too many alerts one day, we turned it off.

But there’s very much a gap between what people say they’re supposed to do, what people in their heads say they’re going to do next quarter or the quarter after that, and what’s really happening in practice. And what we saw was teams falling more and more into monitoring debt. And so effectively, their customers are becoming their monitoring, and it’s getting harder to catch up. And so, what Akita does is we’re the fastest, easiest way for teams to quickly see what endpoints you have in your system—so that’s API endpoints—what’s slow, and what’s throwing errors. And you might wonder, okay, wait, wait, wait, Jean. Monitoring is usually about, like, logs, metrics, and traces. I’m not used to hearing about APIs—like, what do APIs have to do with any of it?

And my view is, look, we want the simplest form of what might be wrong with your system; we want a developer to be able to get started without having to change any code, make any annotations, or drop in any libraries. APIs are something you can watch from the outside of a system. And when it comes to which alerts actually matter, where you want errors to be alerts, where you want thresholds to really matter, my view is, look, the places where your system interfaces with another system are probably where you want to start if you’ve really got nothing. And so, Akita’s view is, we’re going to start from the outside in on this monitoring. We’re turning a lot of the views on monitoring and observability on their heads, and we just want to be the tool that you reach for if you’ve got nothing, it’s the middle of the night, you have alerts on some endpoint, and you don’t want to spend hours or weeks setting up some other tool. And we also want to be able to grow with you up until you need the power tools that many of the existing solutions out there are today.

Corey: It feels like monitoring is very often one of those reactive things. I come from the infrastructure world, so you start off with, “What do you use for monitoring?” “Oh, we wait till the help desk calls us and users are reporting a problem.” Okay, that gets you somewhere. And then it becomes oh, well, what was wrong that time? The drive filled up. Okay, so we’re going to build checks in that tell us when the drives are filling up.

And you wind up trying to enumerate all of the different badness. And as a result, if you take that to its logical conclusion, one of the stories that I heard out of MySpace once upon a time—which dates me somewhat—is that there were three shifts working around the clock, and each one would open about 5000 tickets, give or take, for the monitoring alerts that wound up firing off throughout their infrastructure. At that point, it’s almost, why bother? Because no one is going to be around to triage these things; no one is going to see any of the signal buried in all of that noise. When you talk about doing this from an API perspective, are you running synthetics against those APIs? Are you shimming them in order to see what’s passing through them? What does the implementation side look like?

Jean: Yeah, that’s a great question. So, we’re using a technology called BPF, Berkeley Packet Filter. The more trendy, buzzy term is eBPF—

Corey: The eBPF. Oh yes.

Jean: Yeah, Extended Berkeley Packet Filter. But here’s the secret: we only use the BPF part. It’s actually a little easier for users to install. The E part is, you know, fancy and often finicky. But um—

Corey: SEBPF then: Shortened Extended BPF. Why not?

Jean: [laugh]. Yeah. And what BPF allows us to do is passively watch traffic from the outside of a system. So, think of it as you’re sending API calls across the network. We’re just watching that network. We’re not in the path of that traffic. So, we’re not intercepting the traffic in any way, we’re not creating any additional overhead for the traffic, we’re not slowing it down in any way. We’re just sitting on the side, we’re watching all of it, and then we’re taking that and shipping an obfuscated version off to our cloud, and then we’re giving you analytics on that.
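For readers curious what the passive-watching pattern Jean describes looks like in practice, here is a minimal sketch in Python using the scapy library, with a classic BPF filter expression doing the kernel-side selection. This is an illustration of the general technique only, not Akita’s actual agent; the port choice and the crude HTTP parsing are simplifying assumptions.

```python
# Passive capture: the kernel copies matching packets to us; we are
# never in the traffic's path and add no latency to it.
from scapy.all import sniff
from scapy.layers.inet import IP, TCP

def on_packet(pkt):
    # Crude HTTP request detection on the TCP payload, for illustration only.
    if pkt.haslayer(IP) and pkt.haslayer(TCP):
        payload = bytes(pkt[TCP].payload)
        if payload.startswith((b"GET ", b"POST ", b"PUT ", b"DELETE ", b"PATCH ")):
            request_line = payload.split(b"\r\n", 1)[0].decode(errors="replace")
            print(f"{pkt[IP].src} -> {pkt[IP].dst}: {request_line}")

# The filter string is a BPF expression compiled into the kernel's packet
# filter, so only matching packets are ever copied up to user space.
sniff(filter="tcp port 80", prn=on_packet, store=False)
```

(Running something like this requires root privileges, since it opens a raw socket.)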

Corey: One of the things that strikes me as being… I guess, a common trope is there are a bunch of observability solutions out there that offer this sort of insight into what’s going on within an environment, but it’s, “Step one: instrument with some SDK or some agent across everything. Do an entire deploy across your fleet.” Which yeah, people are not generally going to be in a hurry to sign up for. And further, you also said a minute ago that the idea being that someone could start using this in the middle of the night in the middle of an outage, which tells me that it’s not, “Step one: get the infrastructure sparkling. Step two: do a global deploy to everything.” How do you go about doing that? What is the level of embeddedness into the environment?

Jean: Yeah, that’s a great question. So, the reason we chose BPF is I wanted a completely black-box solution. So, no SDKs, no code annotations. I wanted people to be able to change a config file and have our solution apply to anything that’s on the system. So, you could add routes, you could do all kinds of things. I wanted there to be no additional work on the part of the developer when that happened.

And so, we’re not the only solution that uses BPF or eBPF. There are many other solutions that say, “Hey, just drop us in. We’ll let you do anything you want.” The big difference is what happens with the traffic once it gets processed. So, what eBPF or BPF gives you is it watches everything about your system. And so, you can imagine that’s a lot of different events. That’s a lot of things.

If you’re trying to fix an incident in the middle of the night and someone just dumps 1000 pages of logs on you, like, what are you going to do with that? And so, our view is, the more interesting and important and valuable thing to do here is not to make it so that you just have the ability to watch everything about your system, but to make it so that developers don’t have to sift through thousands of events just to figure out what went wrong. So, we’ve spent years building algorithms to automatically analyze these API events to figure out, first of all, what are your endpoints? Because it’s one thing to turn on something like Wireshark and just say, okay, here are the thousand API calls I saw—ten thousand—but it’s another thing to say, “Hey, 500 of those were actually the same endpoint and 300 of those had errors.” That’s quite a hard problem.

And before us, it turns out that there was no other solution that even did that to the level of being able to compile together, “Here are all the slow calls to an endpoint,” or, “Here are all of the erroneous calls to an endpoint.” That was blood, sweat, and tears of developers in the night before. And so, that’s the first major thing we do. And then metrics on top of that. So, today we have what’s slow, what’s throwing errors. People have asked us for other things like show me what happened after I deployed. Show me what’s going on this week versus last week. But now that we have this data set, you can imagine there’s all kinds of questions we can now start answering much more quickly on top of it.
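To make the inference problem Jean describes concrete: collapsing raw API calls into logical endpoints means recognizing that /users/123 and /users/456 are the same endpoint with different path parameters. Here is a toy sketch of that idea; the numeric-and-UUID heuristic is an illustrative assumption, not Akita’s actual algorithm.

```python
# Group observed API paths into inferred endpoints by replacing
# segments that look like IDs with a placeholder.
import re
from collections import Counter

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

def normalize(path: str) -> str:
    segments = []
    for seg in path.strip("/").split("/"):
        if seg.isdigit() or UUID_RE.match(seg):
            segments.append("{id}")  # treat numeric/UUID segments as parameters
        else:
            segments.append(seg)
    return "/" + "/".join(segments)

observed = ["/users/123", "/users/456", "/users/123/orders/9", "/health"]
print(Counter(normalize(p) for p in observed))
# Counter({'/users/{id}': 2, '/users/{id}/orders/{id}': 1, '/health': 1})
```

Real traffic is far messier—opaque tokens, versioned prefixes, IDs that look like words—which is why this is a hard inference problem rather than a regex.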

Corey: One thing that strikes me about your site is that when I go to akitasoftware.com, you’ve got a shout-out section at the top. And I’ve been doing this long enough that I find, yeah, you work at a company; you’re going to say all kinds of wonderful, amazing, aspirational things about it, and basically because I have deep-seated personality disorders, I will make fun of those things as my default reflexive reaction. But something that AWS, for example, does very well is when they announce something ridiculous on stage at re:Invent, I make fun of it, as is normal, but then they have a customer come up and say, “And here’s the expensive, painful problem that they solved for us.”

And that’s where I shut up and start listening. Because it’s a very different story to get someone else, who is presumably not being paid, to get on stage and say, “Yeah, this solved a sophisticated, painful problem.” Your shout-outs page has not just a laundry list of people saying great things about it; there are folks who have been on the show here before, people I know and trust: Scott Johnson over at Docker, Gergely Orosz over at The Pragmatic Engineer, and other folks who have been luminaries in the space for a while. These are not the sort of people who are going to say, “Oh, sure. Why not? Oh, you’re going to send me a $50 gift card in a Twitter DM? Sure, I’ll say nice things,” like one of those reply-to-a-viral-tweet spam accounts. These are people who have gravitas. It’s clear that there’s something you’re building that is resonating.

Jean: Yeah. And for that, they found us. Everyone that I’ve tried to bribe to say good things about us actually [laugh] refused.

Corey: Oh, yeah. As it turns out, people are more expensive than you might think. It’s like, “What, you want me to sell my credibility down the road?” Doesn’t work super well. But there’s something about the unsolicited testimonials—the “this is amazing” reactions—that come out once people start kicking the tires on it.

You’re currently in open beta. So, I guess my big question for you is, whenever I see a product that says, “Oh, yeah, we solve everything: cloud, on-prem, on physical instances, on virtual machines, on Docker, on serverless, everything across the board. It’s awesome,” I have some skepticism. What is the ideal application architecture that Akita works best on? And what sort of things are you a complete nonstarter for?

Jean: Yeah, I’ll start with a couple of things we work well on. So, container platforms we work relatively well on. That’s your Fargate, that’s your Azure Web Apps—you know, things we call container platforms. Kubernetes is also something that a lot of our users have picked us up on and had success with. I will say our Kubernetes deploy is not as smooth as we would like. We say, you know, you can install us—

Corey: Well, that is Kubernetes, yes.

Jean: [laugh]. Yeah.

Corey: Nothing in Kubernetes is as smooth as we would like.

Jean: Yeah, so we’re actually rolling out Kubernetes injection support in the next couple of weeks. So, those are the two that people have had the most success on. If you’re running on bare metal or on a VM, we work, but I will say that you have to know your way around a little bit to get that to work. What we don’t work on is any Platform as a Service. So, like, a Heroku, a Lambda, a Render at the moment. For those, we haven’t found a way to passively listen to the network traffic in a good way right now.

And we also work best for unencrypted HTTP REST traffic. So, if you have encrypted traffic, it’s not a non-starter, but you need to fall into a couple of categories: you either need to be using Kubernetes, where you can run Akita as a sidecar, or you’re using Nginx. And so, that’s something we’re still expanding support on. And we do not support GraphQL or gRPC at the moment.

Corey: That’s okay. Neither do I. It does seem these days that unencrypted HTTP API calls are increasingly becoming something of a relic, where folks are treating those as anti-patterns to be stamped out ruthlessly. Are you still seeing significant deployments of unencrypted APIs?

Jean: Yeah. [laugh]. So, Corey—

Corey: That is the reality, yes.

Jean: That’s a really good question, Corey, because in the beginning, we weren’t sure what we wanted to focus on. And I’m not saying the whole deployment is unencrypted HTTP, but there is a place to install Akita where it can watch unencrypted HTTP. And so, this is what I mean by if you have encrypted traffic but you can install Akita as a Kubernetes sidecar, we can still watch that. But there was a big question when we started: should this be GraphQL, gRPC, or should it be REST? And I read the “State of the API Report” from Postman for, you know, five years, and I still keep up with it.

And every year, it seemed that not only was REST remaining dominant, it was actually growing. So, [laugh] this was shocking to me as well because people said, “Well, we have this more structured stuff now. There’s gRPC, there’s GraphQL.” But it seems that for the added complexity, people weren’t necessarily seeing the value, and so REST continues to dominate. And I’ve actually even seen a decline in GraphQL since we first started doing this. So, I’m fully on board the REST wagon. And in terms of encrypted versus unencrypted, I would like to see more encryption as well. That’s why we’re working on burning down the long tail of support for that.

Corey: Yeah, it’s one of those challenges. Whenever you’re deploying something relatively new, there’s this idea that it should be forward-looking, and you, on some level, want to modernize your architecture and infrastructure to keep up with it. An AWS integration story I see that’s like that these days is, “Oh, yeah, generate an IAM credential set and just upload those into our system.” Yeah, the modern way of doing that is role assumption: define a role, and here’s how to configure it so that it can do what we need it to do. So, whenever you start seeing things that are, “Oh, yeah, just turn the security clock back in time a little bit,” that’s always a little bit of an eyebrow raise.
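For context on the pattern Corey is contrasting with static credentials, here is a minimal sketch of cross-account role assumption using boto3’s STS API. The role ARN, session name, and external ID are placeholders; the general shape is standard AWS practice, not any particular vendor’s integration.

```python
# Role assumption: exchange your identity for short-lived credentials
# scoped to a role, instead of handing out long-lived access keys.
import boto3

sts = boto3.client("sts")
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ThirdPartyIntegration",  # placeholder
    RoleSessionName="integration-session",
    ExternalId="example-external-id",  # commonly required for third-party access
)
creds = assumed["Credentials"]  # these expire automatically, typically in an hour

# Use the temporary credentials for subsequent calls.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```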

I can also definitely empathize with the joys of dealing with anything that even touches networking in a Lambda context. Building the Lambda extension for Tailscale was one of the last big dives I made into that area and I still have nightmares as a result. It does a lot of interesting things right up until you step off the golden path. And then suddenly, everything becomes yaks all the way down, in desperate need of shaving.

Jean: Yeah, Lambda is something we want to handle on our roadmap, but I… believe we need a bigger team before [laugh] we are ready to tackle that.

Corey: Yeah, “we’re going to need a bigger boat” is very often [laugh] the story people have when they start looking at entirely new architectural paradigms. So, you end up talking about working in containerized environments. Do you find that most of your deployments are living in cloud environments or in private data centers—what some people call private cloud? Where does the bulk of your user applications tend to live these days?

Jean: The bulk of our user applications are in the cloud. So, we’re targeting small to medium businesses to start. The reason being, we want to give our users a magical deployment experience. So, right now, a lot of our users are deploying in under 30 minutes. That’s in no small part due to automations that we’ve built.

And so, we initially made the strategic decision to focus on places where, one, we get the most visibility, and two, we are ready for that level of scale. So, we found that, you know, for a large business, we’ve run inside some of their production environments and there are API calls that we don’t yet handle well, or it’s just such a large number of calls that we’re not doing the inference as well and our algorithms don’t work as well. And so, we’ve made the decision to start small, build our way up, and start in places where we can just aggressively iterate because we can see everything that’s going on. And so, we’ve stayed away, for instance, from any on-prem deployments for that reason, because then we can’t see everything that’s going on. And so, smaller companies that are okay with us watching pretty much everything they’re doing have been where we started. And now we’re moving up into the medium-sized businesses.

Corey: The challenge that I guess I’m still trying to wrap my head around is, I think that it takes someone with a particularly rosy set of glasses on to look at the current state of monitoring and observability and say that it’s not profoundly broken in a whole bunch of ways. Now, where it all falls apart, Tower of Babel-esque, is that there doesn’t seem to be consensus on where exactly it’s broken. Where do you see, I guess, this coming apart at the seams?

Jean: I agree, it’s broken. And so, if I tap into my background—I was a programming languages person in my very recent previous life—programming languages people like to say the problem, and the solution, all lies in abstraction. And so, computing is all about building abstractions on top of what you have now so that you don’t have to deal with so many details and you get to think at a higher level; you’re free of the shackles of so many low-level details. What I see is that today, monitoring and observability is a sort of abstraction nightmare. People have just taken it as gospel that you need to live at the lowest level of abstraction possible, the same way that people truly believed that assembly code was the way everybody was going to program forevermore, back, you know, 50 years ago.

So today, what’s happening is that when people think monitoring, they think logs—not “what’s wrong with my system, what do I need to pay attention to?” They think, “I have to log everything, I have to consume all those logs; we’re just operating at the level of logs.” And that’s not wrong, because there haven’t been any tools that have given people any help above the level of logs. Although that’s not entirely correct, you know? There are also events and there are also traces, but I wouldn’t say that’s actually lifting the level of [laugh] abstraction very much either.

And so, people today are thinking about monitoring and observability as this full-control thing: like, I’m driving my race car, completely manual transmission, I want to feel everything. And not everyone wants to or needs to do that to get to where they need to go. And so, my question is, how far can we lift the level of abstraction for monitoring and observability? I don’t believe that other people are really asking this question, because most of the other players in the space are asking: what else can we monitor? Where else can we monitor it? How much faster can we do it? Or how much more detail can we give the people who really want the power tools?

But the people entering the buyer’s market with needs—you don’t have, like, you know, hordes of people who need more powerful tools. You have people who don’t know about the systems they’re dealing with, and they want easier. They want to figure out if there’s anything wrong with their system so they can get off work and do other things with their lives.

Corey: That, I think, is probably the thing that gets overlooked the most. It’s that people don’t tend to log into their monitoring systems very often. They don’t want to. When they do, it’s always out of hours, in the middle of the night, and they’re confronted with a whole bunch of upsell dialogs of, “Hey, it’s been a while. You want to go on a tour of the new interface?”

Meanwhile, anything with half a brain can see there’s a giant spike on the graph or that telemetry has stopped coming in.

Jean: Yeah.

Corey: It’s way outside of normal business hours where this person is and maybe they’re not going to be in the best mood to engage with your brand.

Jean: Yeah. Right now, I think a lot of the problem is, you’re either working with monitoring because you’re desperate—you’re in the middle of an active incident—or you’re a monitoring fanatic. And there isn’t a lot in between. So, there’s a tweet that someone in my network sent me that I really liked, which is, “Monitoring should be a chore, not a hobby.” And right now, it’s either a hobby or an urgent necessity [laugh].

And when it gets to that point—so, you know, if we think about doing dishes this way, it would be as if only the dish fanatics did dishes, or you would just have piles of dishes all over the place, and raccoons, and no dishes left, and then you’re, like, “Ah, time to do a thing.” But there should be something in between, where there’s a defined set of things that people can do on a regular basis to keep up with what they’re doing. It should be accessible to everyone on the team, not just a couple of people who are true fanatics. No offense to the people out there—I love you guys, you’re the ones who are really helping us build our tool the most—but you know, there’s got to be a world in which more people are able to do the things you do.

Corey: That’s part of the challenge: bringing a lot of the fire down from Mount Olympus to the rest of humanity, where, at some level, Prometheus was a great name from that—

Jean: Yep [laugh].

Corey: Just from that perspective, because you basically need to be at that level of insight. I think Kubernetes suffers from the same overall problem, where it is not really responsible to run a Kubernetes production cluster without some people who really know what’s going on. That’s rapidly changing, which is for the better, because most companies are not going to be able to afford a multimillion-dollar team of operators who know the ins and outs of these incredibly complex systems. It has to become more accessible and simpler. And we have nearly an entire century at this point of watching abstractions get more and more and more complex and then collapse down in this particular field. And I think that we’re overdue for that correction in a lot of the modern infrastructure, tooling, and approaches that we take.

Jean: I agree. It hasn’t happened yet in monitoring and observability. It’s happened in coding, it’s happened in infrastructure, it’s happened in APIs, but all of that has made it so that it’s easier to get into monitoring debt. And it just hasn’t happened yet for anything that’s more reactive and more about understanding what the system is that you have.

Corey: You mentioned specifically that your background was in programming languages. That’s understating it slightly. You were a tenure-track professor of computer science at Carnegie Mellon before entering industry. How tied is what you’re doing now at Akita to your area of academic specialty?

Jean: That’s a great question, and there are two answers to that. The first is: very not tied. If it were tied, I would have stayed in my very cushy, highly [laugh] competitive job that I worked for years to get, and done this stuff there. What we’re doing now comes out of thousands of conversations with developers and a desire to build on-the-ground tools. There are some technically interesting parts to it, for sure—I think that our technical innovation is our moat—but is it at the level of publishable papers? Publishable papers are a very narrow thing; I wouldn’t be able to say yes to that question.

On the other hand, everything that I was trained to do was about identifying a problem and coming up with an out-of-the-box solution for it. And especially in programming languages research, it’s really about abstractions. It’s really about, you know, taking a set of patterns that you see in problems people have, coming up with the right abstractions to solve that problem, evaluating your solution, and then, you know, prototyping it out and building on top of it. And so, in that case, you know, we identified, hey, people have a huge gap when it comes to monitoring and observability. I framed it as an abstraction problem: how can we lift it up?

We saw APIs as a great level at which to build a new kind of solution. And our solution is innovative, but it also solves the problem. And to me, that’s the most important thing. Our solution didn’t need to be innovative. If you’re operating in an academic setting, it’s really about… producing a new idea. It doesn’t actually [laugh]—I like to believe that all endeavors really have one main goal, and in academia, the main goal is producing something new. And to me, building a product is about solving a problem, and our main endeavor was really to solve a real problem here.

Corey: I think that it is, in many cases, useful when we start seeing a lot of, I guess, overflow back and forth between academia and industry, in both directions. I think that it is doing academia a disservice when you start looking at it as pure theory—oh yeah, they don’t deal with any of the vocational stuff. Conversely, I think the idea that industry doesn’t have anything to learn from academia is dramatically misunderstanding the way the world works. The idea of watching some of that ebb and flow and crossover between them is neat to see.

Jean: Yeah, I agree. I think there are a lot of academics I super respect and admire who have done great things that are useful in industry. And it’s really about, I think, what you want your main goal to be at the time. Do you want to be optimizing for new ideas, or for contributing, like, a full solution to a problem at the time? But there’s a lot of overlap in the skills you need.

Corey: One last topic I’d like to dive into before we call it an episode is that there’s an awful lot of hype around a variety of different things. And right now, in this moment, AI seems to be one of those areas that is getting an awful lot of attention. It’s clear, too, that there’s something of value there—unlike blockchain, which has struggled to identify anything that was not fraud as a value proposition for the last decade-and-a-half—but it’s clear that AI is offering value already. You have recently, as of this recording, released an AI chatbot, which, okay, great. But what piques my interest is, one, it’s a dog, which… germane to my interests, by all means, and two, it is marketed as, and I quote, “Exceedingly polite.”

Jean: [laugh].

Corey: Manners are important. Tell me about this pupper.

Jean: Yeah, this dog really came out of four or five days of one of our engineers experimenting with ChatGPT. So, for a little bit of background, I’ll just say that I have been excited about this latest wave of AI since the beginning. So, I think at the very beginning, a lot of dev-tools people were skeptical of GitHub Copilot; there was a lot of controversy around GitHub Copilot. I was very early. And I think all the Copilot people retweeted me because I was, like, one of their earliest fans. I was like, “This is the coolest thing I’ve seen.”

I had actually spent the decade before making fun of AI-based [laugh] programming. But there were two things about GitHub Copilot that made my jaw drop. And that’s related to your question. So, for a little bit of background, I did my PhD in a group focused on program synthesis. So, it was really about, how can we automatically generate programs from a variety of means? From constraints—

Corey: Like copying and pasting off a Stack Overflow, or—

Jean: Well—I mean, actually, one of the projects my group worked on was literally applying machine learning to terabytes of other example programs to generate new programs. So, it was very similar to GitHub Copilot before GitHub Copilot. It was synthesizing API calls from analyzing terabytes of other API calls. And the thing that had always made me uncomfortable with these machine-learning approaches in my group was that they were in the compiler loop. So, it was, you know, you wrote some code, the compiler did some AI, and then it spit back out some code that, you know, you just ran.

And so, that never sat well with me. I always said, “Well, I don’t really see how this is going to be practical,” because people can’t just run random code that you basically got off the internet. And so, what really excited me about GitHub Copilot was the fact that it was in the editor loop. I was like, “Oh, my God.”

Corey: It had the context. It was right there. You didn’t have to go tabbing to something else.

Jean: Exactly.

Corey: Oh, yeah. I’m in the same boat. I think it is basically—I’ve seen the future unfolding before my eyes.

Jean: Yeah. It was the autocomplete thing. And to me, that was the missing piece. Because in your editor, you always read your code before you go off and—you know, like, you read your code, and whoever code-reviews your code reads your code. There are always at least, you know, two pairs of eyes, at least theoretically, reading your code.

So, that was one thing that was jaw-dropping to me. That was the revelation of Copilot. And then the other thing was that it was marketed not as, “We write your code for you.” The whole Copilot marketing was that, you know, it kind of helps you with boilerplate. And I had been obsessed for years with this idea of how you can help developers write less boilerplate. And so, this AI-supported boilerplate copiloting was very exciting to me.

And I saw that as very much the beginning of a new era, where, yes, there’s tons of data on how we should be programming. I mean, all of Akita is based on the fact that we should be mining all the data we have about how your system and your code are operating to help you do stuff better. And so, to me, you know, Copilot is very much in that same philosophy. But our AI chatbot is, you know, just the next step along this progression. Because for us, you know, we collect all this data about your API behavior; we have been using non-AI methods to analyze this data and show it to you.

And what ChatGPT allowed us to do in less than a week was analyze this data using very powerful large-language models and have this conversational interface that gives you the opportunity to check over and follow up on the question, so that what we’re spitting out as Aki the dog doesn’t have to be a hundred percent correct. But to me, the fact that Aki is exceedingly polite and kind of goofy—he, you know, randomly woofs and says a lot of things about how he’s a dog—is the right level of seriousness, so that it’s not messaging, hey, this is the end-all, be-all. You know, the compiler loop never sat well with me because I just felt deeply uncomfortable with an AI having that level of authority in a system, but a friendly dog that shows up and tells you some things that you can ask some additional questions about—no one’s going to take him that seriously. But if he says something useful, you’re going to listen. And so, I was really excited about the way this was set up. Because, I mean, I believe that AI should be a collaborator, and it should be a collaborator that you never take with full authority. And so, the chat and the politeness covered those two parts for me.
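As an illustration of the pattern Jean describes—summarized API telemetry fed to a large-language model behind a conversational interface—here is a minimal sketch using the openai Python library (the pre-1.0 ChatCompletion API, which was current when this episode aired). The metrics summary and prompts are invented for illustration; this is not Aki’s actual implementation.

```python
import os
import openai  # openai<1.0-style chat API

openai.api_key = os.environ["OPENAI_API_KEY"]

# A summary that monitoring tooling might produce from watched traffic
# (the endpoints and numbers here are invented for illustration).
metrics_summary = (
    "GET /users/{id}: p90 latency 850ms, up from 200ms last week. "
    "POST /orders: 4% of calls returned 500 in the last hour."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": (
            "You are a polite, slightly goofy dog assistant that explains "
            "API monitoring data. Hedge your answers; never claim certainty."
        )},
        {"role": "user", "content": (
            f"Here is my API traffic summary: {metrics_summary} "
            "What should I look into first?"
        )},
    ],
)
print(response["choices"][0]["message"]["content"])
```

The design point Jean makes is in the system prompt and the chat loop: the model proposes, the human follows up and verifies, and nothing it says is executed with authority.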

Corey: Yeah, on some level, I can’t shake the feeling that it’s still very early days there for Chat-Gipity—yes, that’s how I pronounce it—and its brethren as far as redefining, on some level, what’s possible. I think that it’s in many cases being overhyped, but it’s solving an awful lot of the… the boilerplate, the stuff that is challenging. A question I have, though—for you as a former professor—is a concern I have about when students are using this. It’s less that they’re taking shortcuts that weren’t available to me and me wanting to make them suffer; rather, it’s, on some level, if you use it to write your English papers, for example, okay, great, it gets the boring essay you don’t want to write out of the way, but the reason you write those things is that it teaches you to form a story, to tell a narrative, to structure an argument, and I think that letting the computer do those things, on some level, has the potential to weaken us across the board. Where do you stand on it, given that you see both sides of that particular snake?

Jean: So, here’s a devil’s-advocate sort of response to that, which is that maybe the writing [laugh] was never the important part. And, as you say, telling the story was the important part. And so, what better way to distill that out than the prompt-engineering piece of it? Because if you knew that you could always get someone to flesh out your story for you, then it really comes down to, you know, “I want to tell a story with these five main points.” And in some way, you could see this as a playing-field leveler.

You know—English is actually not my first language. I spent a lot of time editing my parents’ writing for their work when I was a kid. And something I always felt really strongly about was not discriminating against people because they can’t form sentences or they don’t have the right idioms. And I actually spent a lot of time proofreading my friends’ emails when I was in grad school, for the non-native English speakers. And so, one way you could see this is: look, people who are not insiders are now on the same playing field. They just have to be clear thinkers.

Corey: That is a fascinating take. I think I’m going to have to—I’m going to have to ruminate on that one. I really want to thank you for taking the time to speak with me today about what you’re up to. If people want to learn more, where’s the best place for them to find you?

Jean: Well, I’m always on Twitter, still [laugh]. I’m @jeanqasaur—J-E-A-N-Q-A-S-A-U-R. And there’s a chat dialog on akitasoftware.com. I [laugh] personally oversee a lot of that chat, so if you ever want to find me, that is a place, you know, where all messages will get back to me somehow.

Corey: And we will, of course, put a link to that into the [show notes 00:35:01]. Thank you so much for your time. I appreciate it.

Jean: Thank you, Corey.

Corey: Jean Yang, CEO at Akita Software. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that you will then, of course, proceed to copy to the other 17 podcast tools that you use, just like you do your observability monitoring suite.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.