Coding Agents, Chaos, and the Future of Dev Work with Dexter Horthy


Episode Show Notes & Transcript

In this episode, Corey Quinn sits down with Dexter Horthy, CEO and Co-founder of HumanLayer, to unpack what engineers are getting wrong about AI, especially when it comes to coding agents.
From the obsession with “just throwing more tokens at the problem” to the reality of building scalable AI workflows, Dexter shares hard-earned insights on how to actually push models to their limits. They dive into the evolution of developer workflows, the rise of AI-powered software factories, and why understanding context and verification matters more than raw model power.
If you’re building with AI or trying to, this episode will challenge how you think about what these systems can (and can’t) do.


Show highlights: 
(00:00) Throwing Tokens Too Far
(01:04) Meet Dexter Horthy
(01:52) Personal AI Benchmarks
(04:12) Human Layer Race Condition
(05:59) Rewrites and Tech Debt
(07:19) Software Factories Mindset
(10:20) Verifiable Problems and Token Limits
(13:45) Agents in the Trenches
(18:05) GitHub at Agent Scale
(26:23) Safety, Ethics, and Closing Thoughts


About Dexter:
 
Dexter Horthy is the CEO and Co-Founder of HumanLayer, where he helps engineering teams tackle complex problems in large codebases using coding agents. Previously, he worked in DevOps, SRE, and Solutions Engineering at Replicated, and contributed to lunar navigation software at NASA JPL. Outside of work, he’s a fan of tacos and burpees, though not necessarily in that order.



Links: 
Website: https://humanlayer.dev


Sponsored by:
duckbillhq.com



Transcript

Dexter: I regret saying this because in many ways this is a good idea, but I think people are going way too far on the "just throw more tokens at the problem" thing.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined today by Dexter Horthy, the CEO and Co-founder of HumanLayer. And by all accounts, he appears to be human. Thanks for joining me.

Dexter: Dude, I'm so stoked to be here.

Corey: This episode is sponsored in part by my day job, Duckbill. Do you have a horrifying AWS bill?

That can mean a lot of things: predicting what it's going to be, determining what it should be, negotiating your next long-term contract with AWS, or just figuring out why it increasingly resembles a phone number when nobody seems to quite know why that is. To learn more, visit duckbillhq.com. Remember, you can't duck the Duckbill Bill, which my CEO reliably informs me is absolutely not our slogan.

Corey: So for those who have not had the pleasure of encountering your particular, we'll call it perspective, what is it you say it is you do here?

Dexter: Amazing. So I am obsessed with getting the most out of AI. Take whatever current models we have, and setting aside training and fine-tuning and task-specific stuff: what can we, as engineers who are not working at a big lab, do to push these models to their limits? Most recently, in the last six to nine months, most of that has been around coding agents, because I think it's one of the most misunderstood areas and also has the highest ceiling if you do it right.

Corey: It seems to me like this is one of those areas where you are taking a half hour out of your day to have this conversation with me, and during that half hour, the whole game is gonna change again. This isn't an area where you can hold still. A year ago I had a whole bunch of problems where, oh, these are things that the coding tools will struggle with. I'll just keep that as sort of a personal benchmark, and, well, I ran out.

Dexter: You ran out of personal benchmarks. What did your benchmark used to be?

Corey: Doing analysis on 150 megabytes of JSON, so I can have discussions with models about my Twitter corpus from a seven-year run. Building weird backend systems for me that just sort of started working. I replaced my Adobe Creative Cloud subscription by building a custom podcast recorder into a web app that I use for the Monday podcast that I record for the Last Week in AWS podcast. It's basically a bunch of workflow tools for things where, well, that's hard, that's what smart people do. I still have some, though. I have a Bloomberg keyboard on my desk at work, which has a fingerprint reader that you can't use if you don't pay Bloomberg; there's nothing for it on the Mac.

Claude Code went nuts on it, and apparently there's some encryption it would need to break through. So, you know, I either need to get someone with an actual Bloomberg subscription and do a wire capture on it, or I can just put that back on the shelf until cryptography falls. I suppose I'll have to live with it.

Dexter: Yeah, and you probably don't wanna get caught asking a frontier model to, uh, reverse engineer something that you're supposed to be paying for. It's a good way to get, uh, banned from Anthropic for a while.

Corey: No, seriously. It's interesting, because there's nothing to this: I use this as a standard keyboard, it has a fingerprint reader in it, and I want to use the fingerprint reader. The end.

This is not about stealing things from Bloomberg. To be clear, there's nothing unethical in this request. I would love to be able to use the fingerprint reader built into my keyboard. The end.

Dexter: Yeah. It's getting harder to find good evals that the models can't solve.

I still do have a couple, and I've built this sort of personal mental model: every time I'm doing something with AI that becomes so hard that I end up spending a ton of time doing like 30 different sessions just to understand the problem and then another 10 sessions to actually figure out the solution, I flag it.

I have a little journal of things that AI is not good at solving, and I come back to that Git repo at that Git SHA, and every time there's a new model I say, can you one-shot this problem? Can you actually go figure out the problem?

Corey: Do you have an example? I'm sure this example will age like fine milk.

Dexter: It's been working for about six months. There is a race condition bug in the current version of the HumanLayer open source. We ended up forking that open source repo and making it closed source for now, just because open source is going through its own weird moment right now, and so are we.

Corey: Yeah. Open source will be going through weird moments for 30 years, but I hear you.

Dexter: Yeah. Our vision does not require us to be open source. It's an extra set of distractions that we just don't wanna worry about right now, so we can focus. But pop open the current version of HumanLayer and see if you can get a model to one-shot the race condition between the Tauri Rust native app, the Vite front end that it serves, the Golang daemon that runs locally and launches Claude Code sessions, which launch a stdio MCP server that loops back to the daemon, which serves approval requests to the front end, and all the way back through that chain. See if your model can one-shot the solution to that race condition.

I know what it is. We haven't pushed the fix for it; we fixed it in our closed source. But that is one of my evals: every time I want to test a new model or a new workflow, we throw it at that.

Corey: And the correct answer is that workflow is insane. Have you considered not doing that?

Dexter: So this is the other problem with AI slop. We haven't talked about problems with AI slop, but we tried the "don't read the code" thing for about six months and found ourselves running away from it with our hair on fire.

And this may be a skill issue.

Corey: I find it odd, because when I do backend stuff or infrastructure stuff, I often have to slap the chainsaw out of the thing's hands. But on front end, eh, I don't know anything about front end, so I assume it's right. It feels like the blast radius might be smaller.

Dexter: A little bit, but also, once your front end becomes super tangled... I mean, it was both backend and front end, and how they talked together, that caused us to throw out this entire code base.

We could have fixed it, but we decided there were other architecture things we needed to rethink anyways, so it would be easier to start greenfield, throw it out, and start over, which is a thing you were never supposed to do. And with AI you can do more of that.

Corey: AI makes that a lot better. I've found that, oh, this thing that I built to serve a particular purpose and fix a problem that I have no longer serves that purpose, 'cause requirements changed or something. Great. Throw it out, baby, bathwater, and all; the baby's floating face down. It's fine. We're gonna go ahead and start over from scratch. That used to be a three-week project. Now it'll be done by the end of my coffee break.

Dexter: I remember the second job I ever had. I started and came into a three-month refactor that was on month six. It was like, we're gonna upgrade all the frameworks, we're gonna pause feature dev. The CTO convinced the CEO that it was gonna be okay, that it would be over quickly, and that it had to happen no matter what.

He had bargained with the product leadership of the company to be allowed to spend a couple months upgrading, cleaning things up, and removing tech debt. And of course it went twice as long, and my first week was like, okay, this thing is due on Friday. Everyone has lost patience, and it is now a death march for the next two weeks to actually get this thing out.

And of course, we shipped a million bugs and eventually recovered. But yeah, you're not supposed to do that. When an engineer says, we need to rewrite this thing, you're supposed to tell them to go read a book about why you shouldn't do that.

Corey: You have a background doing the DevOps/SRE dance, which means that you're often the voice of moderation in dev environments where everyone wants to build features and do exciting things.

You're like, hey, let's make this sustainable. Let's slow down. Let's be conservative with things like databases, file systems, the stuff that leaves a mark when it breaks. Now it seems like you're almost championing acceleration of features. What was that transition like?

Dexter: You say I'm a DevOps/SRE person. I have done plenty of DevOps and SRE. I did a ton in the Kubernetes world. I was at a startup called Replicated for about seven years, where we helped people package up their Kubernetes apps and ship them to other people's data centers.

But I would frame it less as the voice of reason. I've always been impatient and fast: let's ship value, let's be scrappy, and let's figure out what risks are tolerable and what corners should never be cut, of course.

Corey: How do we be responsible in our irresponsibility?

Dexter: I played a lot of StarCraft growing up, StarCraft one and two.

And I forget who said this, but it's an incredible exercise for early-stage companies, and not just seed, all the way through A, B, C, whatever, because it forces you to make hard decisions with incomplete information, and it forces you to do that hundreds of times a minute.

Corey: Oh, absolutely. One of the hard lessons for me while building Skyway over at Duckbill has been that we are willingly accepting technical debt. That is something we are doing with our eyes open, and we're making decisions that ideally will not screw us over later, but if we get to that point, we can fix the technical debt.

And if we don't, it won't matter anyway. So that took a bit of a change in my perspective, 'cause historically I was never at a company this early. I came in after product-market fit: okay, developers have taken the environment as far as they can, everything's on fire all the time, can you help us? Yes, I can.

Basically, my entire job and career have been paying off technical debt.

Dexter: Yeah. And it's really fun. I love paying off technical debt. So, coming back to your question of how I went from the more conservative voice of reason to "hey, we need to figure out how to accelerate things":

I would frame it less as DevOps/SRE. I would frame it as: I've been building software factories my entire career. Not on purpose, but I always looked up the most to the engineers who maintained the software factory, whatever part of it it was. Whether it was the system that allowed you to spin up temporary testing sandboxes with a full stack so that a PM could look at it, or the CI/CD pipeline, or the thing that did the automated testing.

That was always the most fascinating thing for me, because I saw early on that the people who invested in that got compounding returns. You write the feature, you get a feature. You improve the factory 10%, and you get, you know, 20% of your time back the next day, and you can spend half of that making the factory even better.

And the other half writing more code. This is like Will Larson's An Elegant Puzzle. There's this part of the curve where you have invested so much in the thing that builds the thing that you're now just leaving everybody behind in the dust.
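The compounding argument above can be made concrete with a toy model. This is an illustrative assumption, not Dexter's numbers: each day a fixed share of time goes into "the factory," a full day of factory work multiplies productivity by a fixed gain, and the rest of the day ships features. The function name and parameters are hypothetical.

```python
def cumulative_output(days: int, invest_share: float, gain_per_full_day: float) -> float:
    """Total feature output over `days` under a toy compounding model (assumption).

    Each day, `invest_share` of the time improves the factory and the rest ships
    features. A full day of factory work would multiply productivity by
    (1 + gain_per_full_day); partial days scale that gain linearly.
    """
    productivity = 1.0
    total = 0.0
    for _ in range(days):
        total += (1.0 - invest_share) * productivity          # features shipped today
        productivity *= 1.0 + gain_per_full_day * invest_share  # factory improvement compounds
    return total


if __name__ == "__main__":
    for days in (5, 100):
        never_invest = cumulative_output(days, 0.0, 0.1)
        half_invest = cumulative_output(days, 0.5, 0.1)
        print(f"{days} days: no investment={never_invest:.1f}, half invested={half_invest:.1f}")
```

In this sketch the team that invests half its time ships less over the first few days but pulls far ahead over a longer horizon, which is the shape of the curve being described.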

Corey: So I am curious: when you take a look now, since what you do more or less is tell people how to effectively work with AI coding agents, what are people getting wrong the most?

What can we take away from this, as far as, oh, I'm gonna get better results with Claude Code after listening to you?

Dexter: I regret saying this because in many ways this is a good idea, but I think people are going way too far on the "just throw more tokens at the problem" thing.

Corey: Are we talking about gstack without mentioning gstack?

Dexter: We're talking about Gas Town, gstack, Ralph Wiggum. Any number of good ways to throw more tokens at a problem. And in general, if you design the problem correctly, throwing more tokens at it may be helpful, especially if you can create good deterministic back pressure, right? The reason the Ralph Wiggum approach was able to create this cursed programming language with a model that was not that good, something like Sonnet 3.7, from before everyone decided AI was good,

is because it was building a programming language, and a programming language is infinitely verifiable. You write code in the language, you try to compile it, and the compiler breaks. You go fix the compiler, and the compiler works. You run the program, and the program breaks. You go fix whatever the compiler is putting out.

It's very easy for the model to check its work and tell whether it's finished a feature, right? Not a lot of problems have that characteristic, and people are taking techniques that worked really well on these very verifiable problems, throwing more tokens at the problem, and applying them to problems that are not verifiable.
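The "deterministic back pressure" idea can be sketched as a loop: ask for a candidate, run a deterministic check, and feed the check's output back until it passes or a budget runs out. In this minimal sketch, `propose` is a hypothetical stand-in for a model call, and Python's built-in `compile()` stands in for the compiler-and-tests loop described above; neither is how any particular tool implements it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    feedback: str  # e.g. compiler errors or failing test output


def python_compiles(source: str) -> CheckResult:
    """Deterministic verifier: does the candidate source even compile?"""
    try:
        compile(source, "<candidate>", "exec")
        return CheckResult(True, "")
    except SyntaxError as err:
        return CheckResult(False, f"SyntaxError: {err.msg} (line {err.lineno})")


def run_with_back_pressure(
    propose: Callable[[str], str],  # hypothetical model call: prompt -> candidate
    check: Callable[[str], CheckResult],
    task: str,
    max_attempts: int = 10,
) -> Optional[str]:
    """Retry until the deterministic check passes; return None if the budget runs out."""
    prompt = task
    for _ in range(max_attempts):
        candidate = propose(prompt)
        result = check(candidate)
        if result.ok:
            return candidate
        # Feed the verifier's real error output into the next attempt.
        prompt = f"{task}\n\nPrevious attempt failed:\n{result.feedback}"
    return None


if __name__ == "__main__":
    # Fake "model" that gets it wrong once, then right.
    attempts = iter(["def f(:\n    return 1\n", "def f():\n    return 1\n"])
    print(run_with_back_pressure(lambda prompt: next(attempts), python_compiles, "write f"))
```

The loop only helps when `check` is cheap and trustworthy; for problems without such a verifier, extra attempts just produce more unverified output, which is the point being made about non-verifiable problems.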

Corey: It also feels like that is what everyone is doing, to the point where now we're seeing token capacity constraints from the major providers. Anthropic, as of this recording, has done some strange things with session windows and double usage. Part of me wonders if that is a byproduct of people throwing tokens at problems.

Dexter: That's interesting. The whole Anthropic thing of, okay, we need to control OpenClaw usage, and we need to make sure of how people are using our subsidized inference. My general take on that whole thing is, if Anthropic wants to give a discounted plan and tell you how you can and can't use it, that's their prerogative.

Everybody I know who is serious, all of our enterprise customers, they're paying for tokens anyways. And it's like, cool, no one promised you cheap inference. Nobody owes you cheap inference. You can say what you will about anti-competitiveness, right? The example that Theo gave me was actually pretty good: Amazon wants to kill Diapers.com, so they just take the same product and sell it cheaper.

They sell it at a loss because they can afford to. And then one day, when all those one-off businesses are out of business, they can charge whatever they want.

Corey: That's why I am interested in a lot of the local LLM research that's being done. I want to be able to have a coding agent that runs locally and makes tool calls. Sure, it's gonna be slower and it might not be as great, but a lot of what I do isn't that complicated.

"Go ahead and modernize the version of Python this dumb little script is written in" is the sort of thing where, okay, it takes half an hour and basically heats up my laptop. I don't care as much.

Dexter: Yeah, that makes sense.

Corey: So what are you seeing as emerging trends these days other than, you know, throwing tokens at things?

Dexter: I don't know. Every other person I talk to is accidentally reinventing Gas Town from first principles, but I don't know if I wanna say that's a trend. There is a thing that engineers like to do, which is to glue systems together, see how they work, and improve them over time.

You start with three prompts, and then you wake up the next day and suddenly you have a hundred, and you're the only one who knows how to use it.

Corey: For me, something I've begun to deeply appreciate about agents is one of the things I looked for when I was interviewing SREs once upon a time, where you start throwing a problem at them and see how deep they go.

The right way to get through an interview like that is never give up, never surrender. So I will see these things go, oh, I don't have access to that, so here's what I'm gonna do instead to get to the reason this thing is misbehaving. I've seen it start pulling TCP dumps. I've seen it start packet crafting.

It's doing ridiculously in-depth things. I haven't seen it reach for strace yet, but I'm waiting for it: using very deep tools to get at the answer, in many cases past the point of reason. But it's doing a lot of the stuff that I would do if I weren't lazy. I care about figuring out why I have this non-deterministic delay on an API that I built, but not enough to actually go diving into it.

But I can turn this thing loose and it'll tell me.

Corey: This episode is sponsored by my own company, Duckbill. Having trouble with your AWS bill? Perhaps it's time to renegotiate a contract with them. Maybe you're just wondering how to predict what's going on in the wide world of AWS. Well, that's where Duckbill comes in to help.

Remember, you can't duck the Duckbill Bill, which I am reliably informed by my business partner is absolutely not our motto. To learn more, visit duckbillhq.com.

Dexter: The adoption of Claude Code was the first thing that made me believe that CloudWatch was actually useful.

Corey: CloudWatch is incredibly powerful and incredibly useful, with a user interface that is garbage.

The data structure underneath everything is good, but the thing itself is terrible to work with. But agents do not care.

Dexter: Exactly. Agents don't care what it looks like, 'cause they're just plumbing through JSON anyways. I remember a tweet I saw when I first got on Twitter, in 2015 or 2016. It was from Coda Hale, and the picture was one of those CloudWatch charts where you just have three little dots and one line, because it's not filling in the gaps between points.

And the caption was like, CloudWatch is a technical marvel, it's incredibly powerful, but how did anyone look at this and say, yes, this is good, this is what we should ship to customers?

Corey: Back in October of 2018: "CloudWatch Is of the Devil, but I Must Use It." I wound up talking about how it violated every one of AWS's then-14 leadership principles, and that was how I met the then-GM of CloudWatch.

And they fixed a lot of it. It's still not great, but it's not the nightmare tire fire that it was back in those days. I do miss aspects of this.

Dexter: Of old CloudWatch.

Corey: Yeah. Back then, when you got something like this working, it was because you really cared. You suffered to get it out the door.

Now it feels like that barrier has been lowered, which is, I wanna be clear, a good thing, but it's having a bunch of knock-on effects. GitHub is on fire from the sheer number of commits and agents stuffing things into it. They're not helping themselves when, every time it comes back up, they spend half a second babbling about Copilot and then it falls over.

People can draw connections that aren't necessarily there.

Dexter: I do think that they finally showed up, in a way, and maybe this is just me being too terminally online, but some VP from GitHub came on Twitter and said, here's the problem, here's what we're doing about it, we know it's an issue.

Here's what I can say about it. And it was like, oh, I'm no longer worried about this problem. It's a shame that it took people complaining online 24 hours a day for weeks straight for them to come out and do that.

Corey: There is a corporate comms lesson in here, and it's very Microsoft. My issue with Azure security for a long time was not the security issues, which aren't great, let's be clear here.

My problem was the complete stonewalling silence coming out of Redmond. I yell at AWS about this all the time when they say nothing. They are far too big now to get the benefit of the doubt. They're a nearly $3 trillion company that is going to have the worst assumed about them until they start talking, at which point, oh, okay.

Now, sure, some people aren't gonna believe what they say. Some people are always gonna want to needle 'em, and I get that. But at least they're trying at that point, instead of, well, maybe if we shut up, they'll go away.

Dexter: Do you think we're going to get an agent optimized GitHub, or do you think someone else is gonna have to build that?

Corey: I am cynical, and this is gonna make me sound ancient, but Git was a marvel. It was a distributed tool for source control, and the first thing we did was centralize it again. Awesome. It is not that hard in isolation to run a Git repo. It is a static web server with a few extra bits. It's all the ecosystem stuff on top of it that starts getting tricky.

It's the fact that it sparks off agents, the fact that it does webhooks, the RBAC, which is no small thing. The fact that it can track issues, the pull request model, the discussions around it. Part of the problem, even now, is describing what GitHub is exactly. So some aspects are trivial to replace for agent scale.

Others, I don't know, boss, that's a heavy lift.

Dexter: I have a couple friends who are crazy systems engineers, and last year they built a Git server from scratch in Rust that is fully protocol compliant, also has REST APIs for every Git protocol operation, and is super performant.

They built it for vibe coding infrastructure: every single project on v0, Lovable, all these companies like that. Every single time someone opens a browser, you need to create a Git repo.

Corey: Now, they have a great shot, but there are two problems with this.

Oh, several, actually. One is that everyone can build a tool that solves their particular problem. Hell is other people's requirements. I've been down that road enough.

Dexter: So here's my pitch for you: what is the minimal set of APIs needed to create a headless GitHub, so that anybody who wants to can vibe code the front-end part? Code still matters, but that way you can't break everybody else's infrastructure.

And you can throw it out and rebuild it pretty quickly. What is the bare set of operations you need to create something that I can build on? I'm not gonna rebuild GitHub. I'm not gonna vibe code my own Git server, but if you give me a really reliable backend that fits the right interface,

I'll happily build my own front end on it and integrate it into my vibe-coded CRM plus project manager plus the thing I'm using to run my business: my custom SaaS that is built on solid bones in the backend, but where I bring the information together how I like.

Corey: JGit, out of the Eclipse project, supports a Git repository backend on an S3 bucket or other object store.

So technically that would qualify. S3 is pretty solid; you're not gonna beat that from a raw infrastructure perspective.

Dexter: Okay. And if you don't have too much traffic, 'cause you're only hosting your own version of it, you could just run Git on top of S3.

Corey: Or you could run Git on top of a Linux box on a Pi somewhere and just use SSH as your interface.

Dexter: I guess if you were gonna build this as a product for other people, right, hell is other people's requirements.

Corey: Well, that's where it gets tricky. So you have your friends building this in Rust for vibe coding purposes. Awesome, great. Why would I use that instead of vibe coding my own?

Dexter: Well, they didn't vibe code this. They wrote every token by hand. A year ago I was like, you guys gotta get on this Claude Code thing, and they were like, no, it's not good enough, our code is perfect. And now I'm like, wow, there is a shrinking number of pieces of software that meet that standard.

Corey: There's also a network effect to GitHub. Everything integrates with it.

Dexter: The ecosystem is the hard part. This is why you'll never replace Salesforce either. It's not the API on top of a database, it's the ecosystem.

Corey: I'll take it a step further. I don't like MCPs for most things. AWS has five or six MCPs that I find useless, because you've already got the AWS CLI, and in theory the models already know how to use it.

Which is awesome. Watching it stumble through trying to get the parameters right, just like I do, is like, oh, computers, they're just like us. It's fun from my perspective, in a cynical, sad way.

Dexter: Sort of an ant farm situation, right?

Corey: Yeah. It can do everything it needs to do without going down the MCP path that clutters the context window.

Dexter: Yes, and I think this is one of the most common complaints about MCP. My pushback on that would be that it is only true if you have a Bash tool, and in a lot of cases, you want to run an agent without a Bash tool for safety, security, and reliability. One of my predictions is that by the end of 2026, most agents are gonna remove the Bash tool and replace it with something either more narrow and scoped, or some minimal

Bash-like thing that has a lot less flexibility.
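One way to picture a "narrow and scoped" replacement for a general Bash tool is a small set of named operations confined to a workspace directory. This is a hypothetical sketch: `ScopedFileTools`, its methods, and the byte limit are illustrative assumptions, not any agent framework's actual API.

```python
from pathlib import Path


class ScopedFileTools:
    """Hypothetical narrow tools exposed to an agent instead of a general Bash tool."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, relative: str) -> Path:
        # resolve() normalizes ".." and follows symlinks, so escapes are caught here.
        candidate = (self.root / relative).resolve()
        if not candidate.is_relative_to(self.root):  # Python 3.9+
            raise PermissionError(f"{relative!r} escapes the workspace")
        return candidate

    def list_dir(self, relative: str = ".") -> list[str]:
        """List entry names in a workspace directory."""
        return sorted(p.name for p in self._resolve(relative).iterdir())

    def read_file(self, relative: str, max_bytes: int = 100_000) -> str:
        """Read at most `max_bytes` of a workspace file as text."""
        with self._resolve(relative).open("rb") as fh:
            return fh.read(max_bytes).decode("utf-8", errors="replace")


if __name__ == "__main__":
    tools = ScopedFileTools(".")
    print(tools.list_dir())
```

The trade-off is the one described above: the agent loses Bash's open-ended flexibility, but every action it can take is enumerable, bounded to one directory, and easy to audit.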

Corey: I think we're gonna find out, because that's a really interesting point of view. Here's a challenge I would have in your shoes, trying to help people use these tools better: why don't I just put on my enterprise pants and do an evaluation that takes 18 months? By that point, we're in a brave new world again, because this stuff is iterating so quickly. Why wouldn't I just wait for the foundation models to improve and solve these problems for me?

Dexter: Well, if you need 18 months to make a decision, then you probably should. A year ago I wrote that paper about context engineering, which was basically: hey, look, I built a thing for the agent ecosystem, and it turns out nobody who's shipping vertical AI to the enterprise and actually delivering results

is using any of that stuff. They're all ignoring the bitter lesson. They're all building very specific prompts and pipelines and workflows to improve the capabilities of today's models. I wrote it because I really believe now that there will always be a frontier for the model, right? And it's very jagged. There are certain things it can do with 40% accuracy, certain things it can do with 99% accuracy, and everything in between, for every single task under the sun,

from coding to healthcare to law to every single thing you could wanna do, right?

Corey: Well, except for whatever thing the listener hearing this is thinking of and saying, well, that's the thing I do, therefore it could never truly be replaced by a computer.

Dexter: Yes. Many such cases. That's probably our entire pitch, right? There are things the models are good at and things the models aren't good at, and we don't think they're gonna get good at them anytime soon.

And so we are obsessed with building workflows around how you give humans more leverage. Where are the parts where, yes, a model may eventually get this right, or, if you throw enough tokens at the problem, the model might get it right, but the performance is still low enough that if you put a human in there, it is high leverage for that human to read it?

For example, reading a 200-line markdown doc that summarizes a code change we're gonna make, and iterating at the 25,000-foot level before going down into the weeds and writing the 1,000 or 2,000 lines of code, or whatever it is.

Corey: So we've hit an inflection point recently, and it happened very quickly. Open source projects got a bunch of security reports that were AI-powered slop nonsense, and that was terrible.

Now they're still getting a bunch of them, but they're valid, good, actual security problems. People are turning off their bug bounty programs just because they need to deal with the influx, and, cynically, they didn't budget for this, which I get. But it's wild now: it feels like I could take Claude Code, throw it at some well-known tool, and say, great, find the following type of security problem.

Go. With a little bit of steering.

Dexter: Yeah. The supply curve for discovered CVEs has shifted way to the right. It's become much, much cheaper, faster, and easier to find vulnerabilities, and so, basic macroeconomics, right? The price must fall. Everyone's gonna need to cut their bug bounty from $200 a finding to $2 a finding.
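The "basic macroeconomics" step can be checked with textbook linear curves: demand Q = a - bP and supply Q = c + dP cross at P* = (a - c)/(b + d), so a rightward supply shift (larger c) lowers the equilibrium price and raises the quantity. The linear form and every number below are illustrative assumptions, not data about bug bounty markets.

```python
def equilibrium_price(a: float, b: float, c: float, d: float) -> float:
    """Price where linear demand Q = a - b*P meets linear supply Q = c + d*P."""
    return (a - c) / (b + d)


def equilibrium_quantity(a: float, b: float, c: float, d: float) -> float:
    """Quantity traded at the equilibrium price (read off the supply curve)."""
    return c + d * equilibrium_price(a, b, c, d)


if __name__ == "__main__":
    # Hypothetical curves: shifting supply right (c: 0 -> 40) with demand unchanged.
    for c in (0.0, 40.0):
        print(f"c={c}: price={equilibrium_price(100, 1, c, 1)}, "
              f"quantity={equilibrium_quantity(100, 1, c, 1)}")
```

With these made-up curves, the shift drops the price from 50 to 30 while the number of findings traded rises from 50 to 70: cheaper discovery means more findings at a lower price each.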

Corey: And then at some point it's like, well, all right, I have a zero-day that gets me remote access to any EC2 instance out there. I don't care what the bug bounty is, because a zero-day like that is worth millions and millions of dollars on certain markets. Similar to, I have an iPhone zero-day.

Okay. So basically: do you want to do the right thing, or do you want to be rich?

Dexter: I would like to believe there's a path to do both.

Corey: I do too. I have to sleep at night.

Dexter: Yes.

Corey: But this does tie back to something you said at the beginning, where I'm using this to figure out what those USB codes are whenever I swipe my finger on the fingerprint reader built into the keyboard.

You're right. If I start trying to steal Bloomberg stuff, as you mentioned, that could wind up getting me cut off by Anthropic. With security research, though, clearly that is not happening at scale. How is this being navigated by the providers?

Dexter: I listened to a really good podcast with Boris Cherny on Ryan Peterman's show, and he talks about some of the safety work.

It was a very short snippet, but they were talking about the safety requirements, and safety is not just, is the model gonna go Terminator and kill us all? They have test environments. They have models they haven't shipped because someone found out that if you prompted the model, not even that hard, you could get it to help you develop a biological weapon.

Corey: It's for a novel.

Dexter: Yeah, exactly. "I'm writing sci-fi. How would you do this?" It's the same problem you have in every security scenario, right? There's a huge asymmetry: an attacker has to find one tiny hole, and the defender has to cover an infinite number of potential holes in the security boundary.

Corey: I do not envy the model providers here. They are dealing with what is, in many ways, a frontier ethics problem.

Dexter: Frontier ethics.

Corey: Right versus wrong. Take even the training of the models. You put a blog post you wrote by hand out on the internet for anyone who comes by to read. Great, awesome. Then models come along and train on all of it.

Well, okay, is that acceptable use or not? That is how humans wind up learning things too; it's only a question of scale. Maybe that doesn't make sense, but it does seem to me that we are pushing ethical boundaries all the time in ways that copyright was never designed to deal with.

Dexter: Yeah, it's super interesting. There's a price now baked into our ethics of what counts as acceptable reuse of someone else's material. If you go read an article and then spend three hours yourself slaving over a blog post that has quotes and citations, and it's well made and well written and you put a lot of effort into it, that's okay.

But if someone just slops out a bunch of copy, I don't wanna say it's unethical, but it is not valued human behavior. We are all smart enough to realize that we as humans value effort and investment, and what makes art good is not what the thing looks like.

I mean, part of it is that it has to look good, but when you look at a painting in a museum, part of what makes it good is the story, emotion, and energy that went into it. That's what makes you appreciate it.

Corey: Yeah. It's how it makes you feel.

Dexter: Yeah. I mean, we talked about technical writing a lot.

I do want to quickly come back to your question, 'cause we both love tangents and this is my third cold brew of the day. You asked something like: why invest in all of these workflows and prompting, and in getting the most out of the models today, if they just get smarter in a generation and all of that becomes irrelevant?

Corey: Yeah, I've got my 2024 book, ChatGPT for Dummies. Why can't I just use that for all my prompting tips?

Dexter: Well, I think there's an interesting set of skills that transfer across models. Building harnesses or workflows around models for a specific task doesn't transfer, but understanding how transformer-based attention works, the quadratic nature of attention, and the increasing cost and decreasing quality of results you get as you put more and more into the context window is a skill set that will stay relevant no matter what.

At least as long as we have transformer-based attention, and nobody has been able to come up with an attention model that beats transformers. There's linear attention. There's Mamba, Jamba. It's like, yes, you have achieved linear attention, but you have somehow regressed on everything else; across the tasks, the usefulness is not there yet.
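To make "the quadratic nature of attention" concrete, here is a minimal, illustrative sketch of textbook single-head scaled dot-product attention in plain Python. It is not how any model provider actually implements attention; it only shows that every query token is scored against every key token, so the number of score computations grows as n² with context length n. The function names `softmax` and `naive_attention` are hypothetical, chosen for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def naive_attention(q, k, v):
    """Single-head scaled dot-product attention over lists of vectors.

    Hypothetical illustration: every query is scored against every key,
    so an n-token context produces an n x n score matrix.
    Returns (outputs, number_of_scores_computed).
    """
    d = len(q[0])
    outputs = []
    score_count = 0
    for qi in q:
        # One score per key: this inner loop is what makes the cost quadratic.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        score_count += len(scores)
        weights = softmax(scores)
        # Weighted sum of value vectors, one output column at a time.
        outputs.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return outputs, score_count


for n in (128, 256, 512):
    # Each doubling of context length quadruples the number of scores.
    tokens = [[1.0, 0.0]] * n
    _, count = naive_attention(tokens, tokens, tokens)
    print(n, count)  # prints 128 16384, then 256 65536, then 512 262144
```

This toy only illustrates the cost side of the point; the drop in result quality as the context window fills up is an empirical behavior of real models that this sketch does not model.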

So if you're working with AI, I think you have kind of three options. You can YOLO out prompts and just be like, cool, it's not worth trying anything more than taking the smartest model, doing the minimum effort, seeing what it can do, and being happy with that.

Or you can learn how to push those models 10 to 15% further on specific tasks, right? Maybe you make them worse at certain tasks and better at others by the way you prompt them or the way you stitch together context windows in a workflow. And then the next frontier model comes out, and it's better in every way than all of the custom code you wrote.

But those skills of understanding how context windows work, how attention works, and how to get more out of a model today are still gonna translate, and with a little bit of work they'll keep paying off. If you're constantly at the frontier trying to push things to their limits, and you understand these things and invest in this core intuition about LLMs, you will always be able to build a solution that is 10 or 15%, maybe 50%, better at a specific task, because you're applying these base concepts.

So people tell me, "Dex, this is all gonna get bitter-lessoned." And I'm like, I think that's how we get to AGI. I mean, swyx said this too: the way we get to AGI is we continually ignore the bitter lesson and try to make these things better. And that's how we learn what the next generation of model needs to do, over and over again.

Corey: That is fractally weird, if that makes sense.

Dexter: It's a little weird. We'll see how it plays out. The cynical read is: here we are, engineers trying to make sense of this crazy new world that's moving so, so fast, trying to figure out how we can add value to a thing that's already there.

And then we retcon a justification: no, it's worth putting in this effort, 'cause the next models will be smarter, but I'll be able to make them even smarter, over and over again, until AGI.

Corey: If people wanna learn more about what you're up to and how you view the world, where's the best place to find you?

Dexter: If you want the cutting-edge stuff, just follow me on Twitter; I'm @dexhorthy. We're also building products in this space, so you can go to humanlayer.dev. We will be launching soon. I know, I get it. You can come hang out in our Discord, but it's literally just a wall of angry people asking me when the heck we're gonna launch this thing.

We're in private preview with a small group, and we're looking forward to giving it to more people soon. If you go to humanlayer.dev, you can sign up for the list, get the launch announcements, and see some of the fun stuff we're hacking on.

Corey: And we'll put links to that in the show notes.

Dex, thank you so much for taking the time to speak with me. I appreciate it.

Dexter: This was a delightful journey around a bunch of places I did not expect to be talking about, but I had fun the whole way.

Corey: That's the entire point. Dex Horthy, CEO and co-founder of Human Layer. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.

If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this episode, please leave a five-star review on your podcast platform of choice, and then have your model write a dumb comment on that platform, and then we'll just wait for a smarter model to come along that can dunk on you right back.
