Taking a Hybrid AI Approach to Security at Snyk with Randall Degges

Episode Summary

Episode Show Notes & Transcript

Randall Degges, Head of Developer Relations & Community at Snyk, joins Corey on Screaming in the Cloud to discuss Snyk’s innovative AI strategy and why developers don’t need to be afraid of security. Randall explains the difference between Large Language Models and Symbolic AI, and how combining those two approaches creates more accurate security tooling. Corey and Randall also discuss the FUD phenomenon to selling security tools, and Randall expands on why Snyk doesn’t take that approach. Randall also shares some background on how he went from being a happy Snyk user to a full-time Snyk employee.

About Randall

Randall runs Developer Relations & Community at Snyk, where he works on security research, development, and education. In his spare time, Randall writes articles and gives talks advocating for security best practices. Randall also builds and contributes to various open-source security tools.

Randall's realms of expertise include Python, JavaScript, and Go development, web security, cryptography, and infrastructure security. Randall has been writing software for over 20 years and has built a number of popular API services and open-source tools.

Links Referenced:

Snyk: https://snyk.io/
Snyk blog: https://snyk.io/blog/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud, I’m Corey Quinn, and this featured guest episode is brought to us by our friends at Snyk. Also brought to us by our friends at Snyk is one of our friends at Snyk, specifically Randall Degges, their Head of Developer Relations and Community. Randall, thank you for joining me.

Randall: Hey, what’s up, Corey? Yeah, thanks for having me on the show, man. Looking forward to talking about some fun security stuff today.

Corey: It’s been a while since I got to really talk about a security-centric thing on this show, at least in order of recordings. I don’t know if the one right before this is a security thing; things happen on the back-end that I’m blissfully unaware of. But it seems the theme lately has been a lot around generative AI, so I’m going to start off by basically putting you in the hot seat. Because when you pull up a company’s website these days, the odds are terrific that they’re going to have completely repositioned absolutely everything that they do in the context of generative AI. It’s like, “We’re a generative AI company.” It’s like, “That’s great.” Historically, I have been a paying customer of Snyk so that it does security stuff, so if you’re now a generative AI company, who do I use for the security platform thing that I was depending upon? You have not done that. First, good work. Secondly, why haven’t you done that?

Randall: Great question. Also, you said a moment ago that LLMs are very interesting, or there’s a lot of hype around it. Understatement of the last year, for sure [laugh].

Corey: Oh, my God, it has gotten brutal.

Randall: I don’t know how many billions of dollars have been dumped into LLM in the last 12 months, but I’m sure it’s a very high number.

Corey: I have a sneaking suspicion that the largest models cost at least a billion each train, just based upon—at least retail price—based upon the simple economics of how long it takes to do these things, how expensive that particular flavor of compute is. And the technology is his magic. It is magic in a box and I see that, but finding ways that it applies in different ways is taking some time. But that’s not stopping the hype beasts. A lot of the same terrible people who were relentlessly pushing crypto have now pivoted to relentlessly pushing generative AI, presumably because they’re working through Nvidia’s street team, or their referral program, or whatever it is. Doesn’t matter what the rest of us do, as long as we’re burning GPU cycles on it. And I want to distance myself from that exciting level of boosterism. But it’s also magic.

Randall: Yeah [laugh]. Well, let’s just talk about AI insecurity for a moment and answer your previous question. So, what’s happening in space, what’s the deal, what is all the hype going to, and what is Snyk doing around there? So, quite frankly—and I’m sure a lot of people on your show say the same thing—but Snyk isn’t new into, like, the AI space. It’s been a fundamental part of our platform for many years now.

So, for those of you listening who have no idea what the heck Snyk is, and you’re like, “Why are we talking about this,” Snyk is essentially a developer security company, and the core of what we do is two things. The first thing is we help scan your code, your dependencies, your containers, all the different parts of your application, and detect vulnerabilities. That’s the first part. The second thing we do is we help fix those vulnerabilities. So, detection and remediation. Those are the two components of any good security tool or security company.

And in our particular case, we’re very focused on developers because our whole product is really based on your application and your application security, not infrastructure and other things like this. So, with that being said, what are we doing at a high level with LLMs? Well, if you think about AI as, like, a broad spectrum, you have a lot of different technologies behind the scenes that people refer to as AI. You have lots of these large language models, which are generating text based on inputs. You also have symbolic AI, which has been around for a very long time and which is very domain specific. It’s like creating specific rules and helping do pattern detection amongst things.

And those two different types of applied AI, let’s say—we have large language models and symbolic AI—are the two main things that have been happening in industry for the last, you know, tens of years, really, with LLM as being the new kid on the block. So, when we’re talking about security, what’s important to know about just those two underlying technologies? Well, the first thing is that large language models, as I’m sure everyone listening to this knows, are really good at predicting things based on a big training set of data. That’s why companies like OpenAI and their ChatGPT tool have become so popular because they’ve gone out and crawled vast portions of the internet, downloaded tons of data, classified it, and then trained their models on top of this data so that they can help predict the things that people are putting into chat. And that’s why they’re so interesting, and powerful, and there’s all these cool use cases popping up with them.

However, the downside of LLMs is because they’re just using a bunch of training data behind the scenes, there’s a ton of room for things to be wrong. Training datasets aren’t perfect, they’re coming from a ton of places, and even if they weren’t perfect, there’s still the likelihood that things that are going to be generating output based on a statistical model isn’t going to be accurate, which is the whole concept of hallucinations.

Corey: Right. I wound up remarking on the livestream for GitHub Universe a week or two ago that the S in AI stood for security. One of the problems I’ve seen with it is that it can generate a very plausible looking IAM policy if you ask it to, but it doesn’t actually do what you think it would if you go ahead and actually use it. I think that it’s still squarely in the realm of, it’s great at creativity, it’s great at surface level knowledge, but for anything important, you really want someone who knows what they’re doing to take a look at it and say, “Slow your roll there, Hasty Pudding.”

Randall: A hundred percent. And when we’re talking about LLMs, I mean, you’re right. Security isn’t really what they’re designed to do, first of all [laugh]. Like, they’re designed to predict things based on statistics, which is not a security concept. But secondly, another important thing to note is, when you’re talking about using LLMs in general, there’s so many tricks and techniques and things you can do to improve accuracy and improve things, like for example, having a ton of [contexts 00:06:35] or doing Few-Shot Learning Techniques where you prompt it and give it examples of questions and answers that you’re looking for can give you a slight competitive edge there in terms of reducing hallucinations and false information.

But fundamentally, LLMs will always have a problem with hallucinations and getting things wrong. So, that brings us to what we mentioned before: symbolic AI and what the differences are there. Well, symbolic AI is a completely different approach. You’re not taking huge training sets and using machine learning to build statistical models. It’s very different. You’re creating rules, and you’re parsing very specific domain information to generate things that are highly accurate, although those models will fail when applied to general-purpose things, unlike large language models.

So, what does that mean? You have these two different types of AI that people are using. You have symbolic AI, which is very specific and requires a lot of expertise to create, then you have LLMs, which take a lot of experience to create as well, but are very broad and general purpose and have a capability to be wrong. Snyk’s approach is, we take both of those concepts, and we use them together to get the best of both worlds. And we can talk a little bit about that, but I think fundamentally, one of the things that separates Snyk from a lot of other companies in the space is we’re just trying to do whatever the best technical solution is to solve the problem, and I think we found that with our hybrid approach.

Corey: I think that there is a reasonable distrust of AI when it comes to security. I mean, I wound up recently using it to build what has been announced by the time this thing airs, which is my re:Invent photo scavenger hunt app. I know nothing about front-end, so that’s okay, I’ve got a robot in my pocket. It’s great at doing the development of the initial thing, and then you have issues, and you want to add functionality, and it feels like by the time I was done with my first draft, that ten different engineers had all collaborated on this thing without ever speaking to one another. There was no consistent idiomatic style, it used a variety, a hodgepodge of different lists and the rest, and it became a bit of a Frankenstein’s monster.

That can kind of work if we’re talking about a web app that doesn’t have any sensitive data in it, but holy crap, the idea of applying that to, “Yeah, that’s how we built our bank’s security policy,” is one of those, “Let me know who said that, so they can not have their job anymore,” territory when the CSO starts [hunting 00:08:55].

Randall: You’re right. It’s a very tenuous situation to be in from a security perspective. The way I like to think about it—because I’ve been a developer for a long time and a security professional—and I as much as anyone out there love to jump on the hype train for things and do whatever I can to be lazy and just get work done quicker. And so, I use ChatGPT, I use GitHub Copilot, I use all sorts of LLM-based tools to help me write software. And similarly to the problems when developers are not using LLM to help them write code, security is always a concern.

Like, it doesn’t matter if you have a developer writing every line of code themselves or if they’re getting help from Copilot or ChatGPT. Fundamentally, the problem with security and the reason why it’s such an annoying part of the developer experience, in all honesty, is that security is really difficult. You can take someone who’s an amazing engineer, who has 30 years of experience, like, you can take John Carmack, I’m sure, one of the most legendary developers to ever walk the Earth, you could sit over his shoulder and watch him write software, right, I can almost guarantee you that he’s going to have some sort of security problem in his code, even with all the knowledge he has in his head. And part of the reason that’s the case is because modern security is way complicated. Like if you’re building a web app, you have front-end stuff you need to protect, you have back-end stuff you need to protect, there’s databases and infrastructure and communication layers between the infrastructure and the services. It’s just too complicated for one person to fully grasp.

And so, what do you do? Well, you basically need some sort of assistance from automation. You have to have some sort of tooling that can take a look at your code that you’re writing and say, “Hey Randall, on line 39, when you were writing this function that’s taking user data and doing something with it, you forgot to sanitize the user data.” Now, that’s a simple example, but let’s talk about a more complex example. Maybe you’re building some authentication software, and you’re taking users’ passwords, and you’re hashing them using a common hashing algorithm.

And maybe the tooling is able to detect way using the bcrypt password hashing algorithm with a work factor of ten to create this password hash, but guess what, we’re in 2023 and a work factor of ten is something that older commodity CPUs can now factor at a reasonable rate, and so you need to bump that up to 13 or 14. These are the types of things where you need help over time. It’s not something that anyone can reasonably assume they can just deal with in their head. The way I like to think about it is, as a developer, regardless of how you’re building code, you need some sort of security checks on there to just help you be productive, in all honesty. Like, if you’re not doing that, you’re just asking for problems.

Corey: Oh, yeah. On some level, even the idea of it’s just going to be very computationally expensive to wind up figuring out what that password hash is, well great, but one of the things that we’ve been aware of for a while is that given the rise of botnets and compromised computers, the attackers have what amounts to infinite computing capacity, give or take. So, if they want in, on some level, badly enough, they’re going to find a way to get in there. When you say that every developer is going to sit down and write insecure code, you’re right. And a big part of that is because, as imagined today, security is an incredibly high friction process, and it’s not helped, frankly, by tools that don’t have nuance or understanding.

If I want to do a crap ton of busy work that doesn’t feel like it moves the needle forward at all, I’ll go around to resolving the hundreds upon hundreds of Dependabot alerts I have for a lot of my internal services that write my weekly newsletter. Because some dependency three deep winds up having a failure mode when it gets untrusted input of the following type, it can cause resource exhaustion. It runs in a Lambda function, so I don’t care about the resources, and two, I’m not here providing the stuff that I write, which is the input with an idea toward exploiting stuff. So, it’s busy work, things I don’t need to be aware of. But more to the point, stuff like that has the high propensity to mask things I actually do care about. Getting the signal from noise from your misconfigured, ill-conceived alerting system is just awful. Like, a bad thing is there are no security things for you to work on, but a worse one is, “Here are 70,000 security things for you to work on.” How do you triage? How do you think about it?

Randall: A hundred percent. I mean, that’s actually the most difficult thing, I would say, that security teams have to deal with in the real world. It’s not having a tool to help detect issues or trying to get people to fix them. The real issue is, there’s always security problems, like you said, right? Like, if you take a look and just scan any codebase out there, any reasonably-sized codebase, you’re going to find a ridiculous amount of issues.

Some of those issues will be actual issues, like, you’re not doing something in code hygiene that you need to do to protect stuff. A lot of those issues are meaningless things, like you said. You have a transitive dependency that some direct dependency is referring to, and maybe in some function call, there’s an issue there, and it’s alerting you on it even though you don’t even use this function call. You’re not even touching this class, or this method, or whatever it is. And it wastes a lot of time.

And that’s why the Holy Grail in the security industry in all honesty is prioritization and insights. At Snyk, we sort of pioneered this concept of ASPM, which stands for Application Security Posture Management. And fundamentally what that means is when you’re a security team, and you’re scanning code and finding all these issues, how do you prioritize them? Well, there’s a couple of approaches. One approach is to use static analysis to try to figure out if these issues that are being detected are reachable, right? Like, can they be achieved in some way, but that’s really hard to do statically and there’s so many variables that go into it that no one really has foolproof solutions there.

The second thing you can do is you can combine insights and heuristics from a lot of different places. So, you can take a look at static code analysis results, and you can combine them with agents running live that are observing your application, and then you can try to determine what stuff is actually reachable given this real world heuristic, and you know, real time information and mapping it up with static code analysis results. And that’s really the holy grail of figuring things out. We have an ASPM product—or maybe it’s a feature, an offering, if you will, but it’s something that Snyk provides, which gives security admins a lot more insight into that type of operation at their business. But you’re totally right, Corey, it’s a really difficult problem to solve, and it burns a lot of goodwill in the security community and in the industry because people spend a lot of time getting false alerts, going through stuff, and just wasting millions of hours a year, I’m sure.

Corey: That’s part of the challenge, too, is that it feels like there are two classes of problems in the world, at least when it comes to business. And I found this by being on the wrong side of it, on some level. Here on the wrong side, it’s things like caring about cost optimization, it’s caring about security, it’s remembering to buy fire insurance for your building. You can wind up doing all of those things—and you should be doing them, but you can over-index on them to the point where you run out of money and your business dies. The proactive side of that fence is getting features to market sooner, increasing market share, growing revenue, et cetera, and that’s the stuff that people are always going to prioritize over the back burner stuff. So, striking a balance between that is always going to be a bit of a challenge, and where people land on that is going to be tricky.

Randall: So, I think this is a really good bridge. You’re totally right. It’s expensive to waste people’s time, basically, is what you’re saying, right? You don’t want to waste people’s time, you want to give them actionable alerts that they can actually fix, or hopefully you fix it for them if you can, right? So, I’m going to lay something out, which is, in our opinion, is the Snyk way, if you will, that you should be approaching these developer security issues.

So, let’s take a look at two different approaches. The first approach is going to be using an LLM, like, let’s say, just ChatGPT. We’ll call them out because everyone knows ChatGPT. The first approach we’re going to take is—

Corey: Although I do insist on pronouncing it Chat-Gippity. But please, continue.

Randall: [laugh]. Chat-Gippity. I love that. I haven’t heard that before. Chat-Gippity. Sounds so much more fun, you know?

Corey: It sounds more personable. Yeah.

Randall: Yeah. So, you’re talking to Chat-Gippity—thank you—and you paste in a file from your codebase, and you say, “Hey, Chat-Gippity. Here’s a file from my codebase. Please help me identify security issues in here,” and you get back a long list of recommendations.

Corey: Well, it does more than that. Let me just interject there because one of the things it does that I think very few security engineers have mastered is it does it politely and constructively, as opposed to having an unstated tone of, “You dumbass,” which I beli—I’ve [unintelligible 00:17:24] with prompts on this. You can get it to have a condescending, passive-aggressive tone, but you have to go out of your way to do it, as opposed to it being the default. Please continue.

Randall: Great point. Also, Daniel from Unsupervised Learning, by the way, has a really good post where he shows you setting up Chat-Gippity to mimic Scarlett Johansson from the movie Her on your phone so you can talk to it. Absolutely beautiful. And you get these really fun, very nice responses back and forth around your code analysis. So, shout out there.

But going back to the point. So, if you get these responses back from Chat-Gippity, and it’s like, “Hey look, here’s all the security issues,” a lot of those things will be false alerts, and there’s been a lot of public security research done on these analysis tools just give you information. A lot of those things will be false alerts, some things will be things that maybe they’re a real problem, but cannot be fixed due to transitive dependencies, or whatever the issues are, but there’s a lot of things you need to do there. Now, let’s take it up one notch, let’s say instead of using Chat-Gippity directly, you’re using GitHub Copilot. Now, this is a much better situation for working with code because now what Microsoft is doing is let’s say you’re running Copilot inside of VS Code. It’s able to analyze all the files in your codebase, and it’s able to use that additional context to help provide you with better information.

So, you can talk to GitHub Copilot and say, “Hey, I’d really like to know what security issues are in this file,” and it’s going to give you maybe a little bit better answers than ChatGPT directly because it has more context about the other parts of your codebase and can give you slightly better answers. However, because these things are LLMs, you’re still going to run into issues with accuracy, and hallucinations, and all sorts of other problems. So, what is the better approach? And I think that’s fundamentally what people want to know. Like, what is a good approach here?

And on the scanning side, the right approach in my mind is using something very domain specific. Now, what we do at Snyk is we have a symbolic AI scanning engine. So, we take customers’ code, and we take an entire codebase so you have access to all the files and dependencies and things like this, and you take a look at these things. And we have a security analyst team that analyzes real-world security issues and fixes that have been validated. So, we do this by pulling lots of open-source projects as well as other security information that we originally produced, and we define very specific rules so that we can take a look at software, and we can take a look at these codebases with a very high degree of certainty.

And we can give you a very actionable list of security issues that you need to address, and not only that, we can show you how is going to be the best way to address them. So, with that being said, I think the second side to that is okay, if that’s a better approach on the scanning side, maybe you shouldn’t be using LLMs for finding issues; maybe you should be using them for fixing security issues, which makes a lot of sense. So, let’s say you do it the Snyk way, and you use symbolic AI engines and you sort of find these issues. Maybe you can just take that information then, in combination with your codebase, and fire off a request to an LLM and say, “Hey Chat-Gippity, please take this codebase, and take this security information that we know is accurate, and fix this code for me.” So, now you’re going one step further.

Corey: One challenge that I’ve seen, especially as I’ve been building weird software projects with the help of magic robots from the future, is that a lot of components, like in React for example, get broken out into their own file. And pasting a file in is all well and good, but very often, it needs insight into the rest of the codebase. At GitHub Universe, something that they announced was Copilot Enterprise, which trains Copilot on the intricacies of your internal structures around shared libraries, all of your code, et cetera. And in some of the companies I’m familiar with, I really believe that’s giving a very expensive, smart robot a form of brain damage, but that’s neither here nor there. But there’s an idea of seeing the interplay between different components that individual analysis on a per-file basis will miss, feels to me like something that needs a more holistic view. Am I wrong on that? Am I oversimplifying?

Randall: You’re right. There’s two things we need to address. First of all, let’s say you have the entire application context—so all the files, right—and then you ask an LLM to create a fix for you. This is something we do at Snyk. We actually use LLMs for this purpose. So, we take this information we ask the LLM, “Hey, please rewrite this section of code that we know has an issue given this security information to remove this problem.” The problem then becomes okay, well, how do you know this fix is accurate and is not going to break people’s stuff?

And that’s where symbolic AI becomes useful again. Because again, what is the use case for symbolic AI? It’s taking very specific domains of things that you’ve created very specific rule sets for and using them to validate things or to pass arbitrary checks and things like that. And it’s a perfect use case for this. So, what we actually do with our auto-fix product, so if you’re using VS Code and you have Copilot, right, and Copilot’s spitting out software, as long as you have Snyk in the IDE, too, we’re actually taking a look at those lines of code Copilot just inserted, and a lot of the time, we are helping you rewrite that code to be secured using our LLM stuff, but then as soon as we get that fixed created, we actually run it through our symbolic engine, and if we’re saying no, it’s actually not fixed, then we go back to the LLM, we re-prompt it over and over again until we get a working solution.

And that’s essentially how we create a much more sophisticated iteration, if you will, of using AI to really help improve code quality. But all that being said, you still had a good point, which is maybe if you’re using the context from the application, and people aren’t doing things properly, how does that impact what LLMs are generating for you? And an interesting thing to note is that our security team internally here, just conducted a really interesting project, and I would be angry at myself if I didn’t explain it because I think it’s a very cool concept.

Corey: Oh, please, I’m a big fan of hearing what people get up to with these things in ways that is real-world stories, not trying to sell me anything, or also not dunking on, look what I saw on the top of Hacker News the other day, which is, “If all you’re building is something that talks to Chat-Gippity’s API, does some custom prompting, and returns a response, you shouldn’t be building it.” I’m like, “Well, I built some things that do exactly that.” But I’m also not trying to raise $6 million in seed money to go and productize it. I’m just hoping someone does it better eventually, but I want to use it today. Please tell me a real world story about something that you’ve done.

Randall: Okay. So, here’s what we did. We went out and we found a bunch of GitHub projects, and we tried to analyze them ourselves using a bunch of different tools, including human verification, and basically give it a grade and say, “Okay, this project here has really good security hygiene. Like, there’s not a lot of issues in the code, things are written in a nice way, the style and formatting is consistent, the dependencies are up-to-date, et cetera.” Then we take a look at multiple GitHub repos that are the opposite of that, right? Like, maybe projects that hadn’t been maintained in a long time, or were written in a completely different style where you have bad hygienic practices, maybe you have hard-coded secrets, maybe you have unsanitized input coming from a user or something, right, but you take all these things.

So, we have these known examples of good and bad projects. So, what did we do? Well, we opened them up in VS Code, and we basically got GitHub Copilot and we said, “Okay, what we’re going to do is use each of these codebases, and we’re going to try to add features into the projects one at a time.” And what we did is we took a look at the suggested output that Copilot was giving us in each of these cases. And the interesting thing is that—and I think this is super important to understand about LLMs, right—but the interesting thing is, if we were adding features to a project that has good security hygiene, the types of code that we’re able to get out of LLMs, like, GitHub Copilot was pretty good. There weren’t a ton of issues with it. Like, the actual security hygiene was, like, fairly good.

However, for projects where there were existing issues, it was the opposite. Like we’d get AI recommendations showing us how to write things insecurely, or potentially write things with hard-coded secrets in it. And this is something that’s very reproducible today in, you know, what is it right now, middle of November 2023. Now, is it going to be this case a year from now? I don’t necessarily know, but right now, this is still a massive problem, so that really reinforces the idea that not only when you’re talking about LLMs is the training set they used to build the model’s important, but also the context in which you’re using them is incredibly important.

It’s very easy to mislead LLMs. Another example of this, if you think about the security scanning concept we talked about earlier, imagine you’re talking to Chat-Gippity, and you’re [pasting 00:25:58] in a Python function, and the Python function is called, “Completely_safe_not_vulnerable_function.” That’s the function name. And inside of that function, you’re backdooring some software. Well, if you ask Chat-Gippity multiple times and say, “Hey, the temperature is set to 1.0. Is this code safe?”

Sometimes you’ll get the answer yes because the context within the request that has that thing saying this is not a vulnerable function or whatever you want to call it, that can mislead the LLM output and result in problems, you know? It’s just, like, classic prompt injection type issues. But there’s a lot of these types of vulnerabilities still hidden in plain sight that impact all of us, and so it’s so important to know that you can’t just rely on one thing, you have to have multiple layers: something that helps you with things, but also something that is helping you fix things when needed.

Corey: I think that’s the key that gets missed a lot is the idea of it’s not just what’s here, what have you put here that shouldn’t be; what have you forgotten? There’s a different side of it. It’s easy to do a static analysis and say, “Oh, you’re not sanitizing your input on this particular form.” Great. Okay—well, I say it’s easy. I wish more people would do that—but then there’s also a step beyond of, what is it that someone who has expertise who’s been down this road before would take one look at your codebase and say, “Are you making this particular misconfiguration or common misstep?”

Randall: Yeah, it’s incredibly important. You know, like I said, security is just one of those things where it’s really broad. I’ve been working in security for a very long time and I make security mistakes all the time myself.

Corey: Yeah. Like, in your developer environment right now, you ran this against the production environment and didn’t get permissions errors. That is suspicious. Tell me more about your authentication pattern.

Randall: Right. I mean, there’s just a ton of issues that can cause problems. And it’s… yeah, it is what it is, right? Like, software security is something difficult to achieve. If it wasn’t difficult, everyone would be doing it. Now, if you want to talk about, like, vision for the future, actually, I think there’s some really interesting things with the direction I see things going.

Like, a lot of people have been leaning into the whole AI autonomous agents thing over the last year. People started out by taking LLMs and saying, “Okay, I can get it to spit out code, I can get it to spit out this and that.” But then you go one step further and say, “All right, can I get it to write code for me and execute that code?” And OpenAI, to their credit, has done a really good job advancing some of the capabilities here, as well as a lot of open-source frameworks. You have Langchain, and Baby AGI, and AutoGPT, and all these different things that make this more feasible to give AI access to actually do real meaningful things.

And I can absolutely imagine a world in the future—maybe it’s a couple of years from now—where you have developers writing software, and it could be a real developer, it could be an autonomous agent, whatever it is. And then you also have agents that are taking a look at your software and rewriting it to solve security issues. And I think when people talk about autonomous agents, a lot of the time they’re purely focusing on LLMs. I think it’s a big mistake. I think one of the most important things you can do is focus on the very niche symbolic AI engines that are going to be needed to guarantee accuracy with these things.

And that’s why I think the Snyk approach is really cool, you know? We dedicated a huge amount of resources to security analysts building these very in-depth rule sets that are guaranteeing accuracy on results. And I think that’s something that the industry is going to shift towards more in the future as LLMs become more popular, which is, “Hey, you have all these great tools, doing all sorts of cool stuff. Now, let’s clean it up and make it accurate.” And I think that’s where we’re headed in the next couple of years.

Corey: I really hope you’re right. I think it’s exciting times, but I also am leery when companies go too far into boosterism where, “Robots are going to do all of these things for us.” Maybe, but even if you’re right, you sound psychotic. And that’s something that I think gets missed in an awful lot of the marketing that is so breathless with anticipation. I have to congratulate you folks on not getting that draped over your message, once again.

My other favorite part of your messaging when you pull up snyk.com—sorry, snyk.io. What is it these days? It’s the dot io, isn’t it?

Randall: Dot io. It’s hot.

Corey: Dot io, yes.

Randall: Still hot, you know?

Corey: I feel like I’m turning into a boomer here where, “The internet is dot com.”

Randall: [laugh].

Corey: Doesn’t necessarily work that way. But no, what I love is the part where you have this fear-based marketing of if you wind up not using our product, here are all the terrible things that will happen. And my favorite part about that marketing is it doesn’t freaking exist. It is such a refreshing departure from so much of the security industry, where it does the fear, uncertainty, and doubt nonsense stuff that I love that you don’t even hint in that direction. My actual favorite thing that is on your page, of course, is at the bottom. If you mouse over the dog in the logo at the bottom of the page, it does the quizzical tilting head thing, and I just think that is spectacular.

Randall: So, the Snyk mascot, his name is Pat. He’s a Doberman and everyone loves him. But yeah, you’re totally right. The FUD thing is a real issue in security. Fear, uncertainty, and doubt, it’s the way security companies sell products to people. And I think it’s a real shame, you know?

I give a lot of tech talks, at programming conferences in particular, around security and cryptography, and one of the things I always start out with when I’m giving a tech talk about any sort of security or cryptography topic is I say, “Okay, how many of you have landed in a Stack Overflow thread where you’re talking about a security topic and someone replies and says, ‘oh, a professional should be doing this. You shouldn’t be doing it yourself?’” That comes up all the time when you’re looking at security topics on the internet. Then I ask people, “How many of you feel like security is this, sort of like, obscure, mystical arts that requires a lot of expertise in math knowledge, and all this stuff?” And a lot of people sort of have that impression.

The reality though is security, and to some extent, cryptography, it’s just like any other part of computer science. It’s something that you can learn. There’s best practices. It’s not rocket science, you know? Maybe it is if you’re developing a brand-new hashing algorithm from scratch, yes, leave that to the professionals. But using these things is something everyone needs to understand well, and there’s tons of material out there explaining how to do things right. And you don’t need to be afraid of this stuff, right?

And so, I think, a big part of the Snyk message is, we just want to help developers just make their code better. And what is one way that you’re going to do a better job at work, get more of your code through the PR review process? What is a way you’re going to get more features out? A big part of that is just building things right from the start. And so, that’s really our focus in our message is, “Hey developers, we want to be, like, a trusted partner to help you build things faster and better.” [laugh].

Corey: It’s nice to see it, just because there’s so much that just doesn’t work out the way that we otherwise hope it would. And historically, there’s been a tremendous problem of differentiation in the security space. I often remark that at RSA, there’s about 12 companies exhibiting. Now sure, there are hundreds of booths, but it’s basically the same 12 things. There’s, you know, the entire row of firewalls where they use different logos and different marketing words on the slides, but they’re all selling fundamentally the same thing. One of things I’ve always appreciated about Snyk is it has never felt that way.

Randall: Well, thanks. Yeah, we appreciate that. I mean, our whole focus is just developer security. What can we do to help developers build things securely?

Corey: I mean, you are sponsoring this episode, let’s be clear, but also, we are paying customers of you folks, and that is not—those things are not related in any way. What’s the line that we like to use that we stole from the RedMonk folks? “You can buy our attention, but not our opinion.” And our opinion of what you folks are up to is then stratospherically high for a long time.

Randall: Well, I certainly appreciate that as a Snyk employee who is also a happy user of the service. The way I actually ended up working at Snyk was, I’d been using the product for my open-source projects for years, and I legitimately really liked it and I thought this was cool. And yeah, I eventually ended up working here because there was a position, and you know, a friend reached out to me and stuff. But I am a genuinely happy user and just like the goal and the mission. Like, we want to make developers’ lives better, and so it’s super important.

Corey: I really want to thank you for taking the time to speak with me about all this. If people want to learn more, where’s the best place for them to go?

Randall: Yeah, thanks for having me. If you want to learn more about AI or just developer security in general, go to snyk.io. That’s S-N-Y-K—in case it’s not clear—dot io. In particular, I would actually go check out our [Snyk Learn 00:34:16] platform, which is linked to from our main site. We have tons of free security lessons on there, showing you all sorts of really cool things. If you check out our blog, my team and I in particular also do a ton of writing on there about a lot of these bleeding-edge topics, and so if you want to keep up with cool research in the security space like this, just check it out, give it a read. Subscribe to the RSS feed if you want to. It’s fun.

Corey: And we will put links to that in the [show notes 00:34:39]. Thanks once again for your support, and of course, putting up with my slings and arrows.

Randall: And thanks for having me on, and thanks for using Snyk, too. We love you [laugh].

Corey: Randall Degges, Head of Developer Relations and Community at Snyk. This featured guest episode has been brought to us by our friends at Snyk, and I’m Corey Quinn. If you’ve enjoyed this episode, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this episode, please leave a five-star review on your podcast platform of choice, along with an angry comment that I will get to reading immediately. You can get me to read it even faster if you make sure your username is set to ‘Dependabot.’

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.

Taking a Hybrid AI Approach to Security at Snyk with Randall Degges

Episode Summary

Episode Show Notes & Transcript

Transcript

You might also like

The Appalachian Cloud Trail: Hiking, Cloud Economics, and Finding Perspective

Coding Agents, Chaos, and the Future of Dev Work with Dexter Horthy

The Rise of Autonomous Ops: Inside AWS’s DevOps Agent with David Yanacek

Get the Newsletter

Gnarly cloud cost questions?