The Demystification of Zero Trust with Philip Griffiths

Episode Summary

Networking in AWS has gotten more and more capable over the years, but with that capability comes considerable complexity. Philip Griffiths, Head of Business Development at NetFoundry, where they are taking a differentiated approach to the complexity of networking in AWS, has a thing or two to say about it. Philip and Corey tackle one of the most important things in the network stack: security. In that regard, as a network overlay, NetFoundry has to start at the application level. Philip dissects his “tiers” for zero trust and how he quantifies what zero trust actually is. With some help from a boy wizard and a conversation with his daughter, Philip reveals the magic behind the zero trust hat trick. Check out this in-depth conversation for more!

Episode Show Notes & Transcript

About Philip
Philip Griffiths is VP Global Business Development at NetFoundry and regularly speaks at events from DevOps to IoT to Cyber Security. Prior to this, he worked for Atos IT Services in various roles, working with C-suite executives to realise their digital transformation. He lives in Cambridge with his wife and two daughters.

Links:

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: Today’s episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes native object store that’s built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you’re defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that’s exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn’t eat all the data you’ve got on the system, it’s exactly what you’ve been looking for. Check it out today at min.io/download, and see for yourself. That’s min.io/download, and be sure to tell them that I sent you.


Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services: infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free. That’s snark.cloud/oci-free.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Today’s promoted episode is about a topic that is near and dear to my heart. In the AWS universe, we have seen over time that the networking has gotten more and more capable, going from EC2 Classic to the world of VPC networking to a whole bunch of other things. But with that capability comes a stupendous amount of complexity, to the point where the easy answer to, “Do you understand how networking works within AWS?” is, of course, “No, I don’t.”


I’m joined today by Philip Griffiths, who’s the Head of Business Development at NetFoundry. Philip, thank you for joining me.


Philip: Pleasure to be here, Corey.


Corey: So, NetFoundry has what I would argue to be one of the most intriguing-slash-differentiated approaches to handling that ever-increasing complexity around the networking story, not just in AWS, but in a number of different cloud providers, and between them, and that approach is to ignore it completely. Have I nailed the salient approach here with that, I guess we’ll call it, flippant statement?


Philip: Yeah, I’d probably say so. It’s the interesting thing where a lot of people say cloud networking is hard, and from our perspective, it should just be super easy, you should be able to provision it in a few minutes with only outbound ports, and set up your policy so that malicious actors can’t get inside it. It should be that easy, and programmable, and it’s a shame that the current world is not.


Corey: One of the hard problems has always been, I guess, security, which is the thing that everyone pretends to care about up front, but in practice often winds up bolting on after the fact, because “We care about security” is sort of the trademark phrase of the emails we see announcing a data breach, when it was very clear that the company did not care about security. It’s not just me complaining about how complex the network stack is, but also what directly flows from that. If you aren’t able to fit all of that into your head as far as what’s going on from a security perspective, the odds of misconfiguration creep in and you don’t really become aware of what your risk exposure is. I’m really partial to the idea of just avoiding it entirely. Is NetFoundry, effectively, a network overlay? Is it something that goes a bit beyond that? Effectively, where do you folks start and where do you stop?


Philip: Yes, that is precisely correct. We are a network overlay that’s been built on the principles of zero trust. What is unique is the ability to start it wherever you want. So yes, you can deploy it from the AWS Marketplace in a few minutes into your VPC or into your operating system, but we also have the ability to put it directly into the application stack itself, which has some very interesting implications. What I find to be the most interesting starting point is the oxymoron of secure networking.


There are no secure networks. It’s not possible. Networks are designed to share information and taking it to first principles, you can only isolate networks. And this is why we had the thought process for if we’re going to put our overlay network into stuff and make it secure, we have to start at the application level because then we can actually just isolate it to an application communicating into an application, which has profound implications.


Corey: The network part is relatively straightforward. I imagine it just becomes, more or less, what resembles a fairly flat network where everything internal is allowed to talk to each other, and then, in turn, this winds up effectively elevating what should be allowed to talk to what and on what ports and whatnot into something that’s a lot closer to the application logic, and transcends whatever provider it happens to be traversing.


Philip: Yeah, correct. Following the principles of zero trust, we utilize strong embedded identity as a function of what the endpoints are, what the source and destination is. And therefore you build up your policies and services to say what should communicate to what, on the basis that the default is least privilege: absolutely nothing. Your underlay then—the only thing you need is commodity internet with outbound ports. The whole concept of north-south, east-west: if you’re app-embedded, you don’t even need public DNS; you don’t need DNS at all. Naming conventions go out the window; you don’t need to conform to the standards. You know, you could say, “I want to hit Jenkins,” and you go to Jenkins, because that can be done.
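Philip’s default-deny model is easy to sketch in code. The following is a toy illustration only—the names (`Overlay`, `grant`, `can_dial`) are hypothetical and not the OpenZiti API—but it captures the core rule he describes: an identity can reach a service only if a policy explicitly says so.

```python
# Toy model of default-deny, identity-based service policy.
# Illustrative only; names are hypothetical, not the OpenZiti API.

from dataclasses import dataclass, field


@dataclass
class Overlay:
    # service name -> set of identity names allowed to dial it
    dial_policies: dict = field(default_factory=dict)

    def grant(self, identity: str, service: str) -> None:
        self.dial_policies.setdefault(service, set()).add(identity)

    def can_dial(self, identity: str, service: str) -> bool:
        # Least privilege: anything not explicitly granted is denied.
        return identity in self.dial_policies.get(service, set())


overlay = Overlay()
overlay.grant("ci-runner", "jenkins")

print(overlay.can_dial("ci-runner", "jenkins"))   # True: explicitly allowed
print(overlay.can_dial("laptop-123", "jenkins"))  # False: default deny
```

The design choice worth noticing is that the lookup’s default is denial; there is no allow-all fallback to misconfigure.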


Corey: I would approach this entire endeavor with a fair bit of suspicion and no small amount of alarm if it were something that you had developed internally, as far as, “Well, we’re just going to replace what amounts to your entire network stack and just go ahead and trust us. It’s fine.” But you didn’t do that. You’re riding on top of the OpenZiti open-source project. And that basically assuages a whole raft of concerns I would have if something like this were proprietary, and people who know what they’re doing—who, let’s be clear, aren’t me—were not able to inspect it and say, “Okay, this passes muster”—as they have done—or alternately, “No, this is terrifyingly dangerous for a variety of excellent reasons.”


And it really feels like a lot of the zero-trust stories that we see these days that are taking advantage of either a network overlay approach or shifting authentication into a different layer, have all taken a somewhat similar tack. I used to think it was a good idea; now I’m starting to suspect it might very well be the only viable model. Do you find that that’s accurate, or was this a subject of some contention when you were starting out?


Philip: So, there are two very interesting [sigh] thoughts that came to me as you were saying that. The number one is: yes, we drove forward with OpenZiti because we’ve seen open-source just completely dominate the industry and everything new that’s been built. If you want to deploy an application, you’re building on Linux. And in fact, you’re probably [laugh] also running on Kubernetes if you’re building new. And our objective was to turn OpenZiti into, you know, the open-source zero-trust private network equivalent, where it’s just standard: you’ll bake your application with Ziti, by design.


It will become a check function that people say you have to comply with. When I look at other vendors and how they look at zero-trust, I broadly see a few things that dishearten me. And again, it’s a big market, a lot of people—everyone says they’re zero-trust nowadays—but I broadly categorize it into a few ways. You have people who are effectively acting as a proxy and adding authentication as a way to check what people should have access to. And they may give access to the whole network, they may do granular; it varies between them. In fact, I’ve just written a blog on this where I effectively call that no-magic zero trust. It’s a blog conceptualized within Harry Potter and [unintelligible 00:07:36] a conversation with my daughter.


Corey: Yeah, any way to tell a story that beats the traditional enterprise voice is very much appreciated over in this corner of the world.


Philip: [laugh]. Yeah, exactly. You have a second tier, which is what I like to think of as semi-magical. And that’s where you start saying, I am going to use a software-defined perimeter, so that it’s first-packet authentication, or outbound-only, based upon embedded identity. And in my eyes, this is basically an invisibility cloak.


You then have app-embedded or magical zero-trust. And this is where you’re putting the invisibility cloak inside your application, but you’re also giving it a port key so that when it needs to connect to something else on the other side of the world, it just happens; it’s transparent. And broadly speaking, I think it’s very good that the whole world, including the US government, is taking zero-trust incredibly seriously, but the distribution of how people tackle the problem is wildly different. There are some zero-trust solutions which are going in the right direction, but fundamentally, if you’re putting it in front of your—I won’t name a vendor, but there was a vendor who, in December, released a report that said in 90 seconds, common vulnerabilities are exploited something like 96% of the time. 24 hours, 100%.


A few days later, they had a 9.8 CVE on their zero-trust VPN concentrator with a public IP, to which I thought, “If you’re not patching that immediately, you’ve got problems if someone is coming into your network.”


Corey: Absolutely. We just completed our annual security awareness training here, and so much of it just… it really made my skin crawl. There was an entire module on how to effectively detect phishing emails, and I’ve got to tell you, if they ever start running spellcheck on some of their [spear-phishing 00:09:23] campaigns, then we’re all doomed, because that was what the entire training was here. My position is that, okay, if someone in your company clicks a bad link and it destroys the company’s infrastructure, maybe it’s not the person clicking the link who is the critical failure point here. Great, if someone compromises an employee workstation, there should be a way to contain the blast radius; they should not now be inside the walls and able to traverse into whatever it is that they want. There should be additional barriers, and zero trust—though it has become, as you say, a catch-all term—seems to be a serious way of looking at this type of compromise and of mitigating against that sort of behavior.


Philip: Definitely. And I think that leads itself to, if you’re using the correct zero-trust solution, you’re able to close [unintelligible 00:10:12] ports—great, you’ve now massively reduced your attack surface. But what if someone does get a phishing injection of ransomware or something onto their endpoint or into their servers? The two things that I like to think about are: first, if you’re creating your overlay network so that the only communication from your server is outbound into the public IPs of your private overlay, then effectively, even if the ransomware gets in there, it can’t connect to its command and control module to then go through the kill cycle to other activities. The other is that if you then look at it [instead 00:10:46] of on the server-side, but actually on the client-side—if someone infects my Mac laptop with ransomware—we use this internal application called Mattermost.


And it’s basically Slack, but open-source. If my Mattermost is Ziti-fied, even if I’ve got ransomware on my device, it can’t side-channel attack into Mattermost, because you would actually have to break into the Mattermost application and somehow get that Mattermost application to make a compromised query or whatever to get past the system. So really, when I look at zero-trust, it’s not about saying, “We’re secure. Job done. You know, fire the security department because we don’t need them anymore.” It’s all about saying—


Corey: Box check. Hand it off to the auditor.


Philip: [laugh]. Exactly. It’s more about saying the cost of attack, the cost of compromise, is increased, ideally to the point where the malicious actors don’t have a return on investment. Because if they don’t have a return on investment, they will find something else that’s not your applications and your systems to try and compromise.


Corey: I want to make sure that I’m contextualizing this properly because we’re talking—I think—about what almost looks like two different worlds here. There’s the, this is how things wind up working in the ecosystem as far as your server environment goes in a cloud provider, but then we’re also talking about what goes on in your corporate network of people who are using laptops, which is increasingly being done from home these days. Where do you folks start? Where do you stop? Do you transcend into the corporate network as well, or is this primarily viewed as a production utility?


Philip: We do. One of our original design principles with OpenZiti was for it to be a platform rather than a point solution. So, we designed it from the ground up to be able to support any IP packets, TCP, UDP, et cetera, whether you’re doing client-server, server-server, machine-server, server-initiated, client-initiated, yadda, yadda, yadda. So effectively, the same technology can be applied to many different use cases, depending on where you want to use it. We’ve been doing work recently to handle, let’s call them, the hard use cases.


Probably one of the hardest ones out there is VoIP. There is a playbook that is currently taking place where the VoIP-managed service provider gets DDoSed by malicious actors; the playbook is to move it onto a CDN so that you move the attack surface and you get respite for a few hours. And there’s not really any way to solve it, because blocking DDoS attacks at layer 3, layer 4 is incredibly difficult unless you can make your PBX dark. And I’ve seen a couple of our OpenZiti engineers making calls from one device to another without going through the PBX, by doing that over OpenZiti, and being able to solve some of the challenges normally associated with VoIP. Again, it was really one of our design principles: How can we make the platform so flexible that we can do X, Y, Zed today, and we’re able to build it, again, to become a standard, because it can handle anything.


Corey: One of the big questions that people are going to have going into this—and this may sound surprising—is a little bit less about the technical risk of things like encryption and the rest, and a lot more around the idea of: okay, does this mean that what you are building becomes a central point of business risk? In other words, if the NetFoundry SaaS installation, in wherever they happen to be running as their primary, winds up going down, does that mean suddenly nothing can talk to one another? Because it turns out that, you know, computers are not particularly useful in 2022 if they aren’t able to talk to other computers, by and large. “The network is the computer,” as was famously stated. What is the failure mode in the event that you experience technical interruption?


Philip: We have these internal sessions, which we call Ziti Kitchens, where the engineering team creating Ziti educate us on stuff that they’re building. And one of the Ziti Kitchens was around HA, HS, et cetera, and all of the functions that we’ve built in so that you have redundancy and availability within the different components. Because effectively it’s an overlay network, we’ve designed it to be a mesh overlay network. You can set it up with one point of failure, but then simultaneously, you can very easily set it up to have no points of failure, because it can have that redundancy, and the overlay has its own mechanisms to do things like smart routing and calculation of underlying costs.


That cost in that instance would be: well, AWS has gone down, so the latency to send a packet or flow over it is incredibly high, therefore I’m going to avoid that route and send the traffic to another location. I always remember this Ziti Kitchen episode because the underlying technology that does it is called Terminators—Ziti has these things called Terminators—and on some of the slides there were these little Terminator heads with the red eyes, you know, the silver exoskeleton, which always made me laugh.


Corey: It’s helpful to have things that fail out of band. As opposed to—think of the traditional history in security before everything was branded with zero-trust as a prerequisite for exhibiting at RSA; before that, firewalls were the story, and the question always was, if a firewall fails, do you want it to fail open or fail closed? And believe it or not, there are legitimate answers in both directions; it depends on context and what you’re doing. There are some things—for example, IAM in a cloud world—where you absolutely never want to fail open, full stop. You would rather someone bodily rip the power cable out of the back of the data center than let that happen. With something like this, where nothing is able to talk to one another if the entire system goes down, yeah, you want the control system that you folks run to be out of band; that is almost always the right answer.


As I look at the various case studies that you have on your website and the serious companies that are using what you have built, do you find that they are primarily centralizing around individual cloud providers? Are you seeing that they’re using this as an expression of multi-cloud? Because I can definitely see a story where, oh, it helps bring two cloud providers from a networking and security perspective onto the same page, but I can also see, even within one cloud provider, the idea that, hey, I don’t have to play around with your ridiculous nonsense. What use cases are you seeing emerge among your customers?


Philip: Definitely, the multi-cloud challenge is one that we’re seeing as an emerging trend. We do a lot of work with Oracle and, you know, their stated position is multi-cloud is a fact. In fact, for them, if we make the secure networking easier, we can bring workloads into our cloud quicker—[unintelligible 00:17:21] the main driver behind our partnership. We recently did a blog talking about Superclouds and the advent of organizations like Snowflake and HashiCorp and Confluent and Databricks basically building value and business applications which abstract away the underlying complexity. But you get into the problem of the standard shared security model, where the customer has to deal with DNS and VPNs and MPLS and AWS PrivateLink or Azure Private Link or whatever they call it, and you have to assemble this Frankenstein of stuff just to enable a VM to communicate with another VM.


And the posit of our blog—in fact, we use that exact John Gage quote, “The network is the computer”—is that if you can put the network inside the application, you’ve now given your supercloud superpowers, because [unintelligible 00:18:13] natively—I mean, this is a very marketing term, but, “Develop once; deploy anywhere,” and be multi-cloud-native.


Corey: The idea of being able to adapt to emerging usage patterns without a full-on redeploy is handy. What I also would like to highlight is that you are, of course, a network overlay, and that is something that is fairly well understood and people have seen it, but your preferred adoption model goes up a couple of steps beyond that into altering the way that the application thinks about these things. And you offer an SDK that ranges from a single line of code to, I think, up to 20, so it’s not a massive rewrite of the application, but it does require modification of the stack. What does that buy you, for lack of a better term? Because once the application becomes aware of what is effectively its own “special network,” quote-unquote, it’s work to wind up modifying existing applications around something like this. What’s the payoff?


Philip: So, there are three broad ones that immediately come to my mind. Number one is the highest security: effectively, your private network is inside the app, so you have to somehow break into the app, and that can be incredibly complicated, particularly if you run the app in something like a confidential compute enclave; you can now have a distributed confidential system.


The second is what you’re getting in programmability. You’re able to effectively operate in a fully—even, you know, you get to a GitOps environment. We’re currently working on documentation which says, “Hey, you can do all this stuff in GitOps and then it’ll go into your CI/CD and that’ll talk to the APIs.” And it’ll effectively do everything in a completely programmable manner so that you can treat your private networks as cattle rather than as pets.
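The “cattle, not pets” GitOps idea above can be sketched with plain set arithmetic: declare the overlay you want in Git, fetch the live state, and apply only the difference. The service names and state shapes below are invented for illustration, not NetFoundry’s actual API objects.

```python
# Toy reconciliation step for a declaratively-defined overlay network.
# A CI/CD job would diff the desired state (checked into Git) against the
# live state (fetched from the management API) and apply only the delta.
# Service names here are illustrative.

desired = {"jenkins", "mattermost", "metrics"}  # from Git
actual = {"jenkins", "legacy-ftp"}              # from the live controller

to_create = desired - actual  # services to provision
to_delete = actual - desired  # services to tear down

print(sorted(to_create))  # ['mattermost', 'metrics']
print(sorted(to_delete))  # ['legacy-ftp']
```

Because the whole network is data, rerunning the job is idempotent: once live state matches Git, both diffs are empty and nothing is touched.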


The third is transparency. You used the words earlier of bolt-on networking, because that’s how we always think about network security: we bolt it on. As a user, we have to jump through the VPN hoop, we have to go through the bastion, we have to interact with the network. If your private network is inside the application, then you interact with the application. I can have a mobile application on my device and have no idea that it’s part of a private network, that the API is private, and that malicious actors can’t get to it. I just interact with the application. That is it.


That is what no one else has the ability to do and where OpenZiti has its most power because then you get rid of the constant tug of war between the security team that want to lock everything down and the users and the developers who want to move fast and give a great experience. You can effectively have your cake and eat it.
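What “the network inside the application” means in practice: the app dials a service by an overlay name rather than opening a socket to a DNS name and port. The sketch below is hypothetical pseudo-SDK code—not the real OpenZiti interface—but it shows why a compromised neighbour process has nothing to attack: there is no listener and no resolvable address, only an in-app dial that enforces policy.

```python
# Hypothetical app-embedded overlay context (NOT the actual OpenZiti SDK).
# The application reaches "mattermost" by overlay name; there is no DNS
# record, no public IP, and no inbound port for anything else to scan.

class OverlayContext:
    def __init__(self, identity: str, services: dict):
        # services: overlay service name -> the handler this identity
        # has been granted access to by policy
        self.identity = identity
        self._services = services

    def dial(self, service: str):
        # Anything without an explicit grant is unreachable by default.
        try:
            return self._services[service]
        except KeyError:
            raise ConnectionRefusedError(
                f"{self.identity} has no grant for {service!r}")


ctx = OverlayContext("philips-laptop",
                     {"mattermost": lambda msg: f"posted: {msg}"})
print(ctx.dial("mattermost")("hello team"))  # posted: hello team
# ctx.dial("ssh-admin") raises ConnectionRefusedError: default deny
```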


Corey: The challenge, of course, with rolling a lot of these things out in a way that becomes highly programmable is that it unlocks a bunch of capability, but the double-edged sword there is always one of complexity. I mean, we take a look at the way that AWS networking has progressed, and they finally rolled out the VPC Reachability Analyzer, so when two things can’t talk to each other, well, you run this thing and it tells you exactly why, which is super handy. And then, just as a way of twisting the knife a little bit, every time you run it, they charge you ten cents for the privilege, which doesn’t actually matter in the context of what anyone is being compensated for, until and unless you build this into something programmatic, but it stings a little bit. And the idea of being able to program these things to abstract away a lot of that complexity is incredibly compelling, except for the part where now it feels like it really increases developer burden on a lot of these things. Have you found that to be true? Do you find that it is sort of like a sliding scale? What has the customer experience been around this?


Philip: I would say a sliding scale. You know, we had one organization who started with the OpenZiti Tunnelers, and then we convinced them to use the SDK and [unintelligible 00:21:51], “Oh, this was super easy.” And now they just run OpenZiti themselves. But then they’ve also said at some point, we’ll use the NetFoundry platform, which effectively gives us a SaaS experience in consuming that. One of the huge focuses—well, we’ve got a few big focuses for product development, but one of the really big areas is giving more visibility and monitoring, so that rather than people having to react to configuration problems or things they need to fix in order to ensure your perfect network overlay, instead those things are seen and automatically dealt with—human-in-the-loop if you want it—in order to remove that burden.


Because ultimately, if you can get the network to a point where as long as you’ve got underlay and you’ve set your policy, the overlay is going to work, it’s going to be secure, and it’s going to give you the uptime you need, that is the Nirvana that we all have to strive for.


Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it’s better than AWS pricing—and when they say that they mean it is less money. Sure, I don’t dispute that, but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you’ll receive $100 in credit. That’s V-U-L-T-R dot com slash screaming.


Corey: A common criticism of things that shall we say abstract away the network is a fairly common predictable failure mode. I’ve been making fun of Kubernetes on this particular point for years, and I’m annoyed that at the time that we’re recording this, that is still accurate. But from the cloud providers’ perspective, when you run Kubernetes, it looks like one big really strangely behaved single-tenant application. And Kubernetes itself is generally not aware of zone affinity, so it could just as easily wind up tossing traffic to the node next to it at zero cost or across an availability zone at two cents per gigabyte, or, God forbid across the internet at nine cents a gigabyte and counting depending upon how it works. And the application-side has absolutely no conception of this.


How does OpenZiti address this in the real world? Because it’s one of those things where it almost doesn’t matter what you folks charge on top of it, but instead, oh wow, this winds up being so hellaciously expensive that we can’t use it, regardless of whatever benefit it provides, just because it becomes a non-starter.


Philip: So, when we built the overlay and the mesh, we did it from the perspective of making it as programmable and self-driven as possible. So, with the whole Terminator strategies mentioned earlier, it gives you the ability to start putting logic into how you want packets to flow. Today, it does it on a calculation of end-to-end latency, and chooses and reroutes traffic based on that information. But there’s no reason that you couldn’t hook it up into understanding what the monetary cost is of sending a packet along a certain path. Or even, what is my application performance monitoring tool saying? Because what that says versus what the network believes could be different things. And effectively you can ingest that information to make your smart routing decisions, so all of that logic can exist within the overlay that operates for you.
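The cost-aware routing Philip describes can be sketched as shortest-path search with a pluggable weight function: blend latency with, say, per-gigabyte egress price, and the chosen path changes. The topology and prices below are invented for illustration, and this is not how OpenZiti’s Terminator logic is actually implemented.

```python
# Sketch of overlay smart routing with a pluggable cost function.
# Classic Dijkstra; `weight(link_attrs)` decides what "cost" means.

import heapq


def best_path(graph, src, dst, weight):
    # graph: node -> list of (neighbor, link_attrs)
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, attrs in graph.get(node, []):
            nd = d + weight(attrs)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))


# Invented topology: the cheap path has higher latency.
graph = {
    "app":    [("edge-a", {"ms": 5, "usd_gb": 0.00}),
               ("edge-b", {"ms": 3, "usd_gb": 0.09})],
    "edge-a": [("db", {"ms": 8, "usd_gb": 0.00})],
    "edge-b": [("db", {"ms": 2, "usd_gb": 0.09})],
}

latency_only = lambda a: a["ms"]
latency_and_cost = lambda a: a["ms"] + 1000 * a["usd_gb"]  # price dominates

print(best_path(graph, "app", "db", latency_only))      # ['app', 'edge-b', 'db']
print(best_path(graph, "app", "db", latency_and_cost))  # ['app', 'edge-a', 'db']
```

Swapping the weight function is the whole trick: the same mesh can optimize for latency, egress spend, or anything else you can score per link.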


Corey: I will say that really harkens back, on some level, to what I was experimenting with back when I got my CCNA many years ago, where routing protocols have built into them the idea of the cost of a link. I will freely admit slash confess that at the time, with a low-cost link, I assumed this was about what was congested or what would wind up having, theoretically, some transit versus peering agreement. It never occurred to me that I’d have to think about those things in a local network and have to calculate in the Byzantine pricing models of cloud providers. But I’ve seen examples of folks who are using OpenZiti and NetFoundry alike to wind up building in these costing models so that, yeah, ideally it just keeps everything local, but if that path degrades, then yes, we would prefer to go over an expensive link than to basically have TCP terminate on the floor until everything comes back up. It sort of feels like there’s an awful lot of logic you can bake in that goes well beyond what routing protocols are capable of, just by virtue of exposing that programmability.


Well, for this customer, because they’re on the pre—on the extreme tier, we want to have the expensive fallback; for low-tier customers, we might want them to just have an outage until things end. And it really comes down to letting business decisions express themselves in terms of application behavior while in a degraded state. I love that idea.


Philip: Yeah, I understand. We don’t do it today, but there will be a point in the future—I strongly believe—where we’ll be able to say, hey, I’ll give you an SLA on the internet. Because we’ll have such path diversity and visibility of how the internet operates that we’ll be able to say, within certain risk parameters, what we can deliver. But then you can take it to other logical extremes. You could say, “Hey, I want to build a green overlay. I want to make sure that I’m using Arm instances and data centers running on renewable energy so that my network is green.”


Or you can build a GDPR-compliant overlay so that my data stays within a certain country. You start being able to—you know, really start dreaming up the different policies that you can apply to this, because you’re applying a central policy to what is a distributed system.
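Philip's green and GDPR overlays both reduce to the same mechanism: a central policy that filters which links of the distributed mesh a service may use. A hypothetical sketch, with link attributes and policies invented for illustration:

```python
# Sketch of policy-constrained path selection (illustrative): the same
# mesh can host a "green" overlay or a GDPR overlay just by changing
# which links a central policy permits.

links = [
    {"id": "fra-1", "country": "DE", "renewable": True,  "latency_ms": 8},
    {"id": "lon-1", "country": "GB", "renewable": False, "latency_ms": 5},
    {"id": "par-1", "country": "FR", "renewable": True,  "latency_ms": 7},
]

def usable(links, policy):
    """Keep only links the policy allows, then prefer the fastest."""
    ok = [l for l in links if policy(l)]
    return sorted(ok, key=lambda l: l["latency_ms"])

green = usable(links, lambda l: l["renewable"])          # green overlay
gdpr_de = usable(links, lambda l: l["country"] == "DE")  # data stays in Germany
print([l["id"] for l in green])    # ['par-1', 'fra-1']
print([l["id"] for l in gdpr_de])  # ['fra-1']
```

The overlay's topology never changes; only the policy function does, which is what makes this kind of constraint cheap to express centrally.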


Corey: One last topic I want to cover before we call it an episode is that you are, effectively, a SaaS company that is built on top of an open-source project. And that has been an interesting path for a lot of companies that, early on, figured that since they wrote the software and their people were doing the lion’s share of contribution, they were clearly the best people to run it. And Amazon’s approach toward operational excellence—as they called it—wound up causing some challenges when they launched the Amazon Basics version of those services. I feel like there are some natural defenses built into OpenZiti to keep it from suffering that fate, but I’m very curious to get your take on it.


Philip: Fundamentally, our take is that—in fact, our mission is—to take what was previously impossible and turn it into a standard. And the only way you can really create standards is to have an open-source project that is adopted by the wider community and that ecosystems get built around and into. And that means giving OpenZiti to absolutely everyone so that they can use it and innovate on top of it. We all know that very few people actually want to host their own infrastructure, so we assume a large percentage of people will come and go, “Hey, NetFoundry, you provide us the hosting, you provide us the SaaS capability so we don’t have to do that ourselves.” But fundamentally, it’s in the knowledge that there’s something bigger, because it’s not just us maintaining this project; there’s a bunch of people who are doing pull requests and finding cool, fun ways to build further value on what we can build ourselves.


We believe recent history is littered with examples of the new world being built on open-source. And fundamentally, we think that’s really the only way to change an industry as profoundly as we intend to.


Corey: I would also argue that, to be very direct—and I can probably get away with saying this in a way that I suspect you might not be able to—if AWS had it in their character to simplify things and make networking a lot easier for people to work with, what’s stopping them? They didn’t need to wait for an open-source company to come out of nowhere and demonstrate the value of this. Customers have been asking for it for years. I think that at this point, this is something that is unlikely to ever wind up being integrated into a cloud provider’s primary offering. Until and unless the entire industry shifts, at which point we’re having a radically different conversation very far down the road.


Philip: Yeah, potentially, because it opens up the interesting question that if you make it so easy for someone to take their data out, do they use your cloud less? There are some cloud providers that will lean into that because they do see more clouds in the future, and others that won’t. I see it more myself that as those kinds of things happen, it’ll be done on a product-by-product basis. For example, we’re talking to an organization, and [unintelligible 00:29:49] like, “Oh, could you Ziti-fy our JDBC driver so that when users access our database, they don’t have to use a VPN?” [unintelligible 00:29:55], “Yeah. We’ve already done that with JDBC. We called it ZDBC.”


So, we’ll just, instead of using the general industry one—probably the Oracle one or something because that’s kind of standard—we’ll take your one that you’ve created for yourself and be able to solve that problem for you.


Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where’s the best place to find you?


Philip: Best place to go to is netfoundry.io/screaminginthecloud. From there, anyone can grab some free Ziggy swag. Ziggy’s our little open-source mascot, cute little piece of pasta with many different outfits. Little sass as well. And you can find further information both on OpenZiti and NetFoundry.


Corey: And we will put links to both of those in the [show notes 00:30:40]. Thanks so much for taking the time to speak with me today. I really appreciate it.


Philip: It’s a pleasure. Thanks, Corey.


Corey: Philip Griffiths, Head of Business Development at NetFoundry. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment telling me exactly why I’m wrong about AWS’s VPC complexity, and that comment will get moderated and I won’t get to read it until you pay me ten cents to tell you how it got moderated.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


Announcer: This has been a HumblePod production. Stay humble.


Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services spanning infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free that’s snark.cloud/oci-free.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Today’s promoted episode is about a topic that is near and dear to my heart. In the AWS universe, we have seen over time that the networking has gotten more and more capable, going from EC2 Classic to the world of VPC networking to a whole bunch of other things. But with that capability comes a stupendous amount of complexity, to the point where the easy answer to, “Do you understand how networking works within AWS?” is, of course, “No, I don’t.”

I’m joined today by Philip Griffiths, who’s the Head of Business Development at NetFoundry. Philip, thank you for joining me.

Philip: Pleasure to be here, Corey.

Corey: So, NetFoundry has what I would argue to be one of the most intriguing-slash-differentiated approaches to handling that ever-increasing complexity around the networking story, not just in AWS, but in a number of different cloud providers, and between them, and that approach is to ignore it completely. Have I nailed the salient approach here with that, I guess we’ll call it, flippant statement?

Philip: Yeah, I’d probably say so. It’s the interesting thing where a lot of people say cloud networking is hard, and from our perspective, it should just be super easy, you should be able to provision it in a few minutes with only outbound ports, and set up your policy so that malicious actors can’t get inside it. It should be that easy, and programmable, and it’s a shame that the current world is not.

Corey: One of the hard problems has always been, I guess, security, which is the thing that everyone pretends to care about right up front, but in practice often winds up bolting on after the fact, because “We care about security” is sort of the trademark phrase of the things that we see, usually in an email announcing a data breach when it was very clear that the company did not care about security. It’s not just me complaining about how complex the network stack is, but about what directly flows from that. If you aren’t able to fit all of that into your head as far as what’s going on from a security perspective, the odds of misconfiguration creep up and you don’t really become aware of what your risk exposure is. I’m really partial to the idea of just avoiding it entirely. Is NetFoundry, effectively, a network overlay? Is it something that goes a bit beyond that? Effectively, where do you folks start and where do you stop?

Philip: Yes, that is precisely correct. We are a network overlay that’s been built on the principles of zero trust. What is very unique is the ability to start it wherever you want. So yes, you can deploy it from the AWS Marketplace in a few minutes into your VPC or into your operating system, but we also have the ability to actually put it directly into the application stack itself, which has some very interesting implications. What I find as the most interesting starting point is the oxymoron of secure networking.

There are no secure networks. It’s not possible. Networks are designed to share information and taking it to first principles, you can only isolate networks. And this is why we had the thought process for if we’re going to put our overlay network into stuff and make it secure, we have to start at the application level because then we can actually just isolate it to an application communicating into an application, which has profound implications.

Corey: The network part is relatively straightforward. I imagine it just becomes, more or less, what resembles a fairly flat network where everything internal is allowed to talk to each other, and then, in turn, this winds up effectively elevating what should be allowed to talk to what and on what ports and whatnot into something that’s a lot closer to the application logic, and transcends whatever provider it happens to be traversing.

Philip: Yeah, correct. Following the principles of zero trust, we utilize strong embedded identity as a function of what the endpoints are, what the source and destination are. And therefore you build up your policies and services to say what should communicate to what, on the basis that the default is least privilege: absolutely nothing. Your underlay then, the only thing you need is commodity internet with outbound ports. The whole concept of north-south, east-west, if you’re app-embedded, you don’t even need public DNS; you don’t need DNS at all. Naming conventions go out the window; you don’t need to conform to the standards. You know, you could say, “I want to hit Jenkins.” You go to Jenkins because that can be done.
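The default-deny posture Philip describes, where endpoints hold strong identities and nothing communicates unless a policy explicitly says so, can be sketched as follows. This is an illustrative toy, not the OpenZiti data model; real systems bind identities cryptographically rather than by name:

```python
# Minimal sketch of default-deny, identity-based service access
# (illustrative; real systems use cryptographic identities, not strings).

class Overlay:
    def __init__(self):
        self.grants = set()          # (identity, service) pairs; empty = deny everything

    def allow(self, identity, service):
        self.grants.add((identity, service))

    def dial(self, identity, service):
        # Least privilege: anything not explicitly granted is refused.
        if (identity, service) not in self.grants:
            raise PermissionError(f"{identity} may not reach {service}")
        return f"connected {identity} -> {service}"

net = Overlay()
net.allow("ci-runner", "jenkins")
print(net.dial("ci-runner", "jenkins"))   # the one granted pair works
# net.dial("laptop", "jenkins") would raise: nothing else is reachable
```

Note that "I want to hit Jenkins" is a service name in the policy, not a DNS name or an IP, which is why naming conventions stop mattering.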

Corey: I would approach this entire endeavor with a fair bit of suspicion and no small amount of alarm if it were something that you had developed internally, as far as, “Well, we’re just going to replace what amounts to your entire network stack and just go ahead and trust us. It’s fine.” But you didn’t do that. You’re riding on top of the OpenZiti open-source project. And that basically assuages a whole raft of concerns I would have if something like this were proprietary, and people who know what they’re doing—who, let’s be clear, aren’t me—were not able to inspect it and say, “Okay, this passes muster”—as they have done—or alternately, “No, this is terrifyingly dangerous for a variety of excellent reasons.”

And it really feels like a lot of the zero-trust stories that we see these days that are taking advantage of either a network overlay approach or shifting authentication into a different layer, have all taken a somewhat similar tack. I used to think it was a good idea; now I’m starting to suspect it might very well be the only viable model. Do you find that that’s accurate, or was this a subject of some contention when you were starting out?

Philip: So, there are two very interesting [sigh] thoughts that came to me as you were saying that. Number one is, yes, we drove forward with OpenZiti because we’ve seen open-source just completely dominate the industry and everything new that’s been built. If you want to deploy an application, you’re building on Linux. And in fact, you’re probably [laugh] also running on Kubernetes if you’re building new. And our objective was to be able to turn OpenZiti into, you know, the open-source, zero-trust private network equivalent, where it’s just standard: you’ll bake your application with Ziti, by design.

It will become a check function that people say you have to comply with. When I look at other vendors and how they look at zero-trust, I broadly see a few things that dishearten me. And again, it’s a big market, a lot of people—everyone says they’re zero-trust nowadays—but I broadly categorize it into a few ways. You have people who are effectively acting as a proxy and adding authentication as a way to check what people should have access to. And they may give access to the whole network, they may do granular; it varies between them. In fact, I’ve just written a blog on this where I effectively call that no-magic zero trust. It’s a blog conceptualized within Harry Potter and [unintelligible 00:07:36] a conversation with my daughter.

Corey: Yeah, any way to tell a story that beats the traditional enterprise voice is very much appreciated over in this corner of the world.

Philip: [laugh]. Yeah, exactly. You have a second tier, which is what I like to think of as semi-magical. And that’s where you start saying, I am going to use a software-defined perimeter, so it’s first-packet authentication, or outbound-only, based upon embedded identity. And in my eyes, this is basically an invisibility cloak.

You then have app-embedded, or magical, zero-trust. And this is where you’re putting the invisibility cloak inside your application, but you’re also giving it a port key so that when it needs to connect to something else on the other side of the world, it just happens; it’s transparent. And broadly speaking, I think it’s very good that the whole world, including the US government, is taking zero-trust incredibly seriously, but the distribution of how people tackle the problem is wildly different. There are some zero-trust solutions which are going in the right direction, but fundamentally, if you’re putting it in front of your— I won’t name a vendor, but there was a vendor who, in December, released a report that said that within 90 seconds, common vulnerabilities are exploited something like 96% of the time; within 24 hours, 100%.

A few days later, they had a 9.8 CVE on their zero-trust VPN concentrator with a public IP, to which I thought, “If you’re not patching that immediately, you’ve got problems if someone is coming into your network.”

Corey: Absolutely. We just completed our annual security awareness training here, and so much of it just… it really made my skin crawl. There was an entire module on how to effectively detect phishing emails, and I’ve got to tell you, if they ever start running spellcheck on some of their [spear-phishing 00:09:23] campaigns, then we’re all doomed, because that was what the entire training was here. My position is that, okay, if someone in your company clicks a bad link and it destroys the company’s infrastructure, maybe it’s the person who’s clicking the link that is not necessarily the critical failure point here. Great, if someone compromises an employee workstation, there should be a way to contain the blast radius; they should not now be inside the walls and able to traverse into whatever it is that they want. There should be additional barriers, and zero trust—though it has become, as you say, a catch-all term—seems to be a serious way of looking at this type of compromise and of mitigating against that sort of behavior.

Philip: Definitely. And I think that leads itself to: if you’re using the correct zero-trust solution, you’re able to close [unintelligible 00:10:12] ports, great, you’ve now massively reduced your attack surface. But what if someone does get a phishing injection of ransomware or something onto their endpoint or into their servers? There are two things that I like to think about. One is that if you’re creating your overlay network so that the only communication from your server is outbound into the public IPs of your private overlay, then effectively, even if the ransomware gets in there, it can’t connect to its command-and-control module to then go through the kill cycle to other activities. The other is that if you then look at it [instead 00:10:46] of on the server-side, but actually on the client-side, if someone infects my Mac laptop with ransomware—we use this internal application called Mattermost.

And it’s basically Slack, but open-source. If my Mattermost is Ziti-fied, even if I’ve got ransomware on my device, it can’t side-channel attack into Mattermost, because you would actually have to break into the Mattermost application and somehow get that Mattermost application to make a compromised query or whatever to get past the system. So really, when I look at zero-trust, it’s not about saying, “We’re secure. Job done. You know, fire the security department because we don’t need them anymore.” It’s all about saying—

Corey: Box check. Hand it off to the auditor.

Philip: [laugh]. Exactly. It’s more about saying the cost of attack, the cost of compromise, is increased, ideally to the point where the malicious actors don’t have a return on investment. Because if they don’t have a return on investment, they will find something else that’s not your applications and your systems to try and compromise.
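The outbound-only posture Philip describes reduces to a very small rule: the server initiates all connections, and only toward the overlay fabric, so ransomware that lands on the host has nowhere to phone home. A sketch, with invented addresses:

```python
# Sketch of the outbound-only posture described above (illustrative):
# the host initiates connections only to the overlay's routers, so
# malware that lands on it cannot reach an arbitrary
# command-and-control host, and nothing inbound is listening at all.

OVERLAY_ROUTERS = {"51.0.0.10", "51.0.0.11"}   # hypothetical overlay ingress IPs

def egress_allowed(dst_ip):
    """Only outbound flows to the overlay fabric are permitted."""
    return dst_ip in OVERLAY_ROUTERS

assert egress_allowed("51.0.0.10")            # app traffic rides the overlay
assert not egress_allowed("198.51.100.7")     # C2 callback is dropped
```

The point is not that the allowlist is clever; it is that the attack surface collapses to the overlay endpoints, which is what raises the attacker's cost beyond their return on investment.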

Corey: I want to make sure that I’m contextualizing this properly because we’re talking—I think—about what almost looks like two different worlds here. There’s the, this is how things wind up working in the ecosystem as far as your server environment goes in a cloud provider, but then we’re also talking about what goes on in your corporate network of people who are using laptops, which is increasingly being done from home these days. Where do you folks start? Where do you stop? Do you transcend into the corporate network as well, or is this primarily viewed as a production utility?

Philip: We do. One of our original design principles with OpenZiti was for it to be a platform rather than a point solution. So, we designed it from the ground up to be able to support any IP packets, TCP, UDP, et cetera, whether you’re doing client-server, server-server, machine-server, server-initiated, client-initiated, yadda, yadda, yadda. So effectively, the same technology can be applied to many different use cases, depending on where you want to use it. We’ve been doing work recently to handle, let’s call them, the hard use cases.

Probably one of the hardest ones out there is VoIP. There is a playbook that is currently taking place where the VoIP managed service provider gets DDoSed by malicious actors; the playbook is to move it onto a CDN so that you move the attack surface and you get respite for a few hours. And there’s not really any way to solve it, because blocking DDoS attacks at layer 3, layer 4 is incredibly difficult unless you can make your PBX dark. And I’ve seen a couple of our OpenZiti engineers making calls from one device to another without going through the PBX by doing that over OpenZiti, and being able to solve some of the challenges normally associated with VoIP. Again, it was really one of our design principles: how can we make the platform so flexible that we can do X, Y, Zed today and build it, again, to become a standard, because it can handle anything?

Corey: One of the big questions that people are going to have going into this is, and this may sound surprising, a little bit less about technical risk, of things like encryption and the rest, and a lot more around the idea of, okay, does this mean that what you are building becomes a central point of business risk? In other words, if the NetFoundry SaaS installation, in whatever region they happen to be using as their primary, winds up going down, does that mean suddenly nothing can talk to one another? Because it turns out that, you know, computers are not particularly useful in 2022 if they aren’t able to talk to other computers, by and large. “The network is the computer,” as was famously stated. What is the failure mode in the event that you experience technical interruption?

Philip: We have these internal sessions, which we call Ziti Kitchens, where the engineering team that are creating Ziti educate us on stuff that they’re building. And one Ziti Kitchen was around HA, HS, et cetera, and all of the functions that we’ve built in so that you have redundancy and availability within the different components. Because effectively, it’s an overlay network, so we’ve designed it to be a mesh overlay network. You can set it up with one point of failure, but then simultaneously, you can very easily set it up to have no points of failure, because it has that redundancy and the overlay has its own mechanisms to do things like smart routing and calculation of underlying costs.

That cost, in that instance, would be: well, AWS has gone down, so the latency to send a packet or flow over it is incredibly high, therefore I’m going to avoid that route and send the traffic to another location. I always remember this Ziti Kitchen episode because the underlying technology that does it is called Terminators—Ziti has these things called Terminators—and one of the slides had this little Terminator head with the red eyes, you know, the silver exoskeleton, which always made me laugh.
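The rerouting behavior Philip describes, where a degraded provider's links become prohibitively expensive and the mesh routes around them, can be sketched with a brute-force route search over a toy mesh. Names and costs are invented; the real Terminator logic differs:

```python
# Sketch of mesh failover routing (illustrative): when a region
# degrades, its link costs spike and the overlay routes around it.

from itertools import permutations

def path_cost(path, costs):
    return sum(costs[(a, b)] for a, b in zip(path, path[1:]))

def best_route(src, dst, nodes, costs):
    """Brute-force the cheapest route through a small mesh."""
    best, best_c = None, float("inf")
    mids = [n for n in nodes if n not in (src, dst)]
    for r in range(len(mids) + 1):
        for mid in permutations(mids, r):
            path = (src, *mid, dst)
            try:
                c = path_cost(path, costs)
            except KeyError:
                continue                      # no such link in the mesh
            if c < best_c:
                best, best_c = path, c
    return best

nodes = {"client", "aws", "oci", "server"}
costs = {("client", "aws"): 5, ("aws", "server"): 5,
         ("client", "oci"): 20, ("oci", "server"): 20}

print(best_route("client", "server", nodes, costs))   # via aws while healthy
costs[("client", "aws")] = float("inf")               # AWS region goes down
print(best_route("client", "server", nodes, costs))   # reroutes via oci
```

Brute force is fine for a toy; a real mesh would run a shortest-path algorithm, but the failover property is the same: inflate a link's cost and traffic flows elsewhere.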

Corey: It’s helpful to have things that fail out of band. Think of the traditional history in security before everything was branded with zero-trust as a prerequisite for exhibiting at RSA; before that, firewalls were the story, and the question always was, if a firewall fails, do you want it to fail open or fail closed? And believe it or not, there are legitimate answers in both directions; it depends on context and what you’re doing. There are some things, for example IAM in a cloud world, where you absolutely never want to fail open, full stop. You would rather someone bodily rip the power cable out the back of the data center than let that happen. With something like this, where nothing is able to talk to one another if the entire system goes down, yeah, you want the control system that you folks run to be out of band; that is almost always the right answer.

As I look at the various case studies that you have on your website and the serious companies that are using what you have built, do you find that they are primarily centralizing around individual cloud providers? Or are you seeing that they’re using this as an expression of multi-cloud? Because I can definitely see a story where, oh, it helps bring two cloud providers onto the same page from a networking and security perspective, but I can also see, even within one cloud provider, the idea that, hey, I don’t have to play around with your ridiculous nonsense. What use cases are you seeing emerge among your customers?

Philip: Definitely, the multi-cloud challenge is one that we’re seeing as an emerging trend. We do a lot of work with Oracle, and, you know, their stated position is that multi-cloud is a fact. In fact, for them, if we make the secure networking easier, we can bring workloads into our cloud quicker [unintelligible 00:17:21] the main driver behind our partnership. We recently did a blog talking about Superclouds and the advent of organizations like Snowflake and HashiCorp and Confluent and Databricks basically building value and business applications which abstract away the underlying complexity. But you get into the problem of the standard shared security model, where the customer has to deal with DNS and VPNs and MPLS and AWS Private Endpoint or Azure Private Link or whatever they call it, and you have to assemble this Frankenstein of stuff just to enable a VM to communicate to another VM.

And the posit of our blog—in fact, we use that exact quote—is John Gage’s “The network is the computer.” If you can put the network inside the application, you’ve now given your supercloud superpowers, because [unintelligible 00:18:13] natively—I mean, this is a very marketing term, but—“develop once; deploy anywhere,” and be multi-cloud-native.

Corey: The idea of being able to adapt to emerging usage patterns without a full-on redeploy is handy. What I also would like to highlight, too, is that you are, of course, a network overlay, and that is something that is fairly well understood and people have seen it, but your preferred adoption model goes a couple of steps beyond that into altering the way that the application thinks about these things. And you offer an SDK that ranges from a single line of code implementation to, I think, up to 20, so it’s not a massive rewrite of the application, but it does require modification of the stack. What does that buy you, for lack of a better term? Because once the application becomes aware of what is effectively its own, “special network,” quote-unquote, it’s work to wind up modifying existing applications around something like this. What’s the payoff?

Philip: So, there are three broad ones that immediately come to my mind. Number one is the highest security: effectively, your private network is inside the app, so you have to somehow break into the app, and that can be incredibly complicated, particularly if you run the app in something like a confidential compute enclave; you can now have a distributed confidential system.

The second is what you’re getting in programmability. You’re able to effectively operate fully—even, you know, you get to a GitOps environment. We’re currently working on documentation which says, “Hey, you can do all this stuff in GitOps, and then it’ll go into your CI/CD, and that’ll talk to the APIs.” And it’ll effectively do everything in a completely programmable manner so that you can treat your private networks as cattle rather than as pets.

The third is transparency. You used the words earlier of bolt-on networking because that’s how we always think about networking security: We bolt it on. As a user, we have to jump through the VPN hoop, we have to go through the bastion, we have to interact with the network. If your private network’s inside the application, then you interact with the application. I can have a mobile application on my device and I have no idea that it’s part of a private network and that the API is private and the malicious actors can’t get to it. I just interact with the application. That is it.

That is what no one else has the ability to do and where OpenZiti has its most power because then you get rid of the constant tug of war between the security team that want to lock everything down and the users and the developers who want to move fast and give a great experience. You can effectively have your cake and eat it.
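What "the private network inside the application" looks like to a developer can be sketched roughly as follows. The API here is entirely invented for illustration and is not the real OpenZiti SDK; the point is only that the app dials a service name over an identity-bound context instead of connecting to a host and port:

```python
# Hypothetical sketch of app-embedded networking (API invented for
# illustration; the real OpenZiti SDKs differ). The app never sees
# IPs or DNS: it dials a service *name* over an identity-bound context.

class OverlayContext:
    def __init__(self, identity_file):
        self.identity = identity_file         # enrolled identity; stands in for real credentials
        self.services = {"jenkins"}           # services this identity may dial

    def dial(self, service):
        if service not in self.services:
            raise PermissionError(f"identity not authorized for {service}")
        return OverlaySocket(service)

class OverlaySocket:
    def __init__(self, service):
        self.service = service

    def send(self, data):
        return f"{len(data)} bytes to {self.service} over the overlay"

# The only change to the app: dial a name instead of connect((host, port)).
ctx = OverlayContext("ci-runner.json")
conn = ctx.dial("jenkins")
print(conn.send(b"GET /api/json"))
```

This is the transparency point: the user just interacts with the application, while the security team gets a dark, identity-gated service underneath, and neither side has to win a tug of war.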

Corey: The challenge, of course, with rolling a lot of these things out in a way that becomes highly programmable is that it unlocks a bunch of capability, but the double-edged sword there is always one of complexity. I mean, we take a look at the way that AWS networking has progressed, and they finally rolled out the VPC Reachability Analyzer, so when two things can’t talk to each other, well, you run this thing and it tells you exactly why, which is super handy. And then, just as a way of twisting the knife a little bit, every time you run it, they charge you ten cents for the privilege, which doesn’t actually matter in the context of what anyone is being compensated for, until and unless you build this into something programmatic, but it stings a little bit. And the idea of being able to program these things to abstract away a lot of that complexity is incredibly compelling, except for the part where now it feels like it really increases developer burden on a lot of these things. Have you found that to be true? Do you find that it is sort of a sliding scale? What has the customer experience been around this?

Philip: I would say a sliding scale. You know, we had one organization who started with the OpenZiti Tunnelers, and then we convinced them to use the SDK, and [unintelligible 00:21:51], “Oh, this was super easy.” And now they just run OpenZiti themselves. But then they’ve also said at some point, we’ll use the NetFoundry platform, which effectively gives us a SaaS experience in consuming that. One of the huge focuses—well, we’ve got a few big focuses for product development, but one of the really big areas is giving more visibility and monitoring so that rather than people having to react to configuration problems or things which they need to fix in order to ensure your perfect network overlay, instead, those things are seen and automatically dealt with, human-in-the-loop if you want it, in order to remove that burden.

Because ultimately, if you can get the network to a point where as long as you’ve got underlay and you’ve set your policy, the overlay is going to work, it’s going to be secure, and it’s going to give you the uptime you need, that is the Nirvana that we all have to strive for.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they’re all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure, they claim it’s better than AWS pricing—when they say that, they mean it is less money. Sure, I don’t dispute that, but what I find interesting is that it’s predictable. They tell you in advance on a monthly basis what it’s going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you’re one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you’ll receive $100 in credit. That’s V-U-L-T-R dot com slash screaming.

Corey: A common criticism of things that, shall we say, abstract away the network is a fairly predictable failure mode. I’ve been making fun of Kubernetes on this particular point for years, and I’m annoyed that at the time we’re recording this, it is still accurate. But from the cloud provider’s perspective, when you run Kubernetes, it looks like one big, really strangely behaved single-tenant application. And Kubernetes itself is generally not aware of zone affinity, so it could just as easily wind up tossing traffic to the node next to it at zero cost, or across an availability zone at two cents per gigabyte, or, God forbid, across the internet at nine cents a gigabyte and counting, depending upon how it works. And the application side has absolutely no conception of this.

How does OpenZiti address this in the real world? Because it’s one of those things where it almost doesn’t matter what you folks charge on top of it; if instead it’s, “Oh, wow, this winds up being so hellaciously expensive that we can’t use it regardless of whatever benefit it provides,” it becomes a non-starter.
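To make Corey’s point concrete, here is a minimal sketch of how the same workload’s data transfer bill changes with the path a zone-unaware scheduler happens to pick. The per-gigabyte rates below are the illustrative figures from the conversation, not a pricing reference:

```python
# Illustrative per-GB transfer rates from the discussion above;
# real cloud pricing varies by region, service, and direction.
RATES_PER_GB = {
    "same_az": 0.00,   # traffic to the node next door
    "cross_az": 0.02,  # crossing an availability zone
    "internet": 0.09,  # egress over the public internet
}

def transfer_cost(path: str, gigabytes: float) -> float:
    """Dollar cost of sending `gigabytes` along a given path type."""
    return RATES_PER_GB[path] * gigabytes

# A scheduler with no zone affinity that spreads 1 TB of traffic evenly
# across the three path types pays for the expensive ones blindly.
monthly_gb = 1024
blended = sum(transfer_cost(path, monthly_gb / 3) for path in RATES_PER_GB)
```

Run against these rates, the blended bill lands around $37.55 for traffic that would have been free if it had all stayed in-zone, which is exactly the invisible tax being described.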

Philip: So, when we built the overlay and the mesh, we did it from the perspective of making it as programmable and self-driven as possible. So, with the whole Terminator strategies that were mentioned earlier, it gives you the ability to start putting logic into how you want packets to flow. Today, it does a calculation of end-to-end latency and chooses and reroutes traffic based on that information. But there’s no reason you couldn’t hook it up to an understanding of the monetary cost of sending a packet along a certain path. Or even what your application performance monitoring tool is saying, because what that says versus what the network believes could be different things. Effectively, you can ingest that information to make your smart routing decisions, so all of that logic can exist within the overlay that operates for you.

Corey: I will say that really harkens back, on some level, to what I was experimenting with back when I got my CCNA many years ago, where routing protocols have built into them the idea of the cost of a link. I will freely admit slash confess that at the time, I assumed the low-cost link was about what was congested or what would wind up having, theoretically, some transit versus peering agreement. It never occurred to me that I’d have to think about those things in a local network and have to calculate in the Byzantine pricing models of cloud providers. But I’ve seen examples of folks who are using OpenZiti, and NetFoundry alike, to wind up building in these costing models so that, yeah, ideally it just keeps everything local, but if that path degrades, then yes, we would prefer to go over an expensive link than to basically have TCP terminate on the floor until everything comes back up. It sort of feels like there’s an awful lot of logic you can bake in that goes well beyond what routing protocols are capable of, just by virtue of exposing that programmability.

Well, for this customer, because they’re on the pre—on the extreme tier, then we want to have the expensive fallback; for low-tier customers, we might want to have them just have an outage until things end. And it really comes down to letting business decisions express themselves in terms of application behavior while in a degraded state. I love that idea.
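That “business decisions expressed as degraded-state behavior” idea can be pictured as a small policy table. The tier names and fallback actions below are hypothetical, invented to mirror the example in the conversation:

```python
# Hypothetical per-tier policy: what the overlay does when the cheap
# primary path degrades. Tier names and actions are illustrative only.
FALLBACK_POLICY = {
    "extreme": "use_expensive_link",  # premium customers fail over, whatever it costs
    "standard": "queue_and_retry",    # mid-tier customers wait for recovery
    "basic": "fail_fast",             # low-tier customers see an outage
}

def on_primary_degraded(customer_tier: str) -> str:
    """Return the overlay's action for a customer when the primary path degrades."""
    # Unknown tiers get the cheapest, most conservative behavior.
    return FALLBACK_POLICY.get(customer_tier, "fail_fast")
```

The point is that the routing layer stays generic; the commercial decision lives in one lookup that the business can change without touching the network.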

Philip: Yeah, I understand. We don’t do it today, but there will be a point in the future—I strongly believe—when we’ll be able to say, hey, I’ll give you an SLA on the internet. Because we’ll have such path diversity and visibility into how the internet operates that we’ll be able to say, within certain risk parameters, what we can deliver. But then you can take it to other logical extremes. You could say, “Hey, I want to build a green overlay. I want to make sure that I’m using Arm instances and data centers running on renewable energy so that my network is green.”

Or you can build a GDPR-compliant overlay so that my data stays within a certain country. You can really start dreaming up the different policies you could apply, because you’re applying a central policy to what is then a distributed system.
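One way to picture those policy-driven overlays—the node attributes and policy shapes here are invented for illustration, not a NetFoundry schema—is filtering the candidate routing nodes by declared attributes before the mesh ever considers them:

```python
def eligible_nodes(nodes, policy):
    """Keep only nodes whose attributes satisfy every constraint in the policy."""
    return [
        node for node in nodes
        if all(node.get(key) == value for key, value in policy.items())
    ]

# Hypothetical node inventory with self-declared attributes.
nodes = [
    {"name": "fra-1", "country": "DE", "renewable": True},
    {"name": "iad-1", "country": "US", "renewable": False},
    {"name": "dub-1", "country": "IE", "renewable": True},
]

green_overlay = eligible_nodes(nodes, {"renewable": True})   # the "green" overlay
german_overlay = eligible_nodes(nodes, {"country": "DE"})    # a data-residency overlay
```

The central policy is one small dictionary; the distributed system it governs can be any number of nodes, which is exactly the leverage Philip is describing.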

Corey: One last topic I want to cover before we call it an episode is that you are, effectively, a SaaS company that is built on top of an open-source project. And that has been an interesting path for a lot of companies that, early on, figured that since they wrote the software and were doing the lion’s share of the contribution, they were clearly the best people to run it. And Amazon’s approach towards operational excellence—as they called it—wound up causing some challenges when they launched the Amazon Basics version of that service. I feel like there are some natural defenses built into OpenZiti to keep it from suffering that fate, but I’m very curious to get your take on it.

Philip: Fundamentally, our take—in fact, our mission—is to take what was previously impossible and turn it into a standard. And the only way you can really create standards is to have an open-source project that is adopted by the wider community and that ecosystems get built around and into. And that means giving OpenZiti to absolutely everyone so that they can use it and innovate on top of it. We all know that very few people actually want to host their own infrastructure, so we assume a large percentage of people will come and go, “Hey, NetFoundry, you provide us the hosting, you provide us the SaaS capability so we don’t have to do that ourselves.” But we do that in the knowledge that there’s something bigger, because it’s not just us maintaining this project; there’s a bunch of people doing pull requests and finding cool, fun ways to build further value on what we can build ourselves.

We believe recent history is littered with examples of the new world being built on open-source. And fundamentally, we think that’s really the only way to change an industry as profoundly as we intend to.

Corey: I would also argue that, to be very direct—and I can probably get away with saying this in a way that I suspect you might not be able to—if AWS had it in their character to simplify things and make networking a lot easier for people to work with, what’s stopping them? They didn’t need to wait for an open-source company to come out of nowhere and demonstrate the value of this. Customers have been asking for it for years. I think that at this point, this is something that is unlikely to ever wind up being integrated into a cloud provider’s primary offering, until and unless the entire industry shifts, at which point we’re having a radically different conversation very far down the road.

Philip: Yeah, potentially, because it raises an interesting question: if you make it so easy for someone to take their data out, do they use your cloud less? There are some cloud providers that will lean into that because they do see more clouds in the future, and others that won’t. I see it more myself that as those kinds of things happen, it’ll be done on a product-by-product basis. For example, we’re talking to an organization, and [unintelligible 00:29:49] like, “Oh, could you Ziti-fy our JDBC driver so that when users access our database, they don’t have to use a VPN?” [unintelligible 00:29:55], “Yeah. We’ve already done that with JDBC. We called it ZDBC.”

So, instead of using the general industry one—probably the Oracle one or something, because that’s kind of the standard—we’ll take the one you’ve created for yourselves and solve that problem for you.

Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where’s the best place to find you?

Philip: Best place to go to is netfoundry.io/screaminginthecloud. From there, anyone can grab some free Ziggy swag. Ziggy’s our little open-source mascot, cute little piece of pasta with many different outfits. Little sass as well. And you can find further information both on OpenZiti and NetFoundry.

Corey: And we will put links to both of those in the [show notes 00:30:40]. Thanks so much for taking the time to speak with me today. I really appreciate it.

Philip: It’s a pleasure. Thanks, Corey.

Corey: Philip Griffiths, Head of Business Development at NetFoundry. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment telling me exactly why I’m wrong about AWS’s VPC complexity, and that comment will get moderated and I won’t get to read it until you pay me ten cents to tell you how it got moderated.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
