From A to Z in Alphabet’s Soup with Seth Vargo

Episode Summary

Seth Vargo, an Engineer at Google, was the third guest ever on “Screaming!” Now, three hundred plus episodes later, he is back to catch up with Corey. This time around Corey isn’t nearly as tentative with the microphone, so the conversation is bound to start on good footing. Seth is still at Google, but primarily works with Alphabet, helping companies within the conglomerate umbrella securely and privately consume public cloud. Seth’s work has transitioned from Cloud PA, or “product area,” to what he calls “Core PA.” In Core PA, he works across various clouds (dare we dive into the semantics of multi-cloud?) for the stable of companies under Alphabet. Seth offers up some reflections on the complexity of working in a massive entity, what exactly “privately” means in the context of his work, GCP, and more!

Episode Show Notes & Transcript

About Seth
Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: The company 0x4447 builds products to increase standardization and security in AWS organizations. They do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain and fail-tolerant solutions, one of which is their VPN product built on top of the popular OpenVPN project which has no license restrictions; you are only limited by the network card in the instance.


Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I have a return guest today, though it barely feels like it qualifies because Seth Vargo was guest number three on this podcast. I’ve had a couple of folks on since then, and for better or worse, I’m no longer quite as scared of the microphone as I was back in those early days. Seth, thank you for joining me.


Seth: Yeah, thank you so much for having me back, Corey. Really excited to figure out whatever we’re talking about today.


Corey: Well, let’s start there because last time we spoke, you were, if memory serves, a developer advocate at Google Cloud.


Seth: Correct.


Corey: And you’ve changed jobs, but not companies—but kind of companies because, welcome to large environments—but over the past few years, you have remained at Google. You are no longer at Google Cloud and you’re no longer a developer advocate. In fact, your title is simply ‘Engineer at Google.’ And what you’ve been focusing on, to my understanding, is helping Alphabet companies, namely—you know, the Alphabet, always in parentheses in journalistic styles, Google’s parent company because no one thinks of it in terms of Alphabet—is—you’re effectively helping companies within the conglomerate umbrella securely and privately consume public cloud.


Seth: Yes, that is correct. So, I used to work in what we call the Cloud PA—PA stands for product area. Other product areas are like Chrome and Android—and I moved to the Core PA where I’m helping lead and run an initiative that, like you said, is to help Alphabet companies to, you know, securely and privately use public cloud services.


Corey: So, I am going to go out on a limb because my position on multi-cloud has always been pick a cloud—I don’t particularly care which one—but pick one and focus on that. I’m going to go out on a limb and presume that given that you are not at Google Cloud anymore, but you are at Google, you probably have a slight preference as far as which public cloud these various companies within the umbrella should be consuming.


Seth: Yeah. I mean, obviously, I think most viewers will think the answer is GCP. And if you said GCP, you would be, like, 95% correct.


Corey: Well, you’d also be slightly less than that correct, because they’re doing a whole rebrand and calling it Google Cloud in public, as opposed to GCP. You really don’t work for the same org anymore. You’re not up-to-date on the very latest messaging talking points.


Seth: I missed—ugh, there’s so many TLAs that you lose all your TLAs over time.


Corey: Oh, yes.


Seth: So, Google Cloud would be, like, 95% correct. But what you have to really understand is, Google has its own, you know, cloud—we didn’t call it a cloud at the time, you might call it on-prem or legacy infrastructure, if you will—primarily built on a scheduling system called Borg, which is like Kubernetes version zero. And a lot of the Alphabet companies have workloads that run on Borg. So, we’re actually talking about hybrid cloud here, which, you know, you may not think of Google as, like, a hybrid cloud customer, but a workload that runs on our production infrastructure called Borg that needs to interact with a workload that runs on Google Cloud, that is hybrid cloud. It’s no different than a customer who has their own data center that needs peering to a public cloud provider, you know, whether that’s Google Cloud, or AWS, or Azure.


I think the other thing is if you look at, like, the regulatory space, particularly a lot of the Alphabet companies operate in, say, like healthcare, or finance, or FinTech, where certain countries and certain jurisdictions have regulations around, like, you must be multi-cloud. You know, some people might say that means you have to run, you know, the same instance of the same app across clouds, or some people say your data can be here, but your workloads can be over there. That’s to be interpreted, but you know, I would say 95% of GCP, but there is a—or sorry, 95% is Google Cloud—


Corey: There we go.


Seth: But there is a small percentage that is definitely going to be other cloud providers and hybrid cloud as well.


Corey: My position on multi-cloud has often—people like to throw it in my face of, “See you gave this general guidance, and therefore whenever you say something that goes against it, you’re a giant phony.” And it’s yeah, Twitter doesn’t do so well with the nuance. My position of pick a provider and go all-in is intended as general guidance for the common case. There are exceptions to this and any individual company or customer is going to have more context than that general guidance will. So, if you say you need to be in multiple clouds for certain reasons, you’re probably correct.


If you say you need to be in multiple clouds because your regulator demands it, you are certainly correct. I am not arguing against that in any way. I do want to disclaim one of my biases here as well, and that is specifically that if I were building a startup today and I were not me—by which I mean having spent ten years in the AWS ecosystem learning, not just how it works, but how it breaks because that’s important in production, and you know, also having a bunch of service owners at AWS on speed dial—and I were approaching this from the naive, I need to pick a cloud, which one would I go with, my bias is for Google Cloud. And the reason behind that is the developer experience is spectacular as the primary but not only perspective on that. So, I am curious to know that as you’re helping what are effectively internal customers move to Google Cloud, is their interaction with Google Cloud as a platform the same as it would be if I, as a random outside customer, were using Google Cloud? Is there a bunch of internal backchannels? “Oh, you get the good kind of internal Google Cloud that most of us don’t get access to?” Or something else?


Seth: Yeah, so that’s a great question. So first, you know, thank you for the kind words on the developer experience—


Corey: They were honest words, to be clear. Let me be very direct with you, if I thought your developer experience was trash, I might not say it outright in an effort not to be, you know, actively antagonistic to someone I’m having on the show right now, but I would not say it if I didn’t believe it.


Seth: Yeah. And I totally—I know you, I’ve known you for many years. I totally believe you. But I do thank you for saying that because the team that I was on before this was largely responsible for that across the platform. But back to your original question around, like, what does the support experience look like? So, it’s a little bit of both.


So, Alphabet companies, they get a technical account manager, very similar to how, you know, a reasonable-sized spend customer would get a technical account manager. That account manager has access to the Cloud support channels. So, all that looks the same. I think where things look a little bit different is because myself and some of our other leads came from Cloud, you know, I generally don’t like this phrase, but we know people. So, we tend not to go directly to Cloud when we can, right?


We want Alphabet companies to really behave and act as if they were an external entity, but we’re able to help the technical account manager navigate the support process a little bit better by saying like, “You need to ask for this person,” right? You need to say these words to get in front of the right person to get this ticket assigned to the right person. So, the process is still the same, but we’re able to leverage our pre-existing knowledge with Cloud. The same way, if you had a [unintelligible 00:07:45] or an ex-Googler who worked for your company, would be able to kind of help move that support process along a little bit faster.


Corey: I am quite sincere when I say that this is a problem that goes far beyond simply Google. A disturbing portion of my job as a cloud economist helping my clients consists of nothing other than introducing Amazonians to one another. And these are hard problems at scale. I work at a company with a dozen people in it. And it turns out that yeah, it’s pretty easy to navigate who’s responsible for what. When you have a hyperscale-size company in the trillion-dollar range, a lot of that breaks down super quickly.


Seth: And there’s just a lot of churn at all levels of the organization. And, you know, we talked about this when I first joined the show, like, I switched roles, I used to be in Cloud, and now I’m in what we call Core. I still get people who are reaching out to me, at Google and externally, who are saying, “Oh, can you answer this question? Hey, how do I do this?” And I, you know, I’ve gradually over the past couple of months, you know, convinced people that I don’t work on that anymore, and I try to be helpful where I can, but the—


Corey: You use the old name and everything. They’re eventually going to learn, right?


Seth: I know. They’ll be like, “What do you call this? GCP? Okay, great. We don’t need you anymore.” But it’s true, right? Like, there’s people leave the organization, people join the organization, there’s reorgs, there’s strategic changes, people, you know, switch roles within the org, and all of that leads to complexity with, you know, navigating, what is the size of a small nation, in some cases.


Corey: Your line in your biography says that you enable Alphabet companies to securely and privately consume public cloud. Now, that would make perfect sense and I would really have no further questions based on what we’ve already said, except for the words securely and privately, and I want to dive into that, first. Let’s work backwards with the second one first. What does ‘privately’ mean in this context?


Seth: So, privately means, like, privacy-preserving for both the Alphabet company and the users or customers that they have. So, when we look at that from the perspective of the Alphabet company, that means protecting their data from the eyes of the cloud provider. So, that’s things like customer-managed encryption keys, you know, bring-your-own-encryption, that’s making sure that you have things like Access Transparency so that if at any point the cloud provider is accessing your data, even for a legitimate purpose, like submitting a support ticket or something—or diagnosing a support ticket, that you have visibility into that. Then the privacy-preserving side on the Alphabet company’s customers is about providing that same level of visibility to their customers as well as making sure that any data that they’re storing is, you know, private, it’s not accessible to certain parties, it’s following whether it’s like, you know, actual legislation around how long data can be persisted, things like GDPR, or if it’s just a general, like, data retention, insider risk management, all of that comes into this idea of, like, building a private system or privacy-preserving system.


Corey: Let’s be very clear that my position on it is that Google’s relationship with privacy has been somewhat challenged, due in no small part to the sheer scale of how large Google has grown. And let’s be clear, I believe firmly that at certain points of scale, yeah, you deserve elevated levels of scrutiny. That is how we want society to function, by and large. And there are times where it feels a little odd on the cloud side. For example, as of the time of this recording, somewhat recently, there was a bug in some of the copyright detection stuff where Google Drive would start flagging files as having copyright challenges if they contained just the character ‘1’ in them.


Which, okay, clearly a bug, but it was a bit of a reminder for some folks that wait, but that’s right, Google does tend to scan these things. Well, when you have a bunch of end-user customers in the ways that Google does, that stuff is baked in and it shapes how you wind up seeing things. From Amazon’s perspective, historically, they basically sold books and then later underpants. And doing e-commerce transactions was basically the extent of their data work with customers. They weren’t really running large-scale file-sharing systems and collaboration suites, at least not ones that really had any of those pesky things called customers.


So, that is not built into their approach and their needs in the same way. To be clear, I am sympathetic to the problems, but it’s also… it’s a challenging problem, especially as you continue to evolve and move things into cloud, you absolutely must be able to trust your cloud provider, or you should not be working on that cloud provider, has been my approach.


Seth: Yeah, I mean, there’s certainly things that you can do to mitigate. But in general, like, there is some level of trust, forget the data, on the availability side, right? Like when the cloud provider says, “This is our SLA.” And you agree to that SLA, like, yeah, you get money back if they mess it up, but ultimately, you’re trusting them to adhere to that SLA, right? And you get recompense if they fail to do so, but that’s still, like, trust—trust is far more than just on the privacy side, right? It’s on… the promise on the roadmap, it’s on privacy, it’s on the SLA, right?


Corey: Yeah. And you see that concern expressed more articulately from enterprise customers, when there’s a matter of trusting companies to do what they say, such as the continued investment that Alphabet slash Google is making in Google Cloud. It’s easy to take the approach of well, you’ve turned off a bunch of consumer services, so therefore, you’re going to turn off the cloud at some point, too. No, let me be very clear, for the record, I do not believe that you are going to one day flip a switch and turn off Google Cloud. And neither do your customers.


Instead, the approach, the way that enterprises express this, it’s not about you flipping the switch and turning it off—that’s what contracts are for—their concern, and they enshrine this in contracts in some cases, is the event not that you turn it off, but that you fail to appropriately continue to invest in the platform. Because at enterprise scale, this is how things tend to die. It is not through flipping a switch, in most cases, it’s through, “We’re just going to basically mothball it, keep it more or less exactly as it is until it slowly fades into irrelevance over a long period of time.” And when you’re providing the infrastructure to run things for serious institutions, that part isn’t okay. And credit where due, I have seen every indication that Google means it when they say this is an area of strategic and continued ongoing focus for us as a company.


Seth: Yeah, I mean, Google is heavily investing in cloud. I mean, this is a brand new group that I’m working in and we’re trying to get Alphabet companies onto cloud, so obviously there’s some very high-level top-down executive support for this. I will say that the—a hundred percent agree with everything you’re saying—the traditional enterprise approach of build this Java app—because let’s be honest, it’s always Java—build this Java app, compile it into a JAR and run it forever is becoming problematic. We saw this recently with, like, the log4j—


Corey: Yeah, to be in a container. What the hell?


Seth: [laugh].


Corey: I’m kidding. I’m kidding. Please don’t send me email, whatever you do.


Seth: What’s a container? I’m just kidding. Like, the idea of, like, software rotting is very real and it’s becoming more and more of a risk to security, to privacy, to public cloud providers, to enterprises, where when you see something like log4j happen and you can’t answer the question, like, do we have any code that uses that? Like, if getting the answer to that question takes you six weeks, [sigh] boy like, a lot of stuff can happen in six weeks while that particular thing is exploited. And you know, kind of gets into software supply chain a little bit, but I do agree that, like, secure, private, and stable APIs are super important, and it’s an area where Google is investing. At the same time, I think the industry is moving, the enterprise industry is moving away a little bit from set-it-and-forget-it as a strategy.


Corey: I want to talk about the security portion as well as far as securely consuming public cloud goes. And let me start off with a disclaimer here because I don’t want people to misconstrue what I’m about to say. If you are migrating to one of the big three cloud providers, their security will be better than anything you will be able to achieve as a company yourself. Not you personally because Google is a bit of an asterisk to that statement, given what you have been doing and have been doing since the ’90s in your on-prem world with Borg and the rest, but my philosophy on the relative positioning of the security of cloud providers relative to one another has changed. I spent four months beating the crap out of Azure for having an issue where there was control plane access and then really saying nothing about it.


And after I wound up finding—the day after I put out a blog post on that topic because I was tired of the lack of response, it came out that right at the same time AWS had a very similar problem and had not said anything themselves. And they went back and forth, apparently waiting to wind up doing a release until this happened, Orca Security wound up putting one out there, and it was frustrating on a couple of levels. First, the people at both of these companies who work in security are stars. There is no argument, no bones about that. Problems are going to happen, things are going to occur as a result, and the only saving grace then is the transparency and communication around it, and there was none of it from them.


I’m also more than a little bit irked that my friends at AWS were aware of this, basically watched me drag Azure for four months knowing that they’d done the same thing and never bothered to say a word. But okay, that’s a choice. I’ve been saying for a while that of the big three, Google’s security posture is the most impressive. And it used to be a slight difference. Like, you nosed ahead of AWS in that respect, not by a huge margin, but by a bit.


I don’t think it’s nearly as close these days, in my mind, and talking to other large companies about these things, and people who are paid to worry about these things all day long, I am very far from alone in that perspective. So, I guess my question for you is, as you look at moving the workload securely to Google Cloud, it feels like security is baked into everything that all aspects of your company have done. Why is that a specific area of focus? Or is that how it gets baked into everything you folks do?


Seth: So, you kind of like set up the answer for this perfectly. I swear we didn’t talk about this extensively beforehand.


Corey: You didn’t know any of that was coming, by the way, just to be very clear here. I don’t sit here and feed, “All right, I’m going to say this. And here’s the right res—” No, this is an impromptu, more or less ad hoc show every time I do it.


Seth: Yeah. And I’m going to preface this by saying, like, I don’t want this to sound, like, egotistical, but I have never found a company that has as rigorous security and privacy policies, reviews, and procedures as Google.


Corey: I thought I had and I was wrong.


Seth: Yeah. And—


Corey: And I have a lot of apologizing to people to do as a result of that.


Seth: And honestly, every time I interact with our internal security engineering teams, or our IP protection teams, I’m that Nathan Fillion meme, where he’s like, what—you know, like, “Okay, I get it. I get it.” Right?


Corey: And then facepalm it, uh, I should say some—I can’t—yeah. Oh, yeah.


Seth: The reason that it’s hard for Alphabet companies to securely and privately move to cloud specifically for security, is because Alphabet’s stance is so much more rigorous than anyone else in the industry, to the point where, in some cases, even our own cloud provider doesn’t meet the bar for what we require for an internal workload. And that’s really what it comes down to is, like, the reason that Google is the most secure cloud is because our bar is so high that sometimes we can’t even meet it.


Corey: I have to assume that the correct answer on this is that you then wind up talking to those product teams and figure out how to get them to a point where they can support that bar because the alternative is effectively, it’s like, “Oh, yeah, this is Google Cloud and it’s absolutely right for multinational banks to use, but you know, not Google workloads. That stuff’s important.” And I don’t think that is necessarily how you folks tend to view these things.


Seth: So, it’s a bidirectional stream, right? So, a lot of it is working with a product management team to figure out where we can add these additional security properties into the system—I should say, tri-directional. The second area is where the policy is so specific to Google that Google should actually build its own layer on top of it that adds the security because it’s not generally applicable to even big, huge cloud customers. And then the third area is Google’s a very big company. Sometimes we didn’t write stuff down, and sometimes we have policies where no one can really articulate where that policy came from.


And something that’s new with this approach that we’re taking now is, like, we’re actually trying to figure out where that policy came from, and get at the impetus of what it was trying to protect against and make sure that it’s still applicable. And I don’t know if you’ve ever worked with governments or you know, large companies, right, they have this spreadsheet of hundreds of thousands of lines—


Corey: You are basically describing my client list. Please continue.


Seth: I mean, like, sometimes they have to use an Access database because they exhaust the number of rows in an Excel spreadsheet. And it’s just checklist upon checklist upon checklist. And that’s not how Google does security, right? Security is a very all-encompassing, kind of, 360 type of thing. But we do have policies that are difficult to articulate what they’re actually protecting against, and we are constantly re-evaluating those, and saying, like, “This made sense on Borg. Does it actually make sense on Cloud?” And in some cases, it may not. We get the same protections using, say, a GCP-native service, and we can omit that requirement for this particular workload.


Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free that’s snark.cloud/oci-free.


Corey: I think that when it comes to things like policies that are intelligently crafted around security, you folks—and to be fair, the AWS security engineers as well—have been doing it right in that, okay, we’re going to build a security control to make sure that a thing can’t happen. That’s not enough. Then there’s the defense-in-depth. Okay, let’s say that control fails for some variety of ways. Here are the other things we’re going to do to prevent cross-account access, for example.


And that in turn, winds up continuing to feed on itself and build into a culture of assuming that you can always continue to invest in security. How far is enough? Well, for most folks, they haven’t gone far enough yet.


Seth: Another way to put this is like, how well do you want to sleep at night? You know, there’s folks on the Google security engineering team who are so smart, and they work on, like, our offensive security team, so their full-time job is to try to hack Google and then figure out how to prevent that. And, you know, so I’ve read some of the reports and some of the ways they think and I’m like, “How do you… how do you pick up a mobile phone and go to like, any website confidently knowing what you know?” Right? [laugh] and like, how do you—


Corey: Who said anything about confidently? Yeah.


Seth: Yeah. Yeah. How do you use self-checkout at a supermarket and, like, not just, like, wear your entire full-body tinfoil hat suit? But you know, I think the bigger risk is not knowing what the risks are. And this is a lot of what we’re seeing in software supply chain, too, is a lot of security is around threat modeling and not checklists. But we tend to, like, gravitate toward checklists because they’re concrete.


But you really have to ask yourself, like, do I need the same security properties on my static blog website that is stored on an S3 bucket or a GCS bucket that’s public to the internet, that I do on my credit card processing service? And a lot of times we don’t treat those differently, we don’t apply a different threat model to them, and then everything has to have the same level of security.


Corey: And then everything is in-scope for whatever it is you’re trying to defend against. And that is a short path to madness.


Seth: Yes. Yes. Your static HTML files and your GCS bucket are in scope for SOC 1 and 2 because you didn’t have a way to say they weren’t.
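[Editor's sketch] One way to picture "having a way to say they weren't" in scope is to derive controls from a per-asset threat model instead of applying one checklist to everything. The snippet below is only an illustration: the `Asset` fields, the control names, and the `required_controls` function are all hypothetical and do not come from Google, SOC, or anything said in this episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A hypothetical description of one workload or data store."""
    name: str
    holds_sensitive_data: bool  # e.g. card numbers or health records
    public_by_design: bool      # e.g. a static blog meant to be world-readable


def required_controls(asset: Asset) -> set:
    """Pick controls from the asset's threat model, not from a global checklist."""
    # Assumption: even a public site should be protected against defacement.
    controls = {"integrity-monitoring"}
    if not asset.public_by_design:
        controls.add("access-control")
    if asset.holds_sensitive_data:
        # Only assets that actually hold sensitive data enter compliance scope.
        controls |= {"encryption-at-rest", "audit-logging", "compliance-scope"}
    return controls
```

Under these assumptions, a public static blog ends up with a single control, while a card-processing service picks up the full set, which is the distinction Seth is drawing.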


Corey: Yeah. You’ve also done some—again, the nice thing about being at a company for a while—from what I can tell, given that I’ve never done it until I started this place—is you move around and work on different projects. You were involved as well, personally, in the exposure notifications project, the joint collaboration thing between a number of companies in the somewhat early days of the pandemic that all of our phones talk to one another and anonymously and in a privacy-preserving way, let us know that hey, by the way, someone you were in close contact with has tested positive for Covid-19 in the previous fixed period of time. What did you do over there?


Seth: Yeah, so the exposure notifications project was a joint effort, primarily between Apple and Google to use Android and iOS devices to help stop the spread of Covid or reduce the spread of Covid as much as possible. The idea being because the incubation period is roughly 14 days, at least pre-Omicron, if we could tell you hey, you might have been exposed and get you to stay at home for three or four days, self-isolate, we could dramatically reduce the spread of Covid. And we know from some of the studies that have come out of, like, the UK and European region that, like, the technology actually reduced the spread of cases by, like, fourteen-hundred percent in some cases. I was one of the tech leads for the server-side. So, the way the system works is it uses the low-energy Bluetooth on iOS and Android devices to basically broadcast random IDs.


So, I know this is Screaming into the Cloud, but if we can just quickly Screaming into the Void as a rebrand—


Corey: Oh, yeah.


Seth: —that’s basically what’s happening. [laugh]. You’re generating these random identifiers, and just, like, yelling them, and there’s other phones out there who are listening. And they collect these, what we’ll call RPIs, or Rolling Proximity Identifiers. They have no data in them.


They’re like literally, like, a UUID or 32 bytes of random data, they aren’t at all, like, associated with your device or your person. So, then what happens is, like, let’s say you’re in a supermarket, you’re near someone for, you know, every so often, and your phones exchange these IDs. If you then test positive, those IDs go up to a centralized server, the server again, also has no idea who you are, so the whole thing is privacy-preserving, end-to-end, then the server basically bundles all of what we call the TEKs, or the Temporary Exposure Keys, into a tarball that goes up onto a CDN, and then every night, all of the devices that are participating in EN download this and do a local key match. So, at no point does the server ever know that you were in a supermarket with someone else, only your phone knows that you came in contact with this TEK in the past 14 days—or 21 days in some jurisdictions—and it’ll generate an exposure notification or an exposure alert, which says, like, “Hey, in the past 14 days, you’ve come in contact with someone who’s confirmed positive for Covid.” And then the guidance kind of varies by state and by health jurisdiction, of, like, self-isolate, or go get tested, or whatever. But the idea—


Corey: Or go to the bar in some places, apparently.


Seth: Oh. Yeah. The server itself is actually—there’s a verification component because ideally, like, we don’t want people to just be like, oh, I’m Covid positive, and then like, all their friends get an alert, right? There needs to be some kind of verification mechanism where you either have a positive test, or you have a clinician or a physician who issues you a code that you can put into your app so you can then release your keys. And then there’s the actual key server component, which I kind of already described.
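[Editor's sketch] The flow Seth describes (phones broadcast rolling identifiers, a verified positive user uploads keys, every phone downloads the bundle and matches locally) can be sketched roughly as follows. This is a simplified illustration, not the real implementation. The published Exposure Notification specification derives identifiers with HKDF and AES, as far as I understand it, whereas this sketch substitutes HMAC-SHA256 so that it runs on the Python standard library alone. The class and function names (`KeyServer`, `rolling_id`, `local_match`), the 144-intervals-per-key figure, and the code format are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

INTERVALS_PER_KEY = 144  # assumption: one rolling ID per 10-minute window in a day


def new_tek() -> bytes:
    """Generate a Temporary Exposure Key: random bytes, no device or user data."""
    return secrets.token_bytes(16)


def rolling_id(tek: bytes, interval: int) -> bytes:
    """Derive the identifier broadcast during one interval.

    HMAC-SHA256 is a stand-in here; the real protocol uses a different derivation.
    """
    return hmac.new(tek, interval.to_bytes(4, "big"), hashlib.sha256).digest()[:16]


class KeyServer:
    """Hypothetical server: stores only keys released with a valid code."""

    def __init__(self):
        self._codes = set()
        self._published = []

    def issue_code(self) -> str:
        """Issue a one-time verification code (e.g. on behalf of a clinician)."""
        code = secrets.token_hex(4)
        self._codes.add(code)
        return code

    def upload(self, code: str, teks: list) -> bool:
        """Accept keys only when accompanied by an unused verification code."""
        if code not in self._codes:
            return False
        self._codes.remove(code)  # codes are single use
        self._published.extend(teks)
        return True

    def export(self) -> list:
        """Return the bundle that phones download (served via a CDN in the real system)."""
        return list(self._published)


def local_match(heard_ids: list, published_teks: list) -> bool:
    """Run on the phone: re-derive IDs from published keys and compare locally."""
    heard = set(heard_ids)
    for tek in published_teks:
        for interval in range(INTERVALS_PER_KEY):
            if rolling_id(tek, interval) in heard:
                return True
    return False
```

The point of the design is visible in `local_match`: the server never sees which identifiers a phone overheard, so it cannot learn who was near whom. Only the phone, holding its own list of heard identifiers, can tell that a published key corresponds to something it encountered.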


So, it’s a pretty complex system, and it’s actually entirely serverless. So, the whole thing, including all, like, background job processing, was designed to be serverless from the beginning. Total greenfield project, right, like, nothing like this exists, so we’re really fortunate there. We made some fun and interesting design decisions to keep costs down while, you know, abusing slash using some of the features of serverless like auto-scaling and, you know, being able to fan out across multiple regions and things like that—


Corey: And using DNS as a database. My personal favorite approach to things.


Seth: We don’t use DNS as a database. We do use Postgres—


Corey: A missed opportunity.


Seth: —a real database. But we do use DNS, just not for storing information.


Corey: So, one question I have for you is that you’ve been at Google for a while and you’ve done an awful lot of things there, but previously, you’ve also done things that don’t really directly align with any of this stuff going on there. You were at HashiCorp and you were at Chef, neither of which, to my understanding, are technologies that Google makes extensive use of internally for their own stuff. It seems like—and even while you’ve been at Google, you have been continually reinventing what it is that you do. I find that admirable because very often, when you see people at a company for a protracted period of time, they sort of get more or less pigeonholed into a role that looks fairly similar from year to year. You’ve been incredibly dynamic. Was it intentional and how do you do it?


Seth: So, I have a diagnosed medical condition called Career-DHD. I’m just kidding, but I do. I get bored, and it’s actually something that I’m really forward with my managers about. I’ve always been very straight with my managers and the people I work with that, like, 8 to 12 months from now, I will be doing something different. It will be different.


Corey: I wish I’d figured that out earlier on. In my case, the way that I wound up solving for that is I’ve got to come in, I’m going to solve an interesting problem. When I’m done with that, the consulting engagement is over and then I’m going to go away and everyone knows the score going in. Works out way better than, “And then I’m going to go cause problems on purpose in other people’s parts of the org because I see problems there.” That was where I always went off the rails.


Seth: [laugh]. Yeah, I mean, I don’t take a dissimilar approach. You know, I try to find high-priority, strategic things that also align with my interests. And it’s important to me that there’s things that I can provide and things that I can learn. I never like to be the smartest person in the room because then you shouldn’t be in that room anymore; there’s no one for you to learn from. And it’s great to share knowledge, but—


Corey: I’m not convinced I’m the smartest person in the room right now, despite the fact that right now I’m the only person in the room that I’m sitting in.


Seth: I mean, that Minecraft store is pretty intelligent.


Corey: I saw a Chihuahua wandering around here, too, a—


Seth: [laugh].


Corey: —minute ago, so there is that.


Seth: But, you know, I think from, like, a career advice standpoint, I tell everyone, you should interview somewhere else at least once a year. You never know what’s out there, and worst-case scenario, you kept your interview skills up to date.


Corey: Keeping those skills in tune is so critically important just because it’s a unique skill set that, for many folks, does not have a whole lot of applicability in their day-to-day job. So, if you suddenly have to find a new job, great, you’re rusty at this, it’s been years, and you’re trying to remember, like, okay, when someone asks you what you’re looking for in your next job, they’re not trying to pick a fight. Don’t respond as if they were. Like, the basic stuff. It’s a skill, like anything else.


Seth: Yeah. And, like, the common questions like, you know, “What do you want to do with your life?” Or like, “What accomplishment are you most proud of?” Like, not having those prepared, but, like, knowing in general what you want to say for those is very important when you’re thinking about interviewing for other jobs. But even in a big company, like, the transfer process is pretty similar to, like, applying externally to other roles; like sometimes there’s interviews—


Corey: Do they make you code on whiteboards to solve algorithm problems?


Seth: Not me. But—


Corey: Good.


Seth: —in general—


Corey: Google has evolved its interview process since the last time I went through that particular brand of corporate hazing. Good, good, good.


Seth: Yeah. The interview process has definitely been refactored a lot, especially with Covid and remote, but also just trying to be accessible to folks. I know one of the big changes Google has made is we no longer require, like, eight contiguous hours of your time. You can split interviews out over multiple days, which has been really accommodating for folks that, you know, already have a full-time job or have family obligations at home that don’t let them just, like, take eight hours away and devote a hundred percent of their time to interviews. So, I think, you know, there’s not a whole lot of positive things that have come out of Covid, but the flexibility with, like, interviewing has enabled more people to participate in the interview process that otherwise would not have been able to do so.


Corey: And there’s something to be said for making this more accessible to folks who come from backgrounds that don’t all look identical. It’s incredibly important.


Seth: Yep.


Corey: One thing that I definitely want to make sure we get to before the end of this is something you’ve been talking about that’s a bit orthogonal, but maybe not entirely so, which is software supply chain security. That has been a common thread of discussion in some circles for a while. What is it, for those who are unfamiliar, like me sometimes, and what does it imply?


Seth: Yeah, so I mean, in the past year—but if you look back, you’ll find more cases of it—we live in a world where no company—Google, Amazon, the US government—writes every line of code that they run. And even if you do, right, even if you could find a company that doesn’t rely on any external dependencies, what language are they using? Did they write that language? Okay, let’s say hypothetically, you write every single line of code and you wrote your own language, and only your employees contribute to that language.


What operating system are you running on? Because I guarantee you, Linus probably contributed to it, or Gates contributed to it, and they don’t work for you. But let’s say you wrote your own operating system, right—so we’re getting into, like, crazy Google things now, right? Like, only Google would write their own programming language and their own operating system, right? Who manufactured your CPU, right? Like, did you actually—


Corey: There’s always dependencies all the way down. We see this sometimes when companies talk about, “Oh, yeah, we’re going to go to multiple clouds or a different cloud so that we don’t get impacted if there’s another AWS outage in us-east-1.” Cool, great. Power to you, but are you sure your payment provider’s not going to go down? Are they taking a dependency on us-east-1?


Great, let’s say that they’re not. Are you sure that their vendors who are in the critical path are also not taking critical and core dependencies on that? And are you sure that they’re aware of who all of those critical dependencies and those vendors are, and so on and so forth? It is a vast interconnected web. This is a problem. Dependency sprawl is real and I don’t think that there’s a good way to get to the bottom of it, particularly across company boundaries like that.


Seth: Yeah. And this is where if you look at the non-software supply chain, like, if you look at construction, right? If you’re working with a reputable construction agency, they’re actually able to tell you, given a granite countertop or, you know, a quartz countertop, from what beach and what lot on what date the grains of sand in that countertop came from. That is a reality of that industry that is natural. You think about, like, automotive, like, VIN, the Vehicle Identification Numbers, like, they tell you exactly what manufacturer, and then there’s records that show you exactly what human being on the line put that particular part in that machine.


And we don’t have that in software today. Like, we have some, you know, bastardized versions of, like, Software Bills of Materials, or SBOMs, but the simple fact of the matter is, like, because software has grown organically and because this wasn’t ingrained in software from the beginning like it was for, you know, traditional manufacturing, you’re going to have an insecure software supply chain for most of my life. Now, what does that actually mean, right—insecure has this negative connotation—it means that you need to make sure that you’re aware of everything that you’re depending on—which is kind of what you were saying is, like, both the technical dependencies and the process or the people dependencies—and you need to have a rigorous process for how you’re going to respond to these incidents. And I think log4j was a really good eye-opening moment for folks when they realized that they didn’t have a way to make a large-scale dependency update across their entire fleet of applications.


Corey: Because who has to do that on a consistent basis? It happens rarely, but when it happens, it’s super important.


Seth: But I do think that we’re going to see it happen more and more frequently. And ideally, you know, my opinion is that we’re going to get to a point where this is inescapable, but ideally, we get to the point where it’s like, “Oh, okay, this dependency is vulnerable. I have a playbook. I follow the playbook. Everything is patched in 30 minutes or less, and I can move on with my life.” And it’s not a six-week fire drill with people working late and, you know, going super crazy, trying to mitigate these issues.


You know, there’s a lot of work happening in this space. We have, like, SLSA, which is an open standard for how you declare, kind of like, your software bill of materials and things like binary authorization and attestations. There’s, like, Sigstore, there’s Chainguard, there’s some companies evolving in this space. Every time I talk to GitHub, I tell them, I’m like, “Hey, if this VP and that VP, like, talked together and, like, worked on something, you could do something amazing in this space.” But I think it’s going to be quite a while until we get to a point where we can say the software supply chain is secure.


Because like I was saying at the beginning, like, until you manufacture your own CPU, like, you’re dependent on Intel and AMD. And until you write your own programming language, you’re dependent on Ruby, Python, Go, whatever it might be. And until you take no dependencies on some external system—which by the way, might be a bad business decision, like, if someone did the work for you already in an open-source ecosystem, it’s probably a better business decision to evaluate and use that than to build it yourself. Until we have the analysis on that supply chain, and we can, in a dashboard, or with the click of a button, or the run of a command, very easily see the security status of our supply chain—software supply chain—and determine if a particular vulnerability is or is not relevant, I think we’re still going to be in this firefighting mode for at least another couple of years.
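
[Editor’s note: The “run of a command” check Seth describes, answering “do we run anything that depends on the vulnerable package?”, might look something like the sketch below. The service names, the inventory layout, and the helper `affected_services` are hypothetical, and treating every version below a single “fixed in” release as affected is a deliberate simplification of how real advisories express version ranges.]

```python
def parse_version(text):
    """Turn a dotted version string such as '2.14.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def affected_services(inventory, package, fixed_in):
    """Map each service that pins `package` below `fixed_in` to its pinned version."""
    fixed = parse_version(fixed_in)
    hits = {}
    for service, dependencies in inventory.items():
        version = dependencies.get(package)
        if version is not None and parse_version(version) < fixed:
            hits[service] = version
    return hits


# Hypothetical fleet inventory, e.g. assembled from each service's bill of materials.
INVENTORY = {
    "billing-api": {"log4j-core": "2.14.1", "example-http": "4.5.0"},
    "search-indexer": {"log4j-core": "2.17.1"},
    "web-frontend": {"example-ui": "17.0.2"},
}

print(affected_services(INVENTORY, "log4j-core", "2.15.0"))
```

[The hard part in practice is not this lookup but having a complete, current inventory to run it against, which is the gap Seth is pointing at.]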


Corey: And I want to say you’re wrong, but I know you’re not. And that’s what, I guess, keeps a lot of us awake at night for unfortunate reasons. Seth, I really want to thank you for taking the time to speak with me. If people want to learn more, where’s the best place to find you?


Seth: I’m on Twitter. You can find me at—


Corey: I’m sorry to hear that. So am I. It’s the experience.


Seth: Yeah, you can find me at @sethvargo. If you say mean and hateful things to me, I actually exercise this finger, and I can click the block button real fast. But yeah, I mean, my DMs are open. If you have any questions, comments, complaints, concerns, you can throw the complaints away and come to me for everything else.


Corey: Thank you so much for being so generous with your time. I really appreciate it.


Seth: Yeah, thanks for having me. It’s always a pleasure.


Corey: Seth Vargo, engineer at Google. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment asking how dare I malign the good name of the other cloud provider that isn’t Google that also just so coincidentally happens to employ you.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


Announcer: This has been a HumblePod production. Stay humble.


Seth: The reason that it’s hard for Alphabet companies to securely and privately move to cloud specifically for security, is because Alphabet’s stance is so much more rigorous than anyone else in the industry, to the point where, in some cases, even our own cloud provider doesn’t meet the bar for what we require for an internal workload. And that’s really what it comes down to is, like, the reason that Google is the most secure cloud is because our bar is so high that sometimes we can’t even meet it.

Corey: I have to assume that the correct answer on this is that you then wind up talking to those product teams and figure out how to get them to a point where they can support that bar because the alternative is effectively, it’s like, “Oh, yeah, this is Google Cloud and it’s absolutely right for multinational banks to use, but you know, not Google workloads. That stuff’s important.” And I don’t think that is necessarily how you folks tend to view these things.

Seth: So, it’s a bidirectional stream, right? So, a lot of it is working with a product management team to figure out where we can add these additional security properties into the system—I should say, tri-directional. The second area is where the policy is so specific to Google that Google should actually build its own layer on top of it that adds the security because it’s not generally applicable to even big, huge cloud customers. And then the third area is Google’s a very big company. Sometimes we didn’t write stuff down, and sometimes we have policies where no one can really articulate where that policy came from.

And something that’s new with this approach that we’re taking now is, like, we’re actually trying to figure out where that policy came from, and get at the impetus of what it was trying to protect against and make sure that it’s still applicable. And I don’t know if you’ve ever worked with governments or you know, large companies, right, they have this spreadsheet of hundreds of thousands of lines—

Corey: You are basically describing my client list. Please continue.

Seth: I mean, like, sometimes they have to use an Access database because they exhaust the number of rows in an Excel spreadsheet. And it’s just checklist upon checklist upon checklist. And that’s not how Google does security, right? Security is a very all-encompassing, kind of, 360 type of thing. But we do have policies that are difficult to articulate what they’re actually protecting against, and we are constantly re-evaluating those, and saying, like, “This made sense on Borg. Does it actually make sense on Cloud?” And in some cases, it may not. We get the same protections using, say, a GCP-native service, and we can omit that requirement for this particular workload.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free that’s snark.cloud/oci-free.

Corey: I think that when it comes to things like policies that are intelligently crafted around security, you folks—and to be fair, the AWS security engineers as well—have been doing it right in that, okay, we’re going to build a security control to make sure that a thing can’t happen. That’s not enough. Then there’s the defense-in-depth. Okay, let’s say that control fails for some variety of ways. Here are the other things we’re going to do to prevent cross-account access, for example.

And that in turn, winds up continuing to feed on itself and build into a culture of assuming that you can always continue to invest in security. How far is enough? Well, for most folks, they haven’t gone far enough yet.

Seth: Another way to put this is like, how well do you want to sleep at night? You know, there’s folks on the Google security engineering team who are so smart, and they work on, like, our offensive security team, so their full-time job is to try to hack Google and then figure out how to prevent that. And, you know, so I’ve read some of the reports and some of the ways they think and I’m like, “How do you… how do you pick up a mobile phone and go to like, any website confidently knowing what you know?” Right? [laugh]. And like, how do you—

Corey: Who said anything about confidently? Yeah.

Seth: Yeah. Yeah. How do you use self-checkout at a supermarket and, like, not just, like, wear your entire full-body tinfoil hat suit? But you know, I think the bigger risk is not knowing what the risks are. And this is a lot of what we’re seeing in software supply chain, too, is a lot of security is around threat modeling and not checklists. But we tend to, like, gravitate toward checklists because they’re concrete.

But you really have to ask yourself, like, do I need the same security properties on my static blog website that is stored on an S3 bucket or a GCS bucket that’s public to the internet, that I do on my credit card processing service? And a lot of times we don’t treat those differently, we don’t apply a different threat model to them, and then everything has to have the same level of security.

Corey: And then everything is in-scope for whatever it is you’re trying to defend against. And that is a short path to madness.

Seth: Yes. Yes. Your static HTML files and your GCS bucket are in scope for SOC 1 and 2 because you didn’t have a way to say they weren’t.

Corey: Yeah. You’ve also done some—again, the nice thing about being at a company for a while—from what I can tell, given that I’ve never done it until I started this place—is you move around and work on different projects. You were involved as well, personally, in the exposure notifications project, the joint collaboration thing between a number of companies in the somewhat early days of the pandemic that had all of our phones talk to one another and, anonymously and in a privacy-preserving way, let us know that hey, by the way, someone you were in close contact with has tested positive for Covid-19 in the previous fixed period of time. What did you do over there?

Seth: Yeah, so the exposure notifications project was a joint effort, primarily between Apple and Google to use Android and iOS devices to help stop the spread of Covid or reduce the spread of Covid as much as possible. The idea being because the incubation period is roughly 14 days, at least pre-Omicron, if we could tell you hey, you might have been exposed and get you to stay at home for three or four days, self-isolate, we could dramatically reduce the spread of Covid. And we know from some of the studies that have come out of, like, the UK and European region that, like, the technology actually reduced the spread of cases by, like, fourteen-hundred percent in some cases. I was one of the tech leads for the server-side. So, the way the system works is it uses the low-energy Bluetooth on iOS and Android devices to basically broadcast random IDs.

So, I know this is Screaming into the Cloud, but if we can just quickly Screaming into the Void as a rebrand—

Corey: Oh, yeah.

Seth: —that’s basically what’s happening. [laugh]. You’re generating these random identifiers, and just, like, yelling them, and there’s other phones out there who are listening. And they collect what we’ll call RPIs, or Rolling Proximity Identifiers. They have no data in them.

They’re like literally, like, a UUID or 32 bytes of random data; they aren’t at all, like, associated with your device or your person. So, then what happens is, like, let’s say you’re in a supermarket, you’re near someone for, you know, every so often, and your phones exchange these IDs. If you then test positive, those IDs go up to a centralized server. The server, again, also has no idea who you are, so the whole thing is privacy-preserving, end-to-end. Then the server basically bundles all of what we call the TEKs, or the Temporary Exposure Keys, into a tarball that goes up onto a CDN, and then every night, all of the devices that are participating in EN download this and do a local key match. So, at no point does the server ever know that you were in a supermarket with someone else; only your phone knows that you came in contact with this TEK in the past 14 days—or 21 days in some jurisdictions—and it’ll generate an exposure notification or an exposure alert, which says, like, “Hey, in the past 14 days, you’ve come in contact with someone who’s confirmed positive for Covid.” And then the guidance kind of varies by state and by health jurisdiction: like, self-isolate, or go get tested, or whatever. But the idea—

Corey: Or go to the bar in some places, apparently.

Seth: Oh. Yeah. The server itself is actually—there’s a verification component because ideally, like, we don’t want people to just be like, oh, I’m Covid positive, and then like, all their friends get an alert, right? There needs to be some kind of verification mechanism where you either have a positive test, or you have a clinician or a physician who issues you a code that you can put into your app so you can then release your keys. And then there’s the actual key server component, which I kind of already described.

So, it’s a pretty complex system and actually is entirely serverless. So, the whole thing, including all, like, background job processing, it was designed to be serverless from the beginning. Total greenfield project, right, like, nothing like this exists, so we’re really fortunate there. We made some fun and interesting design decisions to keep costs down while, you know, abusing slash using some of the features of serverless like auto-scaling and, you know, being able to fan out across multiple regions and things like that—

Corey: And using DNS as a database. My personal favorite approach to things.

Seth: We don’t use DNS as a database. We do use Postgres—

Corey: A missed opportunity.

Seth: —a real database. But we do use DNS, just not for storing information.
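
The matching flow Seth describes above (phones broadcast rotating random identifiers, keys from people who test positive are published, and every phone checks for matches locally) can be sketched in a few lines of Python. This is an illustration only, not the project's actual code: the real system defines its own key derivation, whereas this sketch swaps in HMAC-SHA256 from the standard library, and the names `new_tek`, `rolling_id`, `find_exposures`, and the `INTERVALS_PER_KEY` value are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

# Assumption for this sketch: one identifier per 10-minute window over a day.
INTERVALS_PER_KEY = 144


def new_tek() -> bytes:
    """Generate a random key that carries no device or user information."""
    return secrets.token_bytes(16)


def rolling_id(tek: bytes, interval: int) -> bytes:
    """Derive the identifier broadcast during one interval.

    HMAC-SHA256 is a stand-in here, not the derivation the real system uses.
    """
    digest = hmac.new(tek, interval.to_bytes(4, "big"), hashlib.sha256).digest()
    return digest[:16]


def find_exposures(published_teks, heard_ids):
    """Return the published keys that explain any identifier this phone heard.

    This runs entirely on the phone, so the server never learns who met whom.
    """
    heard = set(heard_ids)
    matches = []
    for tek in published_teks:
        if any(rolling_id(tek, i) in heard for i in range(INTERVALS_PER_KEY)):
            matches.append(tek)
    return matches
```

A phone would call `find_exposures` with the nightly download from the CDN and its own log of overheard identifiers; a non-empty result is what would trigger the alert. The design point is that identifiers can only be linked back to a key once that key has been published, and the comparison never leaves the device.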

Corey: So, one question I have for you is that you’ve been at Google for a while and you’ve done an awful lot of things there, but previously, you’ve also done things that don’t really directly align with any of this stuff going on there. You were at HashiCorp and you were at Chef, neither of which, to my understanding, are technologies that Google makes extensive use of internally for their own stuff. It seems like—and even when you’re at Google, you have been continually reinventing what it is that you do. I find that admirable because very often, when you see people at a company for a protracted period of time, they sort of get more or less pigeonholed into the role that looks fairly similar from year-to-year. You’ve been incredibly dynamic. Was it intentional and how do you do it?

Seth: So, I have a diagnosed medical condition called Career-DHD. I’m just kidding, but I do. I get bored, and it’s actually something that I’m really forward with my managers about. I’ve always been very straight with my managers and the people I work with that, like, 8 to 12 months from now, I will be doing something different. It will be different.

Corey: I wish I’d figured that out earlier on. In my case, the way that I wound up solving for that is I’ve got to come in, I’m going to solve an interesting problem. When I’m done with that, the consulting engagement is over and then I’m going to go away and everyone knows the score going in. Works out way better than, “And then I’m going to go cause problems on purpose in other people’s parts of the org because I see problems there.” That was where I always went off the rails.

Seth: [laugh]. Yeah, I mean, I don’t take a dissimilar approach. You know, I try to find high-priority, strategic things that also align with my interest. And it’s important to me that there’s things that I can provide and things that I can learn. I never like to be the smartest person in the room because you shouldn’t be in that room anymore; there’s no one for you to learn from. And it’s great to share knowledge, but—

Corey: I’m not convinced I’m the smartest person in the room right now, despite the fact that right now I’m the only person in the room that I’m sitting in.

Seth: I mean, that Minecraft store is pretty intelligent.

Corey: I saw a Chihuahua wandering around here, too, a—

Seth: [laugh].

Corey: —minute ago, so there is that.

Seth: But, you know, I think from, like, a career advice standpoint, I tell everyone, you should interview somewhere else at least once a year. You never know what’s out there, and worst-case scenario, you kept your interview skills up to date.

Corey: Keeping those skills in tune is so critically important just because it’s a unique skill set that, for many folks, does not have a whole lot of applicability in their day-to-day job. So, if you suddenly have to find a new job, great, you’re rusty at this, it’s been years, and you’re trying to remember, like, okay, when someone asks you what you’re looking for in your next job, they’re not trying to pick a fight. Don’t respond as if they were. Like, the basic stuff. It’s a skill, like anything else.

Seth: Yeah. And, like, the common questions like, you know, “What do you want to do with your life?” Or like, “What accomplishment are you most proud of?” Like, having those not prepared, but like knowing in general what you want to say for those is very important when you’re thinking about interviewing for other jobs. But even in a big company, like, the transfer process is pretty similar to, like, applying externally to other roles; like sometimes there’s interviews—

Corey: Do they make you code on whiteboards to solve algorithm problems?

Seth: Not me. But—

Corey: Good.

Seth: —in general—

Corey: Google has evolved its interview process since the last time I went through that particular brand of corporate hazing. Good, good, good.

Seth: Yeah. The interview process has definitely been refactored a lot, especially with Covid and remote, but also just trying to be accessible to folks. I know one of the big changes Google has made is we no longer require, like, eight contiguous hours of your time. You can split interviews out over multiple days, which has been really accommodating for folks that have, you know, already have a full-time job or have family obligations at home that don’t let them just, like, take eight hours away and devote a hundred percent of their time to interviews. So, I think that is, you know, not a whole lot of positive things that come out of Covid, but the flexibility with, like, interviewing has enabled more people to participate in the interview process that otherwise would not have been able to do so.

Corey: And there’s something to be said for making this more accessible to folks who come from backgrounds that don’t all look identical. It’s incredibly important.

Seth: Yep.

Corey: One thing that I definitely want to make sure we get to before the end of this is something you’ve been talking about that’s a bit orthogonal, but maybe not entirely so, which is software supply chain security. That has been a common thread of discussion in some circles for a while. What is it, for those who are unfamiliar, like me sometimes, and what does it imply?

Seth: Yeah, so I mean, in the past year—but if you look back, you’ll find more cases of it—we live in a world where no company—Google, Amazon, the US government—writes every line of code that they run. And even if you do, right, even if you could find a company that doesn’t rely on any external dependencies, what language are they using? Did they write that language? Okay, let’s say hypothetically, you write every single line of code and you wrote your own language, and only your employees contribute to that language.

What operating system are you running on? Because I guarantee you, Linus probably contributed to it, or Gates contributed to it, and they don’t work for you. But let’s say you wrote your own operating system, right—so we’re getting into, like, crazy Google things now, right? Like, only Google would write their own programming language and their own operating system, right? Who manufactured your CPU, right? Like, did you actually—

Corey: There’s always dependencies all the way down. We see this sometimes when companies talk about, “Oh, yeah, we’re going to go to multiple clouds or a different cloud so that we don’t get impacted if there’s another AWS outage in us-east-1.” Cool, great. Power to you, but are you sure your payment provider’s not going to go down? Are they taking a dependency on us-east-1?

Great, let’s say that they’re not. Are you sure that their vendors who are in the critical path are also not taking critical and core dependencies on that? And are you sure that they’re aware of who all of those critical dependencies and those vendors are, and so on and so forth? It is a vast interconnected web. This is a problem. Dependency sprawl is real and I don’t think that there’s a good way to get to the bottom of it, particularly across company boundaries like that.

Seth: Yeah. And this is where if you look at the non-software supply chain, like, if you look at construction, right? If you’re working with a reputable construction agency, they’re actually able to tell you, given a granite countertop or, you know, a quartz countertop, from what beach and what lot on what date the grains of sand in that countertop came from. That is a reality of that industry that is natural. You think about, like, automotive, like, VIN, the Vehicle Identification Numbers, like, they tell you exactly what manufacturer, and then there’s records that show you exactly what human being on the line put that particular part in that machine.

And we don’t have that in software today. Like, we have some, you know, bastardized versions of, like, Software Bills of Material, or SBOM, but the simple fact of the matter is, like, because software has grown organically and because this wasn’t ingrained in software from the beginning like it was in, you know, traditional manufacturing, you’re going to have an insecure software supply chain for most of my life. Now, what does that actually mean, right—insecure has this negative connotation—it means that you need to make sure that you’re aware of everything that you’re depending on—which is kind of what you were saying is, like, both the technical dependencies and the process or the people dependencies—and you need to have a rigorous process for how you’re going to respond to these incidents. And I think log4j was a really good eye-opening moment for folks when they realized that they didn’t have a way to make a large-scale dependency update across their entire fleet of applications.

Corey: Because who has to do that on a consistent basis? It happens rarely, but when it happens, it’s super important.

Seth: But I do think that more and more, we’re going to see it happen more and more frequently. And ideally, you know, my opinion is that we’re going to get to a point where this is inescapable, but ideally, we get to the point where it’s like, “Oh, okay, this dependency is vulnerable. I have a playbook. I follow the playbook. Everything is patched in 30 minutes or less, and I can move on with my life.” And it’s not a six-week fire drill with people working late and, you know, going super crazy, trying to mitigate these issues.

You know, there’s a lot of work happening in this space. We have, like, SLSA, which is an open standard—SLSA—for how you declare, kind of like, your software bill of materials and things like binary authorization and attestations. There’s, like, Sigstore, there’s Chainguard, there’s some companies evolving in this space. Every time I talk to GitHub, I tell them, I’m like, “Hey, if this VP and that VP, like, talked together and, like, worked on something, you could do something amazing in this space.” But I think it’s going to be quite a while until we get to a point where we can say the software supply chain is secure.

Because like I was saying at the beginning, like, until you manufacture your own CPU, like, you’re dependent on Intel and AMD. And until you write your own programming language, you’re dependent on Ruby, Python, Go, whatever it might be. And until you take no dependencies on some external system—which by the way, might be a bad business decision, like, if someone did the work for you already in an open-source ecosystem, it’s probably a better business decision to evaluate and use that than to build it yourself. Until we have the analysis on that supply chain, and we can in a dashboard, or the click of a button, or the run of a command, very easily see the security status of our supply chain—software supply chain—and determine if a particular vulnerability is or is not relevant, I think we’re still going to be in this firefighting mode for at least another couple of years.
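
The "run of a command" Seth has in mind, answering "do we have any code that uses that?" in minutes rather than weeks, can be sketched as a lookup over a dependency inventory. This is a minimal Python illustration under stated assumptions: the `INVENTORY` data, the service names, the `Dependency` type, and the `affected_services` helper are all hypothetical, versions are assumed to be plain dotted numbers, and the `fixed_in` threshold is whatever the relevant advisory says rather than anything asserted here. A real inventory would be generated from SBOMs or lockfiles instead of being written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    version: str


def parse_version(version: str) -> tuple:
    """Turn '2.14.1' into (2, 14, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def affected_services(inventory, package, fixed_in):
    """Return {service: version} for services running `package` below `fixed_in`."""
    threshold = parse_version(fixed_in)
    hits = {}
    for service, deps in inventory.items():
        for dep in deps:
            if dep.name == package and parse_version(dep.version) < threshold:
                hits[service] = dep.version
    return hits


# Hypothetical inventory, hand-written for the example.
INVENTORY = {
    "billing-api": [Dependency("log4j-core", "2.14.1"), Dependency("guava", "31.0")],
    "static-blog": [Dependency("markdown", "3.3.6")],
    "search": [Dependency("log4j-core", "2.17.1")],
}

# Illustrative threshold; take the real one from the advisory.
print(affected_services(INVENTORY, "log4j-core", "2.17.1"))
```

The lookup itself is trivial; the hard part, as the conversation makes clear, is having a complete and current inventory to run it against.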

Corey: And I want to say you’re wrong, but I know you’re not. And that’s what, I guess, keeps a lot of us awake at night for unfortunate reasons. Seth, I really want to thank you for taking the time to speak with me. If people want to learn more, where’s the best place to find you?

Seth: I’m on Twitter. You can find me at—

Corey: I’m sorry to hear that. So am I. It’s the experience.

Seth: Yeah, you can find me at @sethvargo. If you say mean and hateful things to me, I actually exercise this finger, and I can click the block button real fast. But yeah, I mean, my DMs are open. If you have any questions, comments, complaints, concerns, you can throw the complaints away and come to me for everything else.

Corey: Thank you so much for being so generous with your time. I really appreciate it.

Seth: Yeah, thanks for having me. It’s always a pleasure.

Corey: Seth Vargo, engineer at Google. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment asking how dare I malign the good name of the other cloud provider that isn’t Google that also just so coincidentally happens to employ you.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
