Thomas Dullien (Halvar Flake) is the co-founder of optimyze, a company that helps businesses optimize their cloud spend with better code. He started his career by founding a company called zynamics, a research-centric technology company that was acquired by Google in 2011. After the acquisition, he stayed on at Google as a staff engineer for eight years before launching optimyze.
Join Corey and Thomas as they discuss why cloud optimization is increasingly important in a SaaS-driven world, why Thomas believes that cloud costs can be reduced by optimizing code, how rewriting code the way Google wants means your app can immediately scale to the sky, the difference between working on Google’s internal infrastructure and GCP, how Google hasn’t traditionally been good at explaining why their products are beneficial, why you should treat a data center as a computer that happens to be the size of a warehouse, Google Project Zero, and more.
About Thomas Dullien
Thomas Dullien / Halvar Flake is a security researcher / entrepreneur known for his contributions to the theory and practice of vulnerability development and software reverse engineering. He built and ran a company for reverse engineering tools that got acquired by Google; he also worked on a wide range of topics, like turning security patches into attacks and turning physics-induced DRAM bitflips into useful attacks. After a few years at Google Project Zero, he is now co-founder of a startup called http://optimyze.cloud that focuses on efficient computation: helping companies save money by wasting fewer cycles, and helping reduce energy waste in the process.
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Linode. You might be familiar with Linode; I mean, they've been around for almost 20 years. They offer Cloud in a way that makes sense rather than a way that is actively ridiculous by trying to throw everything at a wall and see what sticks. Their pricing winds up being a lot more transparent—not to mention lower—their performance kicks the crap out of most other things in this space, and—my personal favorite—whenever you call them for support, you'll get a human who's empowered to fix whatever it is that's giving you trouble. Visit linode.com/screaminginthecloud to learn more and get $100 in credit to kick the tires. That's linode.com/screaminginthecloud.
Corey: This episode has been sponsored in part by our friends at Veeam. Are you tired of juggling the cost of AWS backups and recovery with your SLAs? Quit the circus act and check out Veeam. Their AWS backup and recovery solution is made to save you money—not that that’s the primary goal, mind you—while also protecting your data properly. They’re letting you protect 10 instances for free with no time limits, so test it out now. You can even find them on the AWS Marketplace at snark.cloud/backitup. Wait, did I just endorse something on the AWS Marketplace? Wonder of wonders, I did. Look, you don’t care about backups, you care about restores, and despite the fact that multi-cloud is a dumb strategy, it’s also a reality, so make sure that you’re backing up data from everywhere with a single unified point of view. Check them out at snark.cloud/backitup.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Thomas Dullien, the CEO of optimyze.cloud. That's optimyze, O-P-T-I-M-Y-Z-E, so we know it's a startup. Thomas, welcome to the show.
Thomas: Hey, nice to be here. Thank you.
Corey: Of course. So, let's start with my, I guess, snarky comment at the beginning here, which is yeah, you misspell a word so clearly you're a startup. You have a trendy domain, in this case, dot cloud, which is great. What do you folks do as a company?
Thomas: Well, originally, we set out to try to help people reduce their cloud bill by taking a bit of an unorthodox approach.
Corey: That sounds like a familiar story.
Thomas: Well, familiar perhaps, but our approach was that I had seen just a tremendous amount of wasteful computation everywhere. And the hypothesis behind our company was, “Hey, with Moore's Law ending, software efficiency will become important again.” Meaning people will actually care about software being efficient. And the reasons for this are, A) Moore's Law is ending, and B) now that everybody is a SaaS vendor instead of a software vendor, all of a sudden the software vendor pays for the inefficiency. Like, in the past, if you bought a copy of Photoshop, you had to buy a new Macintosh along with it. Nowadays, it's the vendor of the software that actually pays for it. So, our entire hypothesis was that there's got to be a way to optimize code and then make things run faster and, yeah, make people happy doing this.
Corey: One of the strange things that I find is that every time I talk to a company who's involved in the cloud cost optimization space—and again, full disclosure, I work at The Duckbill Group. That's what we do on the AWS side. And we take a sort of marriage-counseling-based approach for finance and engineering so they stop talking past each other. And that's all well and good, but that's a relatively standard services story: part tools, part services-based consultancy, and it has its own appeal and drawbacks, of course.
What I find interesting is that most tooling companies do what comes down to more or less the same ridiculous thing, which is, “Ah, the dashboards are crappy, so we're going to build a better dashboard.” “Great, awesome. What problem are you actually solving?” And it becomes, “Well, we don't like Cost Explorer, so here's something else that we did in Kibana instead,” or whatnot. Great. I don't necessarily see that that solves the customer pain points. You have done something very odd in that view, which is that you're not building a restated dashboard of what people will already find in native tools; you're looking at one very specific area. What is it?
Thomas: Yeah, so in our case, we're really looking at where are people spending their computational cycles. I mean, everybody knows that you can profile a single application on their laptop or on their computer, but then once you get to a certain scale, that gets really, really weird. And I like to think about software in the sense that we're building essentially an operating system for an entire data center these days. Like that's really what's happening inside Google; that's what's happening inside AWS. And you need to measure where your time is spent if you want to make things faster.
Everybody who's profiled their own software usually finds some really low-hanging fruit, and then goes and fixes it. And to some extent, we have a huge disconnect these days between what the developer is writing and what is actually the feedback loop to tell the developer, “Hey, this change here just caused X dollars of extra cost.” So, in some sense, what we want to build is something that tells the developer, this line of code here is generating this amount of cost. And we kind of think that developers will make better decisions once they know what the cost is. Full disclosure, I used to work at Google for a fairly long time, and Google wrote a paper in 2010 about a system they had built called the Google-Wide Profiler, and the results of that system inside Google were quite hilarious. Like, they figured out that they were spending, what, 15 percent of all cycles inside Google on gzip when they first started measuring. So, we thought that’s got to be a useful thing for other people to have, too.
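The "map profile time to dollars" idea Thomas describes can be sketched in a few lines. To be clear, this is not optimyze's product or the Google-Wide Profiler; it's just an illustrative toy using Python's built-in cProfile. It profiles a gzip-style workload, then attributes an assumed per-CPU-hour price (the $0.04 figure is invented for the example) to each function's cumulative time:

```python
import cProfile
import io
import pstats
import zlib

# Toy workload: repeatedly compress a buffer, mimicking the
# "15 percent of cycles in gzip" finding mentioned above.
def compress_logs(data: bytes, rounds: int) -> int:
    total = 0
    for _ in range(rounds):
        total += len(zlib.compress(data, 9))
    return total

def cost_per_function(profile: pstats.Stats, dollars_per_cpu_hour: float):
    """Rank profiled functions by an (assumed) dollar cost of their CPU time."""
    rows = []
    for func, (_cc, _nc, _tt, cumtime, _callers) in profile.stats.items():
        # func is a (filename, line, function_name) tuple
        rows.append((func, cumtime, cumtime / 3600.0 * dollars_per_cpu_hour))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows

pr = cProfile.Profile()
pr.enable()
compress_logs(b"some log line\n" * 4096, rounds=200)
pr.disable()

stats = pstats.Stats(pr, stream=io.StringIO())
for func, cumtime, dollars in cost_per_function(stats, dollars_per_cpu_hour=0.04)[:5]:
    print(f"{func}: {cumtime:.3f}s CPU, roughly ${dollars:.8f}")
```

At fleet scale, the same arithmetic applied to sampled stack traces across thousands of machines is what turns "15 percent of cycles in gzip" into a dollar figure an engineer can act on.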
Corey: And I'm of mixed opinion on that one because, first, congratulations on working for Google; all things must end, and you've since left Google, and they’re Google, so they're really good at turning things off. But that's beside the point. This, on some level, might feel like a problem that a company like Google will experience, where optimizing their code to make it more cost-effective makes an awful lot of sense, given how they operate and what they do. But then I look at customers I work with where they have massive cloud bills, but fleet-wide, their utilization averages out to something like 7 percent.
So, these instances are often sitting there, very bored. Optimizing that codebase to run more efficiently saves them approximately zero dollars, in most cases. That is, in many ways, the reality of large scale companies that are not, you know, the hyperscaling Googles of the world. That said, I can see a use case for this sort of thing when you have specific scaled out workloads that are highly optimized, and being able to tweak them for better performance stories starts to be something that adds serious value.
Thomas: Yeah, I mean, I won't even dispute that assessment. There's so much waste in the Cloud these days. And it starts from underutilized machines, it starts from data that's not compressed, and that just sits there; nobody's ever going to touch it again. And I won't claim that everybody will have the problem of needing to optimize their code. That's quite plainly not the case.
Our calculus is pretty much: Google needed to build a system like this internally. Facebook needed to build a system like this internally. At some point, there is a SaaS business of a certain size where you actually want to know, hey, where is all my money going, and you want to enable your engineers to make those decisions in a smarter way. So, I guess the other side is, I looked at my own skill set and tried to figure out, what can I do? Where is my skill set a good match for helping people save money in the Cloud? It turns out, it's not in writing a better dashboard, and it's not necessarily in right-sizing, either. So, I guess I took what I knew how to do and figured out [laughs] how to apply it, if that makes sense.
Corey: That's, I think, a great approach. The challenge I always see is in translating things that work for Google—for example—into, I guess, the public world, where I would argue that in the early days, this was one of the big stumbling blocks that GCP ran into—and I don't know how well this was ever communicated, or if it's just my own weird perception—but it feels like in a lot of ways, Google took what it learned from running its own massive global infrastructure, and then turned it into a Cloud. The problem is that when you're internal at Google and running that infrastructure, you can dictate how software has to behave in order to be supported in that environment. With customer workloads, you absolutely cannot do that in any meaningful way. So, it feels like the more Google-y your software was, the better it would run on GCP. But the more—well, let's be very direct—the more it looked like something you'd find in a bank, the less likely to find success it was. And that's changed to a large degree as the product itself has evolved, but is that completely out to sea? Is that an assessment that you would agree with? Or am I completely missing something?
Thomas: I think you're kind of right, not entirely right. I think you're completely correct in that Google internally, you need to rewrite things in a very specific manner in order for them to work. And my personal experience there is, I ran a small reverse engineering company from 2004 to 2011 and we got acquired by Google in 2011. And then we had to take an existing infrastructure that we had and port it to Google's infrastructure—to the internal infrastructure, not to the Google Cloud infrastructure because back then Google Cloud was just App Engine, which was not—I mean, it looks like ahead of its time now that everybody's talking about Function as a Service, but back then App Engine just looked weird. So, we had to rewrite pretty much our entire stack to conform to Google's internal requirements.
And it's a super weird environment because once you do rewrite it in the way that Google wants, everything scales to the sky, immediately. Like, the number of cores essentially becomes a command-line parameter. It’s like make -j16, but you replace 16 with 22,000 and you have 22,000 cores doing your work. Now, that said, GCP never externalized these internal systems, so what you get on GCP, to my great frustration, as an ex-Googler is often a not so great approximation of the internal infrastructure. [laughs].
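The "number of cores becomes a command-line parameter" point can be shown with a small sketch. This is only an analogy in stock Python, not Google infrastructure: a multiprocessing pool makes the worker count a plain parameter, which is the laptop-scale equivalent of replacing the 16 in make -j16 with 22,000 on a Borg-style scheduler:

```python
from multiprocessing import Pool
import math

def work(n: int) -> float:
    # Stand-in for a CPU-bound unit of work.
    return sum(math.sqrt(i) for i in range(n))

def run(tasks, workers: int):
    # 'workers' is just a knob: 4 on a laptop; on a Borg-style
    # scheduler the equivalent knob can be thousands of cores.
    with Pool(processes=workers) as pool:
        return pool.map(work, tasks)

if __name__ == "__main__":
    # Same code, different scale: change 'workers' and nothing else.
    print(len(run([50_000] * 8, workers=4)))
```

The catch Thomas describes is exactly the rewrite cost: your code only gets this property after it has been restructured into independent, schedulable units of work.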
So, it's neither here nor there, but first of all, I completely agree that Google, historically, in the Cloud has had the problem that they had learned a lot of lessons internally from scaling, were terrible at communicating these lessons properly to the outside world, and then gave people something that wasn't well explained. App Engine is a great example of this, right? Because if you encountered App Engine for the first time in 2011, you looked at it and you were like, “Why, how, and what is this?”
Corey: It was brilliant in some ways, don't get me wrong. But privately, I always sort of viewed Google App Engine as, “Cool, we're going to put this thing up and see who develops things super well with it, then we're going to make job offers to those people.” It felt like it was more or less a project provided by Google recruiting.
Thomas: I wouldn't know about that. And having been involved in interviewing, I think you're overestimating Google's recruiting process. But that said, I mean, App Engine is a really interesting concept, but the average developer—at least, the average developer in 2011—did not appreciate what it does and why it does these things. And you can talk about Borg and Kubernetes as well, where Google just didn't explain very well why they decided to build things the way they did when they externalized similar services on GCP. And that's certainly hurt them.
If you look at the history of AWS versus Google, Google gave people something like App Engine, which was weird and strange, and for a particular use case, and not terribly well explained, and AWS gave people VMs, which people understood. And by and large, if I am going to choose a product, and one looks really strange and one looks familiar, I'm going to take the familiar product. And I think the strategic failure Google made historically in Cloud was that when they did have a technically superior solution, they did a very poor job of explaining why it was technically superior. So, in some sense, Google historically didn't have a culture of customer interaction and, to some extent, what you need to do in Cloud is reach out to people, take them by the hand, calm their nerves, and then help them walk to the Cloud, and Google just didn't do that. They gave people strange-looking things and told them, “Hey, this is the better way, but we won't tell you why.”
Corey: This, of course, feels like a microcosm for Kubernetes. If I'm going to continue my bad take—and I absolutely will: when you're wrong, by all means, double down on it—it feels like Kubernetes being rolled out was an effort to get the larger ecosystem to write code and deploy it in ways that were slightly more aligned with Google's view of the world. And credit where due: it worked. The entire world is going head over heels for Kubernetes.
Thomas: Yeah. Well, Kubernetes is an interesting thing to watch because I'm a huge fan of Google's internal system called Borg. And—
Corey: Oh, yes.
Thomas: —there's a philosophical view at play, right. And the philosophical view is that you really shouldn't treat a data center as a group of computers; you should treat a data center as a computer that happens to be the size of a warehouse. Urs Hölzle co-wrote a book called The Datacenter as a Computer, about warehouse-scale machines, and a lot of Google's internal engineering philosophy is centered around this idea that we really should be building something that treats an entire data center as if it were one computer. And that's actually a very compelling viewpoint and I think it's a very good viewpoint to take. And then they built Borg, and Borg worked brilliantly internally for that purpose.
Corey: Well, there was a great tweet that you wrote back in April: “The trouble with Google's infra is sometimes you just want a slice of bread, but the only thing available is the continental saw, normally used for cutting continents, and the 2,000-page manual.” That's what Borg is for. Not everyone needs that level of complexity and scale to deploy, you know, a blog.
Thomas: No, that's entirely true and entirely fair. I guess if you're running a SaaS business of some size—and by some size, I mean, let's say, when you need 20, 30, 50 servers—you probably want to have some way of administering these, and we've all been in the trenches enough to know that administering 50 machines becomes a bit of a nightmare very quickly. So, I think the view of, we really should be treating those 50 machines as if they were one big machine, that's still a good view, even if you're not Google. To be fair, if you're running a blog—like, 99 percent of all workloads really don't need to be distributed systems. That's the reality of it.
We've had 40 years of Moore's Law. A single cell phone can do fairly amazing things if you think about the sheer computing power we have there. So, I fully agree that in the majority of cases you don't need the complexity, and good engineering usually means keeping things simple. And you do pay a price in complexity for insane scalability. And I stand by the tweet about the intercontinental saw, because it happened so often when I was at Google that we had these fantastically scalable systems, but it took a long while to wrap your head around how to even use them, and you really just wanted to cut a slice of bread.
Corey: That's one of the real problems is that scale means different things to different people, all the time. And, from my perspective, “Oh, wow, I have a blog post that got an awful lot of hits,” that might mean, I don't know, in the first 24 hours it goes up, it gets 80,000 clicks. That's great and all. Then you look at something that Google launches. It doesn't even matter what it is, but because they're Google, and they have a brand, and people want to get a look at whatever it is they launched before it gets deprecated 20 minutes later, it'll wind up getting 20 million hits in the first hour that it's up. It's a radically different sense of scale, and there's a very different model that ties into it. And understand that Google's built some amazing stuff, but none of the stuff that they've built that powers their own stuff is really designed for small scale because they don't do anything small scale.
Thomas: Oh, that's entirely true. Now, to counter that point a little bit, though, I would argue that… I mean, if there's one thing to be learned from Gangnam Style, it's that the strangest things can go viral these days, and you may find yourself in a position where you need to scale rapidly within 24 hours or even shorter timeframes if whatever you're offering gets insanely popular and spreads through social media. Because the reality is, in the year 2000, if your software got popular, you noticed that it was sold out in stores, right? You could produce more, and that was fine.
But the reality of today is, you may be in a situation where you need to scale really rapidly, and then it may be good if you've built things in a way that they can scale. Now, I'm not advocating that you should always pay the complexity price of making things scalable. I'm just saying that scalability may be more important today from a business perspective than it was a couple of years ago, just because, especially with Software as a Service, people switch things around a lot, people try things out, and it's quite possible that just randomly you get 10 million hits on your service the first day, and then you probably don't want to show the famous fail whale that Twitter was so famous for.
This episode is sponsored by our friends at New Relic. Look, you’ve got a complex architecture because they’re all complicated. Monitoring it takes a dozen different tools. Troubleshooting means jumping between all those dashboards and various silos. New Relic wants to change that, and they’re doing the right things. They’re giving you one user and a hundred gigabytes a month, completely free. Take the time to check them out at newrelic.com, where they’ve done away with almost everything that we used to hate about New Relic. Once again, that’s newrelic.com.
Corey: Oh, yeah. And remember that there's an argument to be made for reliability and when it begins to make serious business sense. You can wind up refactoring your existing code that has no customers until you run out of money, but even bad code that doesn't scale super well—like the Twitter fail whale—can get you to a point where you can afford a team of incredibly gifted people that come in and fix your problems for you. There's validity there, but early optimization becomes a problem. The things that I would write if I'm trying to target 20 million active users versus half an active user at any given point in time—because who really pays attention to me?—would be a very different architectural pattern for the most part.
Thomas: Yeah. I don't disagree. Then again, I think the entire selling point of App Engine back in the days was, you just write this thing in the way that App Engine tells you to do and then, whatever happens, you're insured, right? But yeah, I fully agree. I mean, there's the argument to be made that once you have traction, you also have money to fix the scaling issues, but then the question becomes, can you fix those quickly enough, so people don't get turned off by the unreliability? And that's not a question I can—anybody can answer in any good way upfront because you have to try things, and they'll fail and so forth.
Corey: So, what's interesting to me is that you don't come from a cost optimization background, historically. In fact, you come from one of the more interesting things on the internet—which is fascinating to me, at least—which is Google's Project Zero. And for those who haven't heard of it, what is Project Zero?
Thomas: So, Project Zero is a Google internal team that tries to emulate government attackers, essentially, trying to find vulnerabilities in critical software and by emulating the thought process, tries to nudge the industry in the direction of… well, making better security decisions and fixing the glaring issues. And it arose from Google's experience in 2009 where the Chinese government attacked them and used a bunch of vulnerabilities. And then Google at some point, a couple years later decided, “Hey, we've got all these people on the offensive side being paid to find vulnerabilities and then sell them to governments to hack everybody. Why don't we start a team internally that tries to do the same thing, but then publishes all the techniques, and publishes all the learnings, and so forth, so that the industry can be better informed?” In some sense, the observation was that the defensive side often made poor decisions by being not well-informed about how attacks actually work. And if you don't really understand how a modern attack works, you may misapply your resources. So, the thought process was, let's shine a light on how these things work so the defensive side can make better decisions.
Corey: One of the things I find neat about it—this is of course where Tavis Ormandy works—and it's fun talking to him on Twitter, watching him do these various things. Every time he's like, “Hey, can someone at some random company reach out to me on the security contact side?” It's, “Ooh, this is going to be good.” And everyone likes to gather around because it's one of those rare moments where you get to rubberneck at a car wreck before the accident.
Thomas: Yeah, I have an extremely high opinion of Tavis. He's a person with great personal integrity, he's a lot of fun to discuss things with, and he’s got a really good intuition for where things break. So, in general, the entire experience of having worked at Project Zero was pretty great. I spent a grand total of eight years at Google, five of them in a team that did some malware-related stuff, and then two years in Project Zero. And the two years in Project Zero were certainly a fantastic experience.
Corey: The thing that I find most interesting is that you have these almost celebrity bug hunters, for lack of a better term, and what amazes me is how many people freaking seem to hate him. And you do a little digging and, oh, you work at a company that had a massive vulnerability that was disclosed and, hm, one wonders why you have this axe to grind. It's, again, on some level, people doing you a favor. I've never fully understood blaming people who point out your vulnerabilities to you in a responsible way. Sure, I know you would prefer that they tell you and never tell anyone else, and you owe them maybe a T-shirt at most. Some of us aren't quite that, I guess, willing to accept that price point for our skill sets.
Thomas: Yeah, so the entire vulnerability disclosure debate is a very complicated and deep one, and it also goes in circles over decades. And it's actually quite tiring after 20 years of going through the same cycle; it feels like Groundhog Day. But my personal view is that to some extent, the software industry incurs risks on behalf of their users in order to make a profit, meaning you gather user data, you store it somewhere, and you can, well move fast and break things, and nothing much will happen if that user data gets leaked, and so forth. So, the incentives in the software industry are usually towards more complexity, more features, and bad security architecture.
And because there's no software liability, there's no recourse for wider society against the risks that the software industry takes on behalf of the users, the only thing that may happen is that you get egg on your face because somebody finds a really embarrassing vulnerability and then writes a blog about it. So, in some sense, Project Zero and the people that work at Project Zero wouldn't be doing their job if everybody loved them, because to some extent, their job is to be an incentive to actually care. If people say, “Oh, let's do a proper security architecture, otherwise Tavis will tweet at us,” that's at least some incentive [laughs] to have security. It sounds a bit sad, but this is a thing that is necessary. But part of the job of being a Project Zero researcher is not to be everybody's best friend, if that makes some sense.
Corey: Yeah. Security is always a weird argument. I started my career dabbling in it and got out of it because frankly, the InfoSec community is a toxic shithole. Yes, I did say that; you did not mishear if you're listening to this and take exception to what I just said. I said what I said.
It's such an off-putting community where it was very clear that the folks who are new to the field were not welcome, so I found places to go where learning how this stuff works was met with encouragement rather than derision. That may have changed since I was in the space. It's been, what, nearly 15 years, but I'm not so sure about that.
Thomas: So, I wouldn't know, right? Because I grew into that community in Europe 20 years ago. And the community I grew into 20 years ago in Europe was a very different community from the community I encountered when I first came to the US and interacted with the US InfoSec community. And also, you tolerate a lot of behavior when you're 16 and you want to be part of a community that you wouldn't tolerate as an adult. So, I'm not sure whether I would have, like, a very clear view on these topics, right?
Because the other thing is, once you reach some level of status, everybody's incredibly nice to you all the time. And at least my experience in security after I turned 18 or 19 was that people were, by and large, more friendly to me than justified. Now, that doesn't mean they weren't shitty to everybody else at the same time, right? So, the reality is that I have a skewed view of the security community because I got really lucky, if that makes any sense. And then also, I guess I'm kind of picky about who I surround myself with, so the two dozen or so people that I really like out of the greater security community may just not share that same culture, if that makes any sense.
Corey: So, help me understand your personal journey on this. You went from focusing on InfoSec to cloud cost optimization. I have my own thoughts on that, and personally, I think that they're definitely aligned from the right point of view. But I'm curious to hear your story. How did you go from where you were to where you are?
Thomas: Yeah. So, we can call it, perhaps, a midlife crisis of sorts, but the background is, after 20 years of security, you realize security is always, at some level, about the human conflict. It’s always—you do security for somebody against somebody, in some sense; you're securing something against somebody else. And it's very—well, I wouldn't say repetitive, but it's certainly a very difficult job, and at some point, I asked myself, “Why am I doing this? And for what am I doing this?”
And I realized, hey, perhaps I want to do something that has a positive externality. Like, I don't necessarily want to participate in human-to-human conflict all the time. And I realized that my only credible chance of dying with a negative CO2 balance [laughs], or budget, would be to help people compute more efficiently, right? Like, there's no amount of not eating meat or not driving cars that can erase all the CO2 I've emitted so far, but if I can help people compute more efficiently, then I can actually have a positive impact. There's a triple win to be had: if I do my work well, the customer saves money, I earn some money, and in the meantime, I've reduced human wastefulness. So, that had a great appeal on a philosophical basis.
And then, the other thing I realized is that when you do security work, a lot of your work is reading existing legacy code and finding problems, and then having people mad at you because you found the problems in the legacy code and now they need to be fixed. And it turns out that when it comes to optimization, the workflow is surprisingly similar. The skills you need in terms of lower-level machine stuff, and so forth, are also surprisingly similar. And if you find something to optimize, something to fix, then people are actually thankful, because you are saving money and making things faster. So, it turned out that this was pretty much a match that worked surprisingly well. And yeah, that's how I made the jump.
And I have to admit, so far, I really haven't regretted it. The technical problems are super fascinating. To some extent, there's less politics, even, in the cost optimization area because one of the issues with security is, on the defensive side, a lot of good security work is about convincing an organization to change the way they're doing things. So, a lot of good defensive work is actually political in nature. And the purely technical geeks in security are often in some form of offensive role.
For example, the Project Zero stuff, that's offensive for the defensive side, but still, it's a sort of offensive role. And then the majority of these jobs are just in companies that sell exploits to governments. And given that my forte happens to be more on the technical side than on the influencing an entire organization side, I decided that—and given that I didn't want to do offensive work in that sense anymore—I decided that this entire cost optimization thing has the beautiful property of aligning good technical work with an actual business case.
Corey: There's an awful lot of value to aligning whatever you're doing with a business case. I would argue that security and cost optimization are absolutely aligned from a cloud governance standpoint. Of course, here in reality, don't call it that, because no one wants to deal with governance; it always means something awful, just from a different axis, depending upon who you talk to. But the painful part is that there's no great answer for how to solve these problems. What always confuses me, and annoys me on some level, is when a cost project accidentally turns into a security project, where it's, "So, tell me about those instances running in that region on the other side of the world." "Oh, we don't have anything there." "I believe you're being sincere when you say that. However, the bill doesn't lie." And suddenly we're in the middle of an incident.
Thomas: It's funny that you mention this, because the number of security incidents that have been uncovered by billing discrepancies is large. Go back to The Cuckoo's Egg, Clifford Stoll's story about finding a bunch of KGB-financed German hackers in DoD networks; that was initially triggered by an accounting discrepancy of, I think, 75 cents or something like that. So, yeah, the interesting thing about IT security, compared to banking, for example, is that if you steal data, nobody normally knows, because data normally isn't accounted for properly, except when you cause large data transfer fees by exfiltrating too much data out of AWS.
Corey: Yeah. That's always fun when that happens. What's surprising to me, though it makes perfect sense in hindsight, is that if you have a $75-a-month AWS account and suddenly you get a $60,000 bill, you notice that. But if you get compromised when you're spending, let's say, $10 million a month, it takes an awful lot of Bitcoin mining before it even begins to make a dent in the bill. At some point, it just disappears into the background noise.
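[Editor's note: the pattern described here, where a spike stands out against a small baseline but vanishes into a large one, is essentially trailing-average anomaly detection on billing data. The following is a minimal illustrative sketch; the function name, data shape, and threshold are hypothetical, not any real billing tool's API.]

```python
def flag_spend_anomalies(daily_spend, window=7, threshold=3.0):
    """Return indices of days whose spend exceeds `threshold` times
    the trailing `window`-day average."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        baseline = sum(daily_spend[i - window:i]) / window
        if baseline > 0 and daily_spend[i] > threshold * baseline:
            anomalies.append(i)
    return anomalies

# A sudden spike on a tiny account is flagged immediately...
small = [75.0] * 10 + [60000.0]
print(flag_spend_anomalies(small))   # [10]

# ...but the same absolute spike on a $330k/day account never trips the ratio.
large = [330000.0] * 10 + [390000.0]
print(flag_spend_anomalies(large))   # []
```

The ratio-based check captures exactly the point made above: detection depends on the anomaly's size relative to the baseline, not its absolute cost.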
Thomas: Oh, yeah, definitely. But I guess that's always the case. If you look at a supermarket, they don't notice half of the shoplifting, right?
Corey: Yeah. Supposedly, anyway. I don't know. I tend to not spend most of my time shoplifting. I usually set my eyes on bigger game, you know, by exfiltrating [00:30:37 unintelligible] data from people's open S3 buckets.
Thomas: Isn't even exfiltrating of S3 buckets old? [laughs].
Corey: No, I've decided that people don't respond to polite notes about those things. Instead, I just copy a whole bunch of data into those open buckets, on the theory that while they might ignore my polite note, they probably won't ignore a $4 million surprise bill.
Thomas: That's actually a fairly effective-sounding strategy.
Corey: It's funny. Let's be very clear here: I'm almost certain that could be construed by an aggressive attorney as a felony. And let's not kid ourselves; if you cost a company $4 million, their attorneys will always be aggressive. This is not legal advice. Don't touch things you don't own. Please consult someone who knows what they're doing. It's not me. Have I successfully disclaimed enough responsibility? Probably not, but we're going to roll with it.
Thomas: All right.
Corey: So, if people want to hear more about what you're up to, how your journey is progressing, or hear your wry but incredibly astute observations on this ridiculous industry in which we find ourselves, where can they find you?
Thomas: Well, one option is clearly on Twitter. I run a Twitter account at twitter.com/halvarflake. H-A-L-V-A-R-F-L-A-K-E. And it's not only about my professional work; I have a fairly unfiltered Twitter account. Like, there's nobody ghostwriting my tweets, and there are oftentimes things I tweet that I regret a day later. But that's the nature of Twitter, I guess.
Corey: All of my tweets are ghostwritten for me. Well, not all of them. Which ones specifically? The ones that you don't like. That's right. That's called plausible deniability.
Thomas: [laughs]. So, yeah, and if you care about questions like, "I've got 50,000 machines, and I would like to know which lines of code are eating how many of my cores?" then it's probably a good idea to head over to optimyze.cloud. Remember, optimyze, M-Y-Z-E at the end, to make spelling more fun. And sign up for our newsletter.
Corey: You're disrupting the spelling of common words.
Thomas: I'm sorry for that, but the regular domains were too expensive, and trademarks are really hard to get.
Corey: They really are. Well, thank you so much for taking the time to speak with me today. I really do appreciate it.
Thomas: Thank you very much for having me.
Corey: Thomas Dullien, CEO of optimyze.cloud. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts, whereas if you hated this podcast, please leave a five-star review on Apple Podcasts anyway, and tell me why this show should be deprecated, along with what division of Google you work in.
Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at ScreamingintheCloud.com, or wherever fine snark is sold.
This has been a HumblePod production. Stay humble.