Screaming in the Cloud
Episode 27: What it Took for Google to Make Changes: Outages and Mean Tweets

Google Cloud Platform (GCP) turned off a customer that it thought was doing something out of bounds. This led to outrage on the Internet, and GCP tried to explain itself and prevent the problem in the future.

Today, we’re talking to Daniel Compton, an independent software consultant who focuses on Clojure and large-scale systems. He’s currently building Deps, a private Maven repository service. We pick Daniel’s brain, as a third-party observer, about the GCP issue, especially because he wrote a post called Google Cloud Platform - The Good, Bad, and Ugly (It’s Mostly Good).

Some of the highlights of the show include:

  • Recommendations: Use enterprise billing (which costs thousands of dollars); add a phone number and an extra credit card to your Google account; get a support contract
  • Google describing what happened and how it plans to prevent it in the future seemed reasonable; but why did it take this for Google to make changes?
  • GCP has inherited cultural issues that don’t work in the enterprise market; GCP is painfully learning that they need to change some things
  • Google tends to focus on writing services aimed purely at developers; it struggles to put itself in the shoes of corporate-enterprise IT shops
  • GCP has a few key design decisions that set it apart from AWS; it focuses on global resources rather than regional resources
  • When picking a provider, is there a clear winner? AWS or GCP? Consider your company’s values, internal capabilities, resources needed, and workload
  • GCP’s tendency to end services that people are still using, vs. AWS never ending a service, tends to push people in one direction
  • GCP has built a smaller set of services that are easy to get started with, while AWS has an overwhelming number of services
  • Different Philosophies: Not every developer writes software as if they work at Google; AWS meets customers where they are, fixes issues, and drops prices
  • GCP understands where it needs to catch up and continues to iterate and release features

