Multi-Cloud is the Worst Practice

08.05.2020

Multi-cloud (that is, running the same workload across multiple cloud providers in a completely agnostic way) is absolutely something you need to be focusing on—at least, according to two constituencies:

  1. Declining vendors that realize that if you don’t go multi-cloud, they’ll have nothing left to sell you. AWS isn’t going to build a multi-cloud dashboard, so VMware (motto: “The Payday Lenders of Technical Debt”) will absolutely build and sell you one.
  2. “Niche players” (that’s Gartner-speak for “crappy”) in the public cloud space who realize that if you go all-in on just one cloud provider, it will absolutely not be them.

Speaking as neither of those two constituencies, I’m much more interested in what’s right for your company.

The single disclaimer I’ll make in this article is that I’m talking here about best practices, which you should hear as “sensible defaults.” If you have a considered reason why something I say doesn’t apply to your situation, you’re almost certainly correct. “A customer demands it” is one such reason and “people will actually die if this service goes down” is another. In other words, I’m calling multi-cloud a “worst practice” to be avoided by default.

What multi-cloud is not

Every company is a multi-cloud company if you squint hard enough. Here at the Duckbill Group, we use:

  • G Suite for collaboration
  • AWS for infrastructure
  • GitHub for our code repositories, and
  • IBM Model M buckling spring keyboards to express passive aggression towards our family during these unprecedented times.

This is in no way what I’m talking about. That’s just good business sense. If someone suggests you go all-in on AWS and implies that this means using Amazon Chime, WorkDocs, and CodeCommit, that person is actively attempting to sabotage you and you should stop reading this and call corporate security immediately.

What is multi-cloud?

What I’m referring to instead is the idea of building workloads that can seamlessly run across any cloud provider or your own data centers with equal ease. (Note that pairing your own data centers with a public cloud provider is known as “hybrid,” which is an essay for another time.) I agree with the vision; it’s compelling and something I would very much enjoy.

However, it’s about as practical as saying “just write bug-free code” to your developers—or actually trying to find the spherical cow your physics models dictate should exist. It’s a lot harder than it looks.

Yes, every cloud provider can run containers that you hurl their way. This is the promise that Kubernetes (an open source project out of Google that’s named after the Greek god of spending money on cloud services) has brought to life.

Lowest common denominator

The trouble is that a cloud provider is only “a pile of disks, some network, and a bunch of servers to run containers on” by the very broadest definition of “cloud;” i.e. what IBM seems to think it means. Those basic primitives exist everywhere: AWS, Azure, GCP, Oracle Cloud, IBM “Cloud,” and your terrifying data center.

If you treat all of those environments as being the same thing, that means that every additional service that’s any higher-order than those baseline primitive offerings is closed to you.

Load balancers work differently on every cloud platform, so being multi-cloud means you’re running your own with nginx or HAProxy. The same story applies to databases, monitoring systems, security permissions models, anything that’s event-driven, service meshes, object stores, and oh my god you haven’t even thought about compliance yet, have you.

Yes, I know what you’re about to say: the industry as a collective whole has been doing this for a long time; we haven’t magically forgotten how to run all of these things ourselves.

My point is that while you’re spending time configuring HAProxy to route requests to the proper containers when the right conditions are met, one of your competitors has configured an Application Load Balancer to do this with three lines of YAML and is now moving forward on building the thing that actually matters to their business goals. We’ll ignore entirely the fact that the managed version of the load balancer has way better availability, durability, reliability, and resiliency than the thing you’ll cobble together.

You’re not “leveraging the best of both worlds.” You’re improving your data center at the expense of your cloud environment.
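The lowest-common-denominator effect described above can be sketched as a set intersection. This is a toy model under stated assumptions: the provider names and capability names below are hypothetical placeholders, and each provider's higher-order services get provider-specific names precisely because they don't behave identically elsewhere.

```python
from typing import Dict, Set

# Hypothetical capability catalogs. Every name below is an illustrative
# assumption, not a real inventory of any provider's services.
BASELINE: Set[str] = {"disks", "network", "servers", "containers"}

CATALOGS: Dict[str, Set[str]] = {
    "provider_a": BASELINE
    | {"provider_a_load_balancer", "provider_a_database", "provider_a_iam"},
    "provider_b": BASELINE
    | {"provider_b_load_balancer", "provider_b_database", "provider_b_iam"},
    "own_data_center": set(BASELINE),
}


def portable_capabilities(catalogs: Dict[str, Set[str]]) -> Set[str]:
    """Capabilities available in identical form in every environment."""
    return set.intersection(*catalogs.values())


def closed_off(provider: str, catalogs: Dict[str, Set[str]]) -> Set[str]:
    """Services of one provider that a fully agnostic workload cannot use."""
    return catalogs[provider] - portable_capabilities(catalogs)


print("Portable everywhere:", sorted(portable_capabilities(CATALOGS)))
print("Closed off on provider_a:", sorted(closed_off("provider_a", CATALOGS)))
```

In this toy model, only the baseline primitives survive the intersection. Everything a provider offers above that baseline lands in the closed-off set, which is the part you end up building and running yourself.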

What about lock-in?!

Another common rationale for multi-cloud is to avoid lock-in to a single vendor.

I have some bad news for you: You’re already locked in.

You’re locked in either to technology selections (databases are a killer here), to “soft” lock-in via things that don’t port super well (whatever your cloud provider’s Identity and Access Management story is, it’s almost certainly not congruent with that of other cloud providers), or to what I’ll call “buy-in.”

Buy-in

Of those, buy-in is the only real killer and, coincidentally, the one that no one outside of Engineering thinks about.

In other words, what happens when you announce a global migration from AWS to Oracle Cloud? Well, to start, at least a third of your engineering staff will quit to go work down the road with their existing skillsets on a platform that other companies are using more deeply.

It’s hard to learn a new provider’s ins, outs, and (most importantly) failure modes. It’s far easier to get ever deeper into the existing ecosystem you’re already familiar with. It’s also way more valuable from the employee’s perspective. Companies don’t want to hire generalists who are broad across multiple providers; they bias toward specialists who are good on one particular platform. This may not be intuitively true if you look at job postings; it becomes much more understandable if you filter for the jobs that pay eye-wateringly high salaries.

As a direct result of this, virtually all of the growth that the cloud providers demonstrate in their quarterly earnings isn’t coming from convincing other cloud providers’ customers to switch; it’s a combination of people migrating in from data centers as well as net new workloads. It’s hard enough to migrate from a data center to a cloud provider when you (at least in theory) already know all of the knobs and dials for your data-center environment. Going from cloud to cloud is at least ten times more complicated, and neither companies nor employees are particularly gung ho about signing up for that particular brand of pain.

Negotiating leverage

“Ah!” you may wisely interject. “If I have two cloud providers, I can use one to beat the other into offering better discount terms!”

Swing and a miss in most circumstances, I’m afraid. This isn’t vendor management from the bad old days of Big Telco.

Every cloud provider of substance (and also Google Cloud, zing) negotiates discount percentages based on your volume of spend. Cutting your spend in half reduces your negotiating base.

But let’s pretend for a second that you’re a company with incredibly portable workloads (which do exist!) that can in fact transition workloads to other providers seamlessly.

Every time we’ve seen this happen with our clients, the discounting achieved from that threat is less than the discount that the customer would get simply by committing to higher spend levels. (Remember, in helping fix the horrifying AWS bill, we see and help negotiate an awful lot of large-scale cloud contracts!)

Let me be very clear: $9 million a year on one provider vs. $3 million a year each on three providers yields remarkably different costs, even before you factor in the expensive management and engineering overhead of making those systems play well together. Now you’re negotiating discounts on the basis of $3 million a year…and doing it three times. Talk about cutting off your nose to spite your face.
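To put rough numbers on that comparison, here is a minimal sketch of volume-based discounting. The tier thresholds and percentages are made up for illustration; they are assumptions, not any provider's actual terms, and real contracts are negotiated rather than read off a table.

```python
# Hypothetical volume-discount tiers, in ascending order:
# (minimum annual spend in USD, discount in percent).
# These figures are illustrative assumptions, not any provider's real terms.
HYPOTHETICAL_TIERS = [
    (0, 0),
    (1_000_000, 5),
    (5_000_000, 10),
    (9_000_000, 15),
]


def discount_percent(annual_spend: int) -> int:
    """Return the discount of the highest tier whose minimum the spend meets."""
    percent = 0
    for minimum, tier_percent in HYPOTHETICAL_TIERS:
        if annual_spend >= minimum:
            percent = tier_percent
    return percent


def net_cost(annual_spend: int) -> int:
    """Annual cost in USD after applying the tier discount to the whole spend."""
    return annual_spend * (100 - discount_percent(annual_spend)) // 100


single_provider = net_cost(9_000_000)
three_providers = 3 * net_cost(3_000_000)

print(f"One provider at $9M:     ${single_provider:,}")
print(f"Three providers at $3M:  ${three_providers:,}")
print(f"Cost of splitting:       ${three_providers - single_provider:,}")
```

With these made-up tiers, splitting the spend three ways costs $900,000 a year more than consolidating it, and that is before any of the engineering overhead. The shape of the result is the point, not the specific numbers.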

Further, regardless of what provider you pick, once a cloud vendor runs your production infrastructure, they cease being your vendor and instead become your partner—whether you want them to be or not. Adversarial relationships aren’t nearly as productive as collaborative ones. I’m not suggesting you leave money on the table, but be smart about what you’re asking for.

Multi-cloud doesn’t protect you from price changes

In practice, the unspoken rule of the cloud is that things get less expensive over time. We’ve seen that borne out by every price change from every provider with the singular exception of Google Cloud, which completely shafted its customers—twice.

The first time was in their 14x increase for Google Maps API access. The second was when they began charging for previously free GKE cluster management. I expect that as soon as I publish this, they’ll do it a third time so I’m once again out of date.

We can thus safely declare GCP a special case from which you really want to have a rapid exodus plan should you need it.

Multi-cloud doesn’t exist in reality

In practice, every “we’re multi-cloud” story I’ve ever seen in the wild means “we’re over 80% on our primary provider, then have a smattering of workloads on others.”

Any choice you make constrains your options. Multi-cloud is the embodiment of the ideal “indecision is the key to flexibility.” The trouble is that you’ll spend so much time avoiding a commitment to one provider, and so much engineering toil keeping your environment functional at a relatively basic level, that you’re going to struggle to really innovate as a business.

As Ben Kehoe so eloquently states, multi-cloud is like cow-tipping: We know it doesn’t exist because there are no videos of cow-tipping on YouTube. In this case, there are no articles or conference talks from companies about their multi-cloud strategies successfully paying off.

To really drive that point home, consider this: VMware’s entire business is predicated on this bet paying off, and yet VMworld 2019’s keynote featured a fictional company called Tanzu Tees making hilariously awful technology choices to use a whole bunch of different cloud providers interchangeably. It didn’t feature an actual customer using these things, because real companies just don’t make IT decisions this poorly. (It may be more accurate to say that they don’t make decisions this awful and then admit to them on stage.)

If vendors that are themselves highly incentivized to demonstrate multi-cloud success stories can’t find anyone to get on stage and talk about it, what does that tell you about the model’s viability?

A modest suggestion

If you’re anything like me, you’re going to read this, believe you’re a special case to which none of the above constraints or caveats apply, and try for multi-cloud anyway.

You do you; you know your business far better than I do.

I just have one suggestion: Before you go whole hog into a second cloud provider, first spin up an active-active environment across two regions in your current provider. With complete service and API compatibility between those regions, it should, by your theory, be a piece of cake.

Come back and talk to me after you’ve done that, and we’ll see how “simple and straightforward” this really is.

Multi-cloud is the wrong answer. It’s going to take more than an imaginary company’s “success story” to convince me otherwise.
