The CDK’s Most Fundamental Flaw is Fixable

One of my most popular tweets ever is, unsurprisingly, about CloudFormation.

#awswishlist CloudFormation should accept a blurry photo of an architecture diagram I drew in crayon on a McDonalds placemat in addition to templates in YAML and JSON
— Ben Kehoe (@ben11kehoe) February 13, 2021

Naturally, this is a metaphor for the deprecation of cdk synth. Wait, what?

I have a long history of advocating against the CDK’s approach, and I gave a whole talk at Serverlessconf on why. But people often mistake my primary objection to the AWS Cloud Development Kit.

My views boil down to:

1. I’m mildly against imperative resource graph definitions.
2. I’m moderately against allowing nondeterministic resource graph definitions.
3. I’m strongly against developer intent remaining client-side.

This article is about point 3: the CDK’s approach of client-side generation of CloudFormation templates is deeply flawed, but eminently fixable.

What are we talking about? That depends on how things are defined

First, I want to discuss terminology. When we talk about infrastructure as code (IaC) and programming languages, it doesn’t matter whether we use CloudFormation, the CDK, Terraform, or something else: IaC means defining a graph of cloud resources (in CloudFormation YAML, Python, JavaScript, HCL, etc.). We then pass that resource graph definition to a deployment engine that creates the resource graph as deployed resources. I think a lot of people get tied up with resource graph definition formats and also confuse the resource graph definition format with the engine, so I’m going to explicitly separate the two here.

We need to stop thinking that a set of JavaScript (or any other language) files using the CDK is a method for outputting a resource graph definition; it is a resource graph definition. A tool that demonstrates this is InGraph, which allows for resource graph definition in Python without a lossy generative step. Read more on this at my blog.

People close to the CDK often talk about CloudFormation templates as “assembly language” and the CDK as a “compiler,” but I think that view misses the point in two respects.

First, the actual “assembly language” for a resource graph definition language is the imperative deployment plan created by the deployment engine. This is the internal plan that the CloudFormation service generates for a ChangeSet. That is, the CloudFormation service is a compiler for resource graph definitions, with the compiled output being an imperative plan for making reality match the resource graph. Currently, the only resource graph definition language the CloudFormation deployment engine accepts is the CloudFormation language, in YAML or JSON format. cdk synth, then, is not a compiler; it is a transpiler: it transforms, in a lossy manner, a resource graph definition from one language to another.
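To make the "compiler" framing concrete, here is a minimal sketch of a deployment engine turning a declarative resource graph into an ordered, imperative plan. It is a toy under stated assumptions: the dictionary shape, the `compile_plan` name, and the example resources are all invented for illustration, and this is not how the CloudFormation service is implemented internally.

```python
from graphlib import TopologicalSorter


def compile_plan(desired, actual):
    """Compile a declarative resource graph into ordered imperative steps.

    Both arguments map a logical name to
    {"props": dict, "depends_on": [logical names]} (a hypothetical shape).
    """
    # A resource's dependencies must be handled before the resource itself.
    order = TopologicalSorter(
        {name: spec["depends_on"] for name, spec in desired.items()}
    ).static_order()

    plan = []
    for name in order:
        if name not in actual:
            plan.append(("create", name))
        elif actual[name]["props"] != desired[name]["props"]:
            plan.append(("update", name))
    # Resources that exist but are no longer desired are deleted after all
    # creates and updates, in the order they appear in `actual`.
    for name in actual:
        if name not in desired:
            plan.append(("delete", name))
    return plan


desired = {
    "Api": {"props": {"stage": "prod"}, "depends_on": ["Function"]},
    "Function": {"props": {"memory": 256}, "depends_on": ["Table"]},
    "Table": {"props": {"billing": "on-demand"}, "depends_on": []},
}
actual = {
    "Table": {"props": {"billing": "on-demand"}, "depends_on": []},
    "Function": {"props": {"memory": 128}, "depends_on": ["Table"]},
    "OldQueue": {"props": {}, "depends_on": []},
}
plan = compile_plan(desired, actual)
```

The input says nothing about ordering or about what already exists; the output is a sequence of steps. That output, not the template, is the closest analogue to "assembly language."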

Second, framing CloudFormation templates as “assembly language” is used to excuse the lack of facility for abstractions within CloudFormation templates. We shouldn’t be doing this! Even if the ideal were for the CloudFormation template language to be the common resource graph definition format that other resource graph definition formats transpile to (and the only format accepted by the CloudFormation service), it should be capable of fully representing the original context, and not just through opaque, source-specific metadata. This would mean improving and expanding the CloudFormation language to represent more complex aspects of architecture (an example of this would be a Fn::Map intrinsic function).
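As a rough illustration of what a Fn::Map intrinsic could look like, here is a toy evaluator. Everything about it is an assumption: Fn::Map is hypothetical, and the two-element argument form and the `${item}` placeholder are syntax I made up for this sketch.

```python
def expand(node, item=None):
    """Recursively expand a hypothetical {"Fn::Map": [items, body]} node."""
    if isinstance(node, dict):
        if set(node) == {"Fn::Map"}:
            items, body = node["Fn::Map"]
            # Evaluate the body once per item, substituting the placeholder.
            return [expand(body, item=i) for i in items]
        return {key: expand(value, item) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(value, item) for value in node]
    if isinstance(node, str) and item is not None:
        return node.replace("${item}", str(item))
    return node


template = {
    "Subnets": {
        "Fn::Map": [
            ["10.0.0.0/24", "10.0.1.0/24"],
            {"CidrBlock": "${item}", "VpcId": {"Ref": "Vpc"}},
        ]
    }
}
expanded = expand(template)
```

The point is where the expansion happens. If the service accepted the unexpanded form and ran something like `expand` itself, it would retain the knowledge that the two subnets come from one definition, rather than receiving two unrelated-looking resources.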

Once we understand that CDK programs are resource graph definitions in themselves, we can see that we should be submitting CDK programs to the deployment engine. The CloudFormation service would then accept multiple resource graph definition languages for its deployment engine, not just the CloudFormation resource graph definition language in YAML or JSON format.
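A small sketch of what "lossy" means here. This is a toy model, not the CDK's actual synthesis logic; the `Construct` class, both functions, and the `OrderService` and `MonitoredQueue` abstractions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Construct:
    kind: str   # an abstraction name, or a leaf type such as "AWS::SQS::Queue"
    name: str
    children: list = field(default_factory=list)


def synth_lossy(construct):
    """Flatten to leaf resources only; the abstraction tree is discarded."""
    if not construct.children:
        return [{"name": construct.name, "type": construct.kind}]
    return [leaf for child in construct.children for leaf in synth_lossy(child)]


def synth_with_provenance(construct, path=()):
    """Flatten, but record which abstractions produced each leaf resource."""
    path = path + (f"{construct.kind}:{construct.name}",)
    if not construct.children:
        return [{"name": construct.name, "type": construct.kind,
                 "defined_by": path[:-1]}]
    return [leaf for child in construct.children
            for leaf in synth_with_provenance(child, path)]


app = Construct("OrderService", "Orders", [
    Construct("MonitoredQueue", "Intake", [
        Construct("AWS::SQS::Queue", "IntakeQueue"),
        Construct("AWS::CloudWatch::Alarm", "IntakeBacklogAlarm"),
    ]),
])
flat = synth_lossy(app)
traced = synth_with_provenance(app)
```

In `flat`, nothing says that the queue and the alarm belong together. In `traced`, the engine receiving the definition could still answer that question, and that is the information I want to reach the cloud.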

We shouldn’t need multiple languages

Coming back to the original tweet, and being strongly against developer intent remaining client-side: When I scrawl that architecture diagram on a McDonald’s place mat, I’m expressing an intent. Every step of translating that into a deployed cloud architecture that takes place outside the cloud is almost certainly lossy, and the lost information will not reach the cloud. I want the cloud service that is in charge of making my desired architecture a reality to have as much context about what I want as possible. The different crayon colors I have chosen indicate how I’m mentally partitioning my application. The angry scribbles around a particular Lambda function indicate it needs more operational attention.

The goal of bringing more of the original context to the cloud, by bringing the original form of the resource graph definition, leads to the follow-up to the original tweet:

Running drift detection on the resulting stack produces another blurry JPG with the changes from your original circled in red.
— Ben Kehoe (@ben11kehoe) February 13, 2021

The point here is that users should not have to change languages as they interact with their definitions and the state of the deployed system. Every resource graph definition format should come with the tools for this. These tools include understanding the contents of any abstractions in the definition, understanding the difference between desired resource graph state and existing state, finding all the deployed instances of a given resource type (where a resource type can be an abstraction), etc. When I use the AWS console, I should be able to explore the current state of my architecture using that original format.
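One of those tools, finding deployed resources by type where the type may be an abstraction, can be sketched in a few lines. The inventory format, the `via` field, and the `resources_of` helper are assumptions made up for this example; no AWS API returns data in this shape.

```python
# Hypothetical inventory: each deployed resource remembers the chain of
# abstractions that produced it, outermost first.
DEPLOYED = [
    {"id": "orders-queue", "type": "AWS::SQS::Queue",
     "via": ["OrderService", "MonitoredQueue"]},
    {"id": "orders-alarm", "type": "AWS::CloudWatch::Alarm",
     "via": ["OrderService", "MonitoredQueue"]},
    {"id": "audit-queue", "type": "AWS::SQS::Queue", "via": ["AuditTrail"]},
]


def resources_of(kind, deployed=DEPLOYED):
    """Return ids of resources that are of, or were produced by, `kind`."""
    return [r["id"] for r in deployed if r["type"] == kind or kind in r["via"]]


queues = resources_of("AWS::SQS::Queue")
monitored = resources_of("MonitoredQueue")
```

The query for a low-level type and the query for an abstraction are the same query. That only works if the abstraction survived the trip to the service.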

If my medium is crayon on McDonald’s place mats, the cloud should meet me there. I should not have to transform my grease-stained scribble into an intermediate format for the resource graph deployment engine to understand it.

Now, let’s turn to the CDK. If I have written a JavaScript program expressing my desired resource graph, using high-level abstractions, that program should be what I bring to the cloud. I should not have to transform that program before touching a cloud API. When I want a view of the resolved resources of my program, I should get another JavaScript program that produces an identical graph with abstractions removed. When I want to do drift detection, I should again get a JavaScript program that not only would produce the extant state in the cloud, but has a meaningful file-level diff with my original source program, just like you would look at a change you’re considering making to your infrastructure. At no point in the process should I have to generate a resource graph definition in a different format.
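Here is a minimal sketch of what that drift experience could look like from the user's side. It assumes away the hard part: that the service can render the deployed state back into a program in the source language. The `drift_diff` helper, the file name, and the `Queue` and `Function` calls inside the example strings are all hypothetical.

```python
import difflib


def drift_diff(source_text, deployed_text, filename="app.py"):
    """Return a unified diff between the program as written and as deployed."""
    return "".join(difflib.unified_diff(
        source_text.splitlines(keepends=True),
        deployed_text.splitlines(keepends=True),
        fromfile=f"{filename} (as written)",
        tofile=f"{filename} (as deployed)",
    ))


source = (
    'queue = Queue("Orders", retention_days=4)\n'
    'fn = Function("Handler", memory=256)\n'
)
# Assumed to be generated by the service from the actual state of the stack.
deployed = (
    'queue = Queue("Orders", retention_days=4)\n'
    'fn = Function("Handler", memory=512)\n'
)
diff = drift_diff(source, deployed)
```

The result reads like any code review diff: one line removed, one line added, in the language I wrote the definition in.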

AWS is pushing ownership onto customers with the CDK

So why can’t you upload a CDK program wholesale into the cloud as your resource graph definition format? Here I will be blunt: The CDK team has chosen a path of pushing ownership onto customers to avoid being burdened with the responsibilities of owning a managed service. This allows them to move faster, but at what cost? Lambda functions are deployed to customer accounts with little notification or explanation, serving as custom resource providers in place of proper AWS-owned CloudFormation resource providers. The CDK promises backward compatibility but lacks the visibility that a managed service has into whether that compatibility has been achieved, leaving that discovery to the customer. And to do this, the team has accepted, explicitly or implicitly, that developer intent will not reach the cloud intact.

But this is not an inherent flaw! The resource graph definition language that the CDK defines is, like CloudFormation, separable from the engine (in this case, the CDK library and CLI tool) that operates on those definitions.

We can imagine a potential future that’s better than the path we are on. The AWS Developer Tools group, which includes both the CDK and CloudFormation, invests in representing abstractions found useful in CDK programs directly in CloudFormation, whether through new CloudFormation resource providers, CloudFormation modules, or changes to the CloudFormation language itself. CloudFormation builds the notion of “applications” (related collections of heterogeneous stacks, potentially cross-region) into the service. The new CreateApplication API takes Cloud Assembly files as input, but, as is the case today, these ZIP files can only contain templates in the CloudFormation language (in YAML/JSON format).

As this work progresses, the amount of work cdk synth has to do decreases, eventually becoming lossless. At this point, Cloud Assembly files can contain CDK programs — that is, without generating CloudFormation templates — and CloudFormation accepts them as an input to CreateApplication. The Cloud Control APIs allow you to inspect what constructs you have deployed.

In the final stage, when you inspect an Application resource in the AWS console, you can visualize the (multilevel) resource graph associated with it. When you ask about a resource (including abstractions), the console can show you the code in your CDK program where that resource was defined. Drift detection shows you, again in the console, a file diff with your original CDK program. This modified CDK program that represents the actual state of the resource graph is downloadable.

With the above utopia achieved, there’s no need for developers who choose to define their resource graphs in a familiar, imperative, general-purpose language to learn the declarative syntax of the native CloudFormation language because their entire development, deployment, and operations cycle never involves it. Developers who choose to use the declarative CloudFormation language don’t miss out on building abstractions in that language. Imperative-vs.-declarative arguments are relegated to debates about personal preference on slow Friday afternoons. Well, given that people still get heated about vi and Emacs, that last one is probably too much to hope for.

The CDK has been bringing a much-needed capability for abstractions to AWS’s IaC offerings. Arguments about imperative versus declarative IaC are minor when compared with the subject of preserving developer intent across the entire life cycle of a cloud application. I believe that the CDK’s client-side approach is the wrong one, but I also believe the client-side approach is not an integral part of the CDK. If you use the CDK, you should be able to think in CDK everywhere you touch the cloud.
