Data processing pipelines have ever-growing requirements for speed and throughput. It’s no longer enough to store data and save it for batch processing at some future time. We need to be able to process data in real time to make snap decisions and get immediate insights.
Stream processors are services that can pull in data being continuously generated from thousands of producers, then send it to consumers, who can act upon the data. The whole process happens in milliseconds. They’re used for a variety of applications, like real-time fraud detection, tracking IoT device behavior, large-scale logging and monitoring, and even the AWS bill.
You have plenty of options for this sort of stream processing within AWS. Tools like Amazon SQS or RabbitMQ cover simple use cases. If you want something more powerful, there are three popular options:
- Use Amazon Kinesis Data Streams.
- Run Apache Kafka on your own EC2 instances.
- Use Amazon Managed Streaming for Apache Kafka, or Amazon MSK, to let Amazon run Kafka for you.
All three are powerful, capable services. Which should you pick? This sort of service sits right at the heart of your architecture, so once you commit, you’re unlikely to change in a hurry. Operations, performance, and cost are among the factors you might consider.
Installation, operations, and administration
Running a production-ready Kafka cluster is complicated. You need to provision servers, install and set up Kafka and ZooKeeper, secure the cluster, configure failover and replication for high availability, stay up-to-date with software patches, sort out monitoring and alarms, organize on-call support for when the cluster fails, and plenty more besides. All that takes time and money, and it presents a significant opportunity cost.
By contrast, Kinesis is fully managed — Amazon takes care of all this operational complexity. You can get a production-ready Kinesis system in a few minutes, compared with days or weeks for Kafka. If you’re considering Kafka, ask whether that extra lead time buys you something you actually need.
Amazon MSK sits somewhere in the middle: Amazon handles some of the difficulty of running Kafka, but getting up and running is still more complicated than with Kinesis.
Flexibility and configurability
Kafka has a huge number of knobs and dials that allow you to precisely tune its behavior. This is what you get for the operational complexity: more options and flexibility. There are things you can do in Kafka that you can’t do in Kinesis; if you rely on one of those features, the decision is made for you.
A few examples:
- Message retention: In Kinesis, you can retain messages for at most a year. (It used to be seven days, but Amazon increased the retention limit in November 2020.) In Kafka, you can retain messages for as long as you like.
- Messaging semantics: Kinesis always uses “at least once” message delivery, whereas Kafka supports both “at least once” and “exactly once” message delivery.
- Message size: A single message in Kinesis can be up to 1MB. In Kafka, the maximum size is configurable (the default is about 1MB). I usually recommend sending many small messages rather than one big one, but you can send big messages if you need them.
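If you hit the Kinesis 1MB-per-record ceiling only occasionally, one workaround is to split oversized payloads client-side and reassemble them in the consumer. A minimal sketch of the producer-side split — the function name and the headroom figure are my own illustrative choices, not anything from the Kinesis API:

```python
# Split an oversized payload into chunks that fit under the Kinesis
# 1MB-per-record limit, leaving some headroom for metadata such as
# the partition key. The 100KB headroom is an illustrative assumption.
MAX_RECORD_BYTES = 1024 * 1024          # Kinesis per-record limit
HEADROOM_BYTES = 100 * 1024             # reserved for metadata
CHUNK_BYTES = MAX_RECORD_BYTES - HEADROOM_BYTES

def split_payload(payload: bytes) -> list[bytes]:
    """Break a payload into record-sized chunks for a Kinesis producer."""
    return [payload[i:i + CHUNK_BYTES]
            for i in range(0, len(payload), CHUNK_BYTES)]
```

The consumer would concatenate the chunks back together, which is exactly the sort of bookkeeping you avoid if you keep individual messages small in the first place.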
Kinesis doesn’t have many configuration options — it’s designed for the 80% use case. Kafka can handle the more esoteric and unusual use cases, if that’s what you need.
Performance
Both services are designed for high-performance, low-latency applications. They can scale to process thousands of messages per second with sub-second latency.
If you look around, you can find blog posts, benchmarks, and news stories that show Kafka edging out Kinesis. This isn’t surprising — Kafka’s configurability means you can fine-tune its behavior. If you can invest the time to study your team’s approach and your real workloads, you might be able to optimize Kafka for your particular usage.
That said, it’s worth asking if you want “absolute best performance” or just “good enough.” For many use cases, both processors are plenty good enough.
Scaling
Kafka and Kinesis have different scaling models.
In a Kinesis stream, the unit of scaling is the “shard.” Each shard provides a write capacity of 1MB per second or 1,000 records per second, and a read capacity of 2MB per second, with up to five read transactions per second. You add shards until you reach your desired capacity.
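Those per-shard limits make capacity planning a quick calculation. A sketch, assuming you know your peak write and read rates (the function name is mine):

```python
import math

def shards_needed(write_mb_s: float, records_s: float, read_mb_s: float) -> int:
    """Estimate a Kinesis shard count from the per-shard limits:
    1MB/s or 1,000 records/s in, 2MB/s out."""
    return max(
        math.ceil(write_mb_s / 1.0),    # write bandwidth limit
        math.ceil(records_s / 1000.0),  # write record-count limit
        math.ceil(read_mb_s / 2.0),     # read bandwidth limit
        1,                              # a stream needs at least one shard
    )
```

For example, a workload writing 5MB/s across 4,000 records/s, with consumers reading 12MB/s, needs six shards — the read side is the binding constraint here.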
In Kafka, there are two units of scaling: the “broker” and the “partition.” The broker is the underlying server in your Kafka cluster. Choosing the right instance type and number of brokers is more complicated than counting Kinesis shards. Amazon provides a right-sizing guide, but that’s a starting point rather than a complete solution. Finding the right cluster size is an iterative process. The partition is analogous to the Kinesis shard: More partitions give you more simultaneous read/write capacity.
You can manually change your scaling configuration after your initial deployment — for example, to add more shards or more brokers — but this tends to be more fiddly with Kafka than Kinesis. Changing your Kinesis shards is a single API call, but changing your Kafka brokers is a more involved process.
Note that in Amazon MSK, broker scaling is a one-way door: You can add brokers, but you can’t remove them.
Kinesis also allows autoscaling of shards to match your usage, which may help you handle large spikes and reduce your overall bill. Autoscaling Kafka is a more involved process.
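If you manage shard counts yourself, the common pattern is a CloudWatch alarm that triggers a resharding call (via the UpdateShardCount API) when utilization crosses a threshold. A toy sketch of just the decision logic — the thresholds and doubling/halving policy are illustrative assumptions, not an AWS-prescribed algorithm:

```python
def target_shard_count(current_shards: int, observed_mb_s: float,
                       scale_up_at: float = 0.8,
                       scale_down_at: float = 0.3) -> int:
    """Double or halve the shard count when utilization of the
    1MB/s-per-shard write limit crosses a threshold."""
    utilization = observed_mb_s / current_shards  # MB/s per shard
    if utilization > scale_up_at:
        return current_shards * 2
    if utilization < scale_down_at and current_shards > 1:
        return max(1, current_shards // 2)
    return current_shards
```

With Kafka, the equivalent loop has to add or remove brokers and rebalance partitions, which is why autoscaling it is so much more involved.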
Pricing
Kinesis and Kafka have different billing models that make them tricky to compare.
With Kinesis, you pay for the “shard hours” and “PUT payload units” — two units that represent the throughput and data transferred within a stream. If you want higher throughput or you send more data, you’ll pay more.
You also pay for data transfer, which adds another dimension of unpredictability. Transfer out of AWS is the same for all three services, but the cost of replication will vary. In a production service, you’ll replicate data across multiple AZs for redundancy. You pay extra for that if you run Kafka on EC2, but for Kinesis and MSK it’s built into the price.
With Kafka, you pay a per-hour bill for the brokers — the underlying compute instances and storage. You pay the same amount whether you send a lot of messages or just a handful.
As with many AWS services, you’re swapping a (potentially) cheaper bill for a consistent bill. The pay-as-you-go model of Kinesis could save you money if your stream has idle periods, but it’s hard to know how much it’ll cost in advance. Neither service is definitively cheaper than the other.
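A back-of-the-envelope comparison of the two models makes the trade-off concrete. The unit prices below are illustrative assumptions, not current list prices — check the AWS pricing pages for your region. Kinesis bills shard-hours plus PUT payload units (each unit is a 25KB slice of an incoming record); a self-run Kafka cluster bills broker instance-hours regardless of traffic:

```python
import math

# Illustrative unit prices -- NOT current AWS list prices.
SHARD_HOUR_USD = 0.015
PUT_UNIT_USD = 0.014 / 1_000_000   # per 25KB PUT payload unit
BROKER_HOUR_USD = 0.20             # assumed EC2 instance price

def kinesis_monthly_cost(shards: int, records_per_month: int,
                         avg_record_kb: float) -> float:
    """Pay-as-you-go: shard-hours plus PUT payload units consumed."""
    put_units = records_per_month * math.ceil(avg_record_kb / 25)
    return shards * 730 * SHARD_HOUR_USD + put_units * PUT_UNIT_USD

def kafka_monthly_cost(brokers: int) -> float:
    """Flat rate: the same bill whether the cluster is busy or idle."""
    return brokers * 730 * BROKER_HOUR_USD
```

Under these assumed prices, a two-shard stream handling 100 million 5KB records a month costs about $23, while even a minimal three-broker Kafka cluster costs over $400 — but the comparison flips for sustained high-throughput workloads, which is why neither model is definitively cheaper.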
But the AWS bill isn’t the only cost. The staff who run, maintain, and support your Kafka cluster aren’t working for free. For all but the largest users, I’d expect the staff cost of Kafka to overshadow a Kinesis bill.
Integration with other AWS services
You’re not running a stream processor on its own. You want to get data from somewhere and send it somewhere else — and if you’re reading this blog, that probably includes other AWS services.
Because Kinesis is Amazon’s principal streaming service, it tends to be better integrated with AWS than Kafka. That’s not to say you can’t use Kafka with AWS services, but it might be more fiddly or complicated, and that adds to the cost of running Kafka.
For example, suppose you want to get a stream of updates to a DynamoDB table. For Kinesis, that’s a single click in the AWS console. For Kafka, you need to find and install a custom connector (which often ends up wrapping a Kinesis stream anyway).
Look at what AWS services you want to integrate with, and compare the Kinesis and Kafka integrations. Which will be easier to install and maintain?
Kinesis vs. Kafka: Which service I’d choose
If I were starting a brand new project today, I’d go straight to Kinesis.
If I had existing Kafka clusters or experience, I’d look at Amazon MSK.
I’d only run Kafka on my own EC2 instances if there were a very specific feature or option I needed that wasn’t available in MSK. (And if that’s your use case, you don’t need me to tell you what to do.)
All three options can be the right approach under the right circumstances, and they’re all popular. Pick one, commit to it, and watch the messages start to fly.