Block storage is a low-level technology that underpins the majority of modern storage — including whatever device you’re reading this on. It divides data into small units called “blocks,” which provide fast and flexible ways to manage data. It’s ideal for local storage and high-performance workloads.
This is the first article in a three-part series about the common types of storage in the cloud: block, file, and object. They all work differently, and they all have unique characteristics that make them suitable for different types of workloads. We’ll look at how each of these storage technologies works, what it’s good for, and how to use it.
How does block storage work?
A block is a small, fixed-size chunk of data; on real disks, blocks are commonly 512 bytes or 4 KiB. A block storage volume is a collection of blocks, each with a unique identifier, and each block can be read and written individually. Block storage is any storage that keeps data in volumes like this.
If you want to store a large file, you have to split it up and write it to multiple blocks. You read and combine the same blocks to retrieve the file later. You have to remember which blocks you’re using because the storage volume won’t. A single block doesn’t know it’s part of a file or which other blocks it should be combined with.
Let’s look at an example. Suppose I have a block storage volume with 4-character blocks, and I want to save the sentence “I am called Alex.” That won’t fit in a single block, so I have to split it into multiple parts. I might save those parts like so:
Blocks for a single file don’t have to appear next to each other or even in the correct order. We have to know that blocks #1, #3, #4, and #6, highlighted in yellow, are part of the same sentence. Otherwise, this set of blocks doesn’t make sense — there’s nothing in the storage volume to tell us that our four blocks go together.
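Here’s a minimal sketch of that idea in Python. It isn’t how any real volume is implemented: the dictionary standing in for the volume, the contents of blocks #2 and #5, and the order in which the pieces land in blocks #1, #3, #4, and #6 are all assumptions I’ve made up for illustration.

```python
BLOCK_SIZE = 4  # characters per block, as in the example above

# A six-block volume. Blocks #2 and #5 already hold unrelated data
# (made-up contents); the rest are free and filled with spaces.
volume = {1: "    ", 2: "wxyz", 3: "    ", 4: "    ", 5: "1234", 6: "    "}


def split_into_blocks(text):
    """Cut text into BLOCK_SIZE pieces, padding the last piece with spaces."""
    pieces = [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)]
    return [piece.ljust(BLOCK_SIZE) for piece in pieces]


def write_text(volume, text, block_ids):
    """Write each piece of text to the listed blocks, in the order given."""
    pieces = split_into_blocks(text)
    if len(pieces) != len(block_ids):
        raise ValueError("need exactly one block per piece")
    for block_id, piece in zip(block_ids, pieces):
        volume[block_id] = piece


def read_text(volume, block_ids):
    """Read the listed blocks, in the order given, and join them up."""
    return "".join(volume[block_id] for block_id in block_ids)


# Hypothetical placement: the four pieces land out of order.
sentence_blocks = [3, 1, 6, 4]
write_text(volume, "I am called Alex", sentence_blocks)

print(read_text(volume, sentence_blocks))  # I am called Alex
print(read_text(volume, sorted(volume)))   # numeric order gives a jumble
```

The volume never stores `sentence_blocks`. If I lose that list, all I have left is six blocks of characters with no way to tell which ones belong together.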
Most of us don’t care about how our files are split into individual blocks, so we install a file system to manage that for us. We write code that deals with files and folders, and the file system translates that into block-level actions. It remembers which blocks are part of which files so we don’t have to.
Where does the file system keep this information? It has to go somewhere … so it goes in other blocks! The file system keeps a lookup table for all your files, which it saves in blocks that aren’t being used to store the content of files.
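To make that concrete, here’s a toy file system, sketched under some loud assumptions: `TinyFileSystem`, the 32-byte block size, the four blocks reserved for the lookup table, and the JSON encoding of the table are all hypothetical choices of mine. Real file systems use far more compact and robust on-disk structures.

```python
import json

BLOCK_SIZE = 32    # bytes per block (assumed for this sketch)
TABLE_BLOCKS = 4   # blocks 0-3 are reserved for the lookup table (assumed)


class TinyFileSystem:
    """A toy file system on top of a list of fixed-size blocks."""

    def __init__(self, total_blocks=16):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(total_blocks)]
        self._save_table({})

    def _save_table(self, table):
        """Serialize the lookup table and spread it over the reserved blocks."""
        raw = json.dumps(table).encode("utf-8")
        capacity = TABLE_BLOCKS * BLOCK_SIZE
        if len(raw) > capacity:
            raise OSError("lookup table is full")
        raw = raw.ljust(capacity, b" ")
        for i in range(TABLE_BLOCKS):
            self.blocks[i] = raw[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]

    def _load_table(self):
        """Rebuild the lookup table from the reserved blocks."""
        raw = b"".join(self.blocks[:TABLE_BLOCKS])
        return json.loads(raw.decode("utf-8"))

    def write_file(self, name, data):
        table = self._load_table()
        table.pop(name, None)  # overwriting frees the file's old blocks
        used = {b for entry in table.values() for b in entry["blocks"]}
        free = [b for b in range(TABLE_BLOCKS, len(self.blocks)) if b not in used]
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        if needed > len(free):
            raise OSError("volume is full")
        chosen = free[:needed]
        for n, block_id in enumerate(chosen):
            chunk = data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]
            self.blocks[block_id] = chunk.ljust(BLOCK_SIZE, b"\x00")
        table[name] = {"blocks": chosen, "size": len(data)}
        self._save_table(table)

    def read_file(self, name):
        entry = self._load_table()[name]
        raw = b"".join(self.blocks[b] for b in entry["blocks"])
        return raw[:entry["size"]]  # drop the padding in the last block


fs = TinyFileSystem()
note = b"I am called Alex, and this note spans two blocks."
fs.write_file("note.txt", note)
print(fs.read_file("note.txt").decode("utf-8"))
print(fs._load_table())
```

The code that calls `write_file` and `read_file` only ever deals in file names. The table that maps names to blocks lives in the same volume as the data, just in blocks set aside for it.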
Most devices use block storage under the hood, but file systems and other abstractions often hide that detail.
What’s it good for?
Block storage gives you very granular control over how your data is stored. By choosing blocks carefully, you can fine-tune and optimize your storage to be as fast as possible. This makes it popular for workloads that are particularly performance-sensitive.
For example, suppose I write the sentence “The shape has blue sides” to a block storage volume. I have to write six blocks:
Later, I repaint the shape in a different color. If I wanted to update the sentence I’ve saved, I could write all six blocks again:
But I could also update just the blocks that have changed:
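By only writing two blocks instead of six, this optimized write could finish roughly three times faster, assuming every block takes about the same time to write and there’s no fixed overhead per write. If I were only writing two blocks in a much larger file, this would be an even bigger saving.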
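The optimized write can be sketched in a few lines of Python. This is an illustration, not a real storage API: the 4-character block size carries over from the earlier example, `changed_blocks` is a hypothetical helper, and “pink” is a color I’ve picked as an assumption because it keeps the sentence the same length.

```python
BLOCK_SIZE = 4  # characters per block (assumed, as in the earlier example)


def to_blocks(text):
    """Cut text into BLOCK_SIZE pieces."""
    return [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)]


def changed_blocks(old_text, new_text):
    """Return {block_index: new_contents} for the blocks that differ."""
    if len(old_text) != len(new_text):
        raise ValueError("this sketch only handles same-length updates")
    old, new = to_blocks(old_text), to_blocks(new_text)
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


old_sentence = "The shape has blue sides"
new_sentence = "The shape has pink sides"  # hypothetical new color

volume = to_blocks(old_sentence)  # the six blocks written originally
updates = changed_blocks(old_sentence, new_sentence)

# Rewrite only the blocks that changed; leave the other four alone.
for index, contents in updates.items():
    volume[index] = contents

print(len(volume), "blocks in total,", len(updates), "rewritten")
print("".join(volume))
```

Only the fourth and fifth blocks are touched here. The other four are never written, which is where the saving comes from.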
This is one of many tricks you can use to get incredibly high performance from a block storage volume.
What are some common use cases for block storage?
The performance potential means block storage is very popular for database and transactional workloads. Being able to make small, frequent changes without rewriting an entire file is exactly what they need. For example, Amazon RDS uses block storage for its persistent storage layer.
Block storage is also used as the primary storage for virtual machines. If you’ve ever started an EC2 instance, you’ve used block storage.
When shouldn’t I use block storage?
If there’s any distance between you and the storage volume, you quickly lose the benefits of block storage, because latency overwhelms any performance gains. This is why block storage tends to be attached locally, or over a fast, dedicated network close to the machine that uses it. It’s unusual to see block storage used directly over a long-distance or shared network; usually network storage presents a higher-level abstraction. If you want to share data, consider file storage.
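Some rough arithmetic shows why. The model below is deliberately simple, and every number in it is an assumption I’ve made up for illustration, not a measurement: each write request costs one round trip to the volume plus a small transfer time per block.

```python
def write_time(blocks, round_trip_s, per_block_s):
    """Seconds for one write request: one round trip plus the block transfers."""
    return round_trip_s + blocks * per_block_s


def saving(round_trip_s, per_block_s, full=6, partial=2):
    """Fraction of time saved by writing `partial` blocks instead of `full`."""
    slow = write_time(full, round_trip_s, per_block_s)
    fast = write_time(partial, round_trip_s, per_block_s)
    return (slow - fast) / slow


PER_BLOCK = 0.00001        # assumed: 0.01 ms to transfer one block
LOCAL_ROUND_TRIP = 0.0001  # assumed: 0.1 ms to reach a local volume
REMOTE_ROUND_TRIP = 0.02   # assumed: 20 ms to reach a distant volume

local_saving = saving(LOCAL_ROUND_TRIP, PER_BLOCK)
remote_saving = saving(REMOTE_ROUND_TRIP, PER_BLOCK)

print(f"local:  {local_saving:.1%} of the write time saved")
print(f"remote: {remote_saving:.1%} of the write time saved")
```

With these made-up numbers, writing two blocks instead of six saves about a quarter of the time on the local volume, but only a fraction of a percent on the distant one. The round trip dominates, so careful block-level tuning barely registers.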
Block storage can get expensive for large amounts of data. Additionally, it can only be so big; typically, block storage volumes top out at tens of terabytes. If you want to store a large amount of data without breaking the bank, consider object storage.
How do I get block storage?
Most providers offer block storage as part of their VM offering — services like Amazon Elastic Block Store (EBS), Azure managed disks, GCP persistent disks, and Linode block storage. You’d normally get it when you run a VM, rather than using it as a standalone service.
There are two dimensions for buying block storage: volume size and throughput.
Volume size is how much storage you need, in gigabytes or terabytes. Most providers allow you to resize a volume after it’s been created, but check the details. Resizing a disk often involves downtime, and may not always be possible.
Throughput is how much bandwidth you need, based on how quickly you want to read and write blocks. In most cases, the default throughput should be fine. You’re unlikely to hit the limits except with unusually demanding workloads, like high-performance databases. There are usually other bottlenecks before you hit the throughput limits of block storage.
The price varies depending on your volume size and throughput, but typically, I’d budget about $100 per terabyte per month for block storage.
Block storage: Small bits of data, big data control
Block storage is low-level plumbing that gives you fine-grained control over your storage. It defers higher-level tasks to a file system, which lets you work with files and folders instead of individual blocks.
It’s a great choice for local storage or workloads that need the highest performance. It’s not so good if you want to share storage among multiple users or use it over a network.