researchHQ’s Key Takeaways:
- Traditional block storage is made up of physical sections of the disk media called sectors; these ‘blocks’ can be written or re-written as needed.
- In contrast, object storage packages all the bits of a file, along with an effectively unlimited amount of metadata, into one item and assigns that item, or ‘object’, a unique identifier.
- Object storage gives companies a platform for large-scale file architecture that is no longer bound by physical location.
- Despite its various benefits, object storage is unsuitable for use in databases.
- Object storage should be used in cases where eventual consistency is not a problem and where entire files are changed as a unit.
When trying to find places to store stuff in either a private or public cloud, one of the most frequent questions is: What is Object Storage, and why do I need it?
The answer is simpler than you might think, but the application of Object Storage can be more complicated than it looks.
First, what is Object Storage and how is it different from what we’ve used up until now – namely block storage?
What is Block Storage?
Block storage – the kind of storage in your computer’s hard drive or SSD, for example – is made up of physical sections of the disk media called sectors. These sectors or “blocks” are written and re-written as needed.
Sectors are small, random, and not human-friendly. Consider a small 50KB document – it resides on 100 sectors, and no human would want to remember the list of 100 sectors needed to reconstruct that file! That’s why software such as a file system (e.g., NTFS or ext4) or a database will store and maintain metadata that maps human-friendly constructs like files, directories, or database records to sectors on a disk. This works well for most datasets you’d find on a single desktop, attached to a single database, or inside a single application.
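To make that mapping concrete, here is a minimal sketch in Python of the kind of file-to-sector bookkeeping a file system does on our behalf. The starting sector number is made up, and the 512-byte sector size is a common but not universal choice:

```python
# Illustrative sketch (not any real file system's on-disk format):
# mapping a contiguously-stored file to the sectors it occupies.

SECTOR_SIZE = 512  # bytes; a typical, though not universal, sector size

def sectors_for_file(start_sector: int, file_size_bytes: int) -> list[int]:
    """Return the sector numbers a contiguously-stored file occupies."""
    count = -(-file_size_bytes // SECTOR_SIZE)  # ceiling division
    return list(range(start_sector, start_sector + count))

# A 50 KB document starting at a hypothetical sector 20,480:
doc_sectors = sectors_for_file(start_sector=20_480, file_size_bytes=50 * 1024)
print(len(doc_sectors))  # 100 sectors -- metadata the file system remembers so we don't have to
```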
The file system keeps shared metadata for the data on the physical disks, but the blocks themselves are just chunks of data. An application asks the file system where to find all the chunks it needs, then goes to the appropriate sectors, grabs the chunks, and presents the file. When writing data, block-aware applications like databases can overwrite just the blocks that are changing, and can lock that area of the storage to ensure that no one grabs an older version while the newer version is being written out.
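As a rough sketch of what that block-aware, in-place overwrite looks like: the file name, block number, and 4 KB block size below are assumptions for illustration, the file is assumed to already exist, and the byte-range lock uses POSIX fcntl, so this only runs on Unix-like systems:

```python
# Sketch of updating one block in place, roughly the way a database engine
# overwrites a single changed page without rewriting the rest of the file.
import fcntl
import os

BLOCK_SIZE = 4096                  # hypothetical block/page size
block_no = 7                       # hypothetical: the only block that changed
new_block = b"\x00" * BLOCK_SIZE   # the replacement contents

fd = os.open("datafile.db", os.O_RDWR)  # hypothetical existing data file
try:
    offset = block_no * BLOCK_SIZE
    # Lock just this byte range so readers can't grab a half-written block.
    fcntl.lockf(fd, fcntl.LOCK_EX, BLOCK_SIZE, offset)
    os.pwrite(fd, new_block, offset)            # overwrite only the changed block
    fcntl.lockf(fd, fcntl.LOCK_UN, BLOCK_SIZE, offset)
finally:
    os.close(fd)
```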
This model works great for most general-purpose workloads, but it has drawbacks that we’re beginning to feel – especially with web-scale applications and massive data repositories. First, storing the metadata independently of the actual data leads to massive metadata “warehouses” that can add an equally massive amount of latency when attempting to locate any given set of blocks. Second, the amount of metadata must be limited by necessity – too much and the “warehouse” spirals out of control in terms of size. Very large datasets can create bottlenecks that slow everything down just when the application needs to speed up.
Additionally, since the data and metadata must reside on the same storage platform (disk, SAN, etc.), stretching storage across multiple sites becomes a nightmare for block-storage platforms. There are definitely ways to accomplish the task, but they can be tricky to configure, can create even more application latency, and still have trouble tracking multiple copies of the same data at multiple locations.
Finally, while block storage is great for datasets that change parts of themselves frequently – such as databases – it is less efficient with file types that change infrequently but, when they do change, change as a whole – like photos and other media. Since the file system sees these files not as single units but as collections of blocks, rewriting every block on every change can create even more latency issues.
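A quick back-of-the-envelope comparison makes the difference clear. The 4 KB block size and file sizes below are illustrative, not drawn from any particular system:

```python
# Rough comparison: a database page update touches one block, while
# re-saving a whole photo rewrites every block the file occupies.

BLOCK_SIZE = 4096  # hypothetical block size

def blocks_written(change_bytes: int) -> int:
    """Blocks that must be rewritten for a change of the given size."""
    return -(-change_bytes // BLOCK_SIZE)  # ceiling division

print(blocks_written(4 * 1024))         # database page update: 1 block
print(blocks_written(5 * 1024 * 1024))  # re-saved 5 MB photo: 1280 blocks -- the entire file
```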