As a general rule, when people think about object storage they think about one thing — the price per TB/GB. Though a legitimate cost metric, it has the effect of making object storage one-dimensional and relegates it to archival use cases. Further, it distorts the value associated with this increasingly important part of the enterprise tech stack.
Frankly, the legacy object storage players are to blame. For years they have under-innovated on the technology front in favor of making ever-cheaper appliances. While these old school vendors might argue that is what customers wanted, they would be wrong.
The evidence can be found in the $25 billion in revenue that Amazon Web Services racked up last year, the vast majority of it in high-performance, primary object storage. If we conservatively attribute $20 billion to the S3 storage service, we can also safely say that S3 is likely as big as the entire appliance market combined. Throw in the similarly priced and rapidly growing Azure Blob and Google Cloud revenue and the case becomes clear: cost is but one consideration.
That is why modern enterprises focus on a broader set of metrics — metrics that emphasize performance, operational efficiency, flexibility and price — not just price alone. They recognize that putting your data on ice reduces its value to the organization. At a time when the goal is to maximize the value of the organization’s data, the appliance vendor’s approach seems antithetical.
What should enterprises be considering? The considerations fall into five broad categories:
- S3 compatibility
- Failure response
These five elements, in addition to cost, are what define the new metrics in object storage. They are the super six. Let us look at them in turn.
Object storage has not traditionally been known for performance. In the race to the bottom on price, appliance vendors continually sacrificed performance. To wit, they use terms like “glacial” to define their product offerings.
Modern object storage changes that.
From Amazon to MinIO, we are seeing speeds that approach Hadoop and even surpass it. The new metrics for object storage relate to read and write speeds ranging from tens of GB/s for HDD to 35+ GB/s for NVMe. This throughput is plenty fast for Spark, Presto, TensorFlow, Teradata, Vertica, Splunk and the rest of the modern computational frameworks in the analytics stack. The fact that MPP databases are targeting object storage is evidence that object storage is increasingly the primary storage.
If your object storage system can’t deliver these speeds then you can’t interact with all of your data and can’t extract the appropriate value from it. Even if you pull the data out of your traditional object store to an in-memory processing framework you still need the throughput to shuttle data in and out of that memory — you simply cannot get that throughput from legacy object appliances.
This is a key point. The new performance metric is throughput, not latency. This is what is required for data at scale — something that is the norm in the modern data infrastructure.
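To put the throughput figures in perspective, here is a back-of-the-envelope sketch of how long a full scan of a dataset takes at a given sustained throughput. The 1 PB dataset size and the 1 GB/s comparison figure are illustrative assumptions, not benchmark results; the 35 GB/s figure is the NVMe number quoted earlier. The helper name `scan_time_hours` is made up for this example.

```python
def scan_time_hours(dataset_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to read dataset_tb terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1000 GB) and ignores protocol and compute
    overhead, so the result is a lower bound rather than a prediction.
    """
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = dataset_tb * 1000 / throughput_gb_per_s
    return seconds / 3600


# Illustrative assumption: a 1 PB (1000 TB) dataset.
DATASET_TB = 1000

for label, gb_per_s in [
    ("1 GB/s (assumed slow system)", 1.0),
    ("35 GB/s (NVMe figure quoted earlier)", 35.0),
]:
    print(f"{label}: {scan_time_hours(DATASET_TB, gb_per_s):.1f} hours")
```

Under these assumptions the scan drops from roughly 278 hours to roughly 8 hours, which is the difference between data that can be queried interactively and data that is effectively on ice.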
It should be noted that while performance benchmarks are a nice proxy, you do not truly know what performance looks like until you have run the specific application in that environment. Only then can you understand whether the bottleneck is the storage software, the drives, the network or the compute layer.
Scalability is usually referred to as the number of petabytes that fit into a single namespace. Every vendor claims zettabyte scale, but that claim hides the fact that massive, monolithic systems become brittle, complex, unstable and expensive as you scale.
The new metric for scalability is how many different namespaces or tenants you can handle.
This metric is taken directly from the hyper-scalers — where the building blocks are small but scale to the billions. It is, in short, the cloud native way.
When the building blocks are small everything can be understood and optimized more effectively — security, access control, policy management, lifecycle management, non-disruptive upgrades and updates and ultimately performance. The size of the building block is a function of the manageability of the failure domain. This is how highly resilient systems are architected.
Multitenancy has multiple dimensions in the modern enterprise. While it certainly refers to how enterprises organize access to data and applications, it also refers to the applications themselves and how they are logically isolated from each other.
A modern approach to multitenancy has the following characteristics:
- Tenants can grow from a few hundred to a few million in a short span of time.
- Tenants are fully isolated from each other, enabling them to run different versions of the same object storage software with different configurations, permissions, features, security and service levels. This is an operational fact of life when scaling across new servers, updates and geographies.
- Tenants are elastic and provisioned on demand.
- Every operation is API driven and automated without a human in the loop looking at a dashboard.
- The software is light enough to be containerized and leverages industry-standard orchestration services like Kubernetes.
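The isolation and API-driven characteristics above can be illustrated with a minimal sketch. Everything here is hypothetical: `TenantSpec`, `TenantRegistry` and the version strings are invented for illustration and are not the API of any object store or of Kubernetes. The registry is an in-memory stand-in for whatever orchestration layer would do the real work.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TenantSpec:
    """Hypothetical description of one isolated tenant."""
    name: str
    software_version: str  # each tenant pins its own release
    capacity_tb: int
    features: frozenset = frozenset()


class TenantRegistry:
    """In-memory stand-in for an orchestration API (hypothetical)."""

    def __init__(self):
        self._tenants = {}

    def create(self, spec: TenantSpec) -> None:
        # Tenants are created programmatically, with no dashboard involved.
        if spec.name in self._tenants:
            raise ValueError(f"tenant {spec.name!r} already exists")
        self._tenants[spec.name] = spec

    def upgrade(self, name: str, new_version: str) -> None:
        # Only the named tenant changes; every other tenant keeps its version.
        self._tenants[name] = replace(
            self._tenants[name], software_version=new_version
        )

    def get(self, name: str) -> TenantSpec:
        return self._tenants[name]


registry = TenantRegistry()
registry.create(TenantSpec("analytics", "v1", capacity_tb=500))
registry.create(TenantSpec("backup", "v1", capacity_tb=2000))

# Upgrade one tenant without touching the other.
registry.upgrade("analytics", "v2")
print(registry.get("analytics").software_version)  # v2
print(registry.get("backup").software_version)     # v1
```

The point of the sketch is the shape, not the implementation: each tenant carries its own version and configuration, and every change is a call that automation can make.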