
Stop Buying the Wrong Storage

How to Choose the Right Storage Platform for Each Workload

Every few years the storage industry reinvents the same lie.

One platform that does everything.

Vendors love it because it simplifies sales. Architects love it because it simplifies diagrams. Technicians hate it because physics still exists.

Latency, throughput, metadata handling, failure domains and recovery behavior are not marketing features. They decide whether a platform is fast and stable or painful to operate. When the wrong storage is chosen, no amount of tuning will fix it.
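
To make that concrete, here is a minimal Python sketch that probes two of these dimensions separately on whatever filesystem backs the current directory: sequential throughput and synchronous write latency. The file name and sizes are arbitrary. Run it on local NVMe, a SAN LUN and a distributed filesystem and you will get very different answers, which is exactly the point.

```python
import os
import statistics
import time

TEST_FILE = "io_probe.tmp"    # arbitrary scratch file on the filesystem under test
BLOCK = b"x" * (1024 * 1024)  # 1 MiB per write

# Dimension 1: sequential throughput (the page cache hides most latency)
start = time.perf_counter()
with open(TEST_FILE, "wb") as f:
    for _ in range(256):  # 256 MiB total
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())
elapsed = time.perf_counter() - start
print(f"sequential throughput: {256 / elapsed:.1f} MiB/s")

# Dimension 2: synchronous write latency (what a database commit feels like)
latencies = []
with open(TEST_FILE, "wb") as f:
    for _ in range(100):
        t0 = time.perf_counter()
        f.write(b"commit record")
        f.flush()
        os.fsync(f.fileno())  # force the write down to stable storage
        latencies.append(time.perf_counter() - t0)
print(f"median fsync latency: {statistics.median(latencies) * 1000:.2f} ms")

os.remove(TEST_FILE)
```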

This article compares real storage architectures, from enterprise arrays to Ceph, Lustre, ZFS and DRBD, along with cloud services such as Oracle OCI, and maps them to real workloads.

Warning: do not take this list at face value. There are always exceptions driven by cost or process constraints. What really matters is understanding the best practice for each storage tier and then adapting it to your own needs with good judgment. It is normal to meet a customer's requirements with a solution that is not perfectly aligned with the theoretical optimum; what matters is that, by knowing the workload and the limits of the chosen protocol, service or appliance, it can still deliver a stable and performant service.

In this post I have listed some of the most popular storage types currently on the market; others that are just as performant and widely used certainly exist. Use this post as a general logical reference.


The storage families

| Family | What it really is | Typical products |
|---|---|---|
| Enterprise SAN | Dedicated block arrays | NetApp, Pure, Dell PowerStore, HPE 3PAR |
| NAS | Centralized file servers | NetApp NAS, Isilon, Qumulo, OCI File Storage Service |
| Software-defined block | Distributed block storage | Ceph RBD, vSAN |
| Software-defined file | Distributed filesystems | CephFS, GlusterFS |
| HPC file systems | High-throughput parallel FS | Lustre, BeeGFS |
| ZFS-based appliances | Scale-up reliable storage | ZFS filesystem, OCI ZFS appliance |
| Replicated block | Active-passive disk mirroring | DRBD |
| Object storage | HTTP-based storage | OCI Object Storage, S3, MinIO |
| Cloud block storage | Virtual SAN | OCI Block Volume |

These are not interchangeable. They are built for different use cases.


What real workloads need

| Workload | What matters |
|---|---|
| VMware and KVM | Latency, snapshots, fast recovery |
| Kubernetes | Dynamic volumes, stable performance |
| Databases | fsync, write latency, consistency |
| AI and HPC | Massive parallel throughput |
| Backup and archive | Cheap capacity, durability |
| File sharing | Metadata speed, locking |
| Disaster recovery | Deterministic replication |

How these technologies really map

| Workload | Good choices | Bad choices |
|---|---|---|
| Virtual machines | SAN, OCI Block Volumes, Ceph RBD | CephFS, Gluster |
| Kubernetes | OCI Block Volumes, Ceph RBD / Rook, OCI FSS HPMT (see the PVC sketch below) | NAS, CephFS |
| Databases | SAN, ZFS, OCI Block Volumes, NVMe | Gluster, CephFS |
| AI and HPC | Lustre, BeeGFS, NVMe | Ceph, ZFS |
| Shared files | NAS, ZFS appliance, OCI File Storage Service | SAN, object |
| Backup | OCI Object, S3 | SAN, NAS |
| DR | DRBD, array replication, Block Volume replication | CephFS, object |
| SMB / Samba | OCI ZFS appliance, Ceph, NAS, Windows File Server, FSx, ONTAP | FSS, Lustre, Gluster |
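
To make the Kubernetes row concrete, here is a minimal sketch using the official kubernetes Python client to request a dynamically provisioned volume through a PersistentVolumeClaim. The claim name, size and the oci-bv storage class (the block-volume CSI class OKE typically provides) are assumptions; substitute your own cluster's class, whether it fronts OCI Block Volume or Ceph RBD via Rook.

```python
from kubernetes import client, config

# Load kubeconfig (inside a pod you would use config.load_incluster_config())
config.load_kube_config()

# A PVC is the right abstraction: the app asks for block-backed storage
# and the CSI driver (OCI Block Volume, Ceph RBD via Rook, ...) provisions it.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="db-data"),  # hypothetical claim name
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],            # block volumes attach to one node
        storage_class_name="oci-bv",               # assumed OKE block-volume CSI class
        resources=client.V1ResourceRequirements(
            requests={"storage": "50Gi"}           # placeholder size
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```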

Why people choose the wrong thing

Most storage disasters are not caused by bad hardware or bad software.
They are caused by the wrong mental model.
People choose storage based on checklists and marketing instead of on how IO actually flows through the system.
Here are some of the patterns that break most architectures.

Using distributed filesystems for performance

CephFS and Gluster scale in capacity, not in metadata performance. They are great for shared data but a poor fit for VM disks and databases.
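
You can see this yourself with a crude metadata microbenchmark: create many small files on the mount you want to test and time pure stat calls against them. On a local filesystem the kernel answers from cache; on CephFS or Gluster a miss can mean a network round trip to a metadata service. A minimal sketch, with an arbitrary file count:

```python
import os
import shutil
import tempfile
import time

N = 10_000                           # arbitrary; enough to defeat tiny caches
workdir = tempfile.mkdtemp(dir=".")  # run from the mount you want to test

# Create N small files, then time pure metadata operations (stat) against them.
for i in range(N):
    with open(os.path.join(workdir, f"f{i}"), "w") as f:
        f.write("x")

start = time.perf_counter()
for i in range(N):
    os.stat(os.path.join(workdir, f"f{i}"))
elapsed = time.perf_counter() - start
print(f"{N / elapsed:.0f} stat calls/s")  # local FS: enormous; distributed FS: often far lower

shutil.rmtree(workdir)
```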

Using ZFS for everything

ZFS is amazing for reliability. It does not scale out. Teams build huge ZFS appliances and then wonder why performance collapses when hundreds of nodes hit them.

Using Ceph for HPC

Ceph was built for resiliency and object semantics. AI training needs raw bandwidth and predictable IO. Lustre or BeeGFS destroy Ceph here.

Using SAN for backup

Premium flash arrays end up storing data that will never be read. That is not architecture. That is waste.
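
The boring, cheap fix is to point backups at an object tier instead. Below is a minimal sketch with boto3, which speaks to S3 and to OCI Object Storage's S3-compatible endpoint alike; the endpoint URL, bucket and file paths are placeholders:

```python
import boto3

# Works against AWS S3 or any S3-compatible endpoint, including
# OCI Object Storage's compatibility API (placeholder endpoint below).
s3 = boto3.client(
    "s3",
    endpoint_url="https://<namespace>.compat.objectstorage.<region>.oraclecloud.com",
)

# upload_file handles multipart uploads for large backup archives automatically.
s3.upload_file(
    Filename="/backups/db-2024-01-01.tar.gz",  # hypothetical backup artifact
    Bucket="nightly-backups",                  # hypothetical bucket
    Key="db/db-2024-01-01.tar.gz",
)
```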


How Oracle OCI fits in this picture

OCI exposes storage the way on-premises engineers understand it.

| OCI service / appliance | Real-world equivalent |
|---|---|
| OCI Block Volume | Enterprise SAN |
| OCI File Storage | NFS NAS |
| OCI Object Storage | S3 tier |
| OCI ZFS Appliance | High-end scale-up multiprotocol NAS |
| OCI Local NVMe | HPC and database scratch |
| OCI Lustre | HPC and AI |

OCI does not pretend one tier fits all. That is why it works for real enterprise workloads.
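
As a small illustration of the block tier, here is a sketch with the oci Python SDK that provisions a Block Volume, assuming a standard ~/.oci/config profile. The OCIDs, availability domain, name and size are placeholders, and vpus_per_gb is the knob that trades cost against performance.

```python
import oci

# Assumes a standard ~/.oci/config with a DEFAULT profile.
config = oci.config.from_file()
blockstorage = oci.core.BlockstorageClient(config)

# OCIDs and the availability domain below are placeholders; vpus_per_gb
# selects the performance tier (0 = lower cost, 10 = balanced, 20+ = higher).
volume = blockstorage.create_volume(
    oci.core.models.CreateVolumeDetails(
        compartment_id="ocid1.compartment.oc1..example",
        availability_domain="Uocm:PHX-AD-1",
        display_name="db-data-01",
        size_in_gbs=500,
        vpus_per_gb=10,
    )
).data
print(volume.id)
```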


The truth nobody likes to say

Most storage problems in cloud and on premises do not come from bad hardware.
They come from using the wrong type of storage for the workload.
If your platform is slow, unstable or expensive, look at what you are running on top of what. You will usually find a database on a distributed filesystem, VMs on NAS or backups on SAN.

Storage is not one thing. It is a toolbox.
If you keep using a hammer for every job, do not be surprised when everything looks broken.

The part nobody likes to admit

Storage decisions look simple on a slide. In production they decide whether your entire platform is stable or constantly on fire.
Most outages, slowdowns and cost explosions are not caused by bugs. They are caused by the wrong storage architecture chosen months or years earlier.
This is why involving a real storage specialist matters. Someone who understands IO paths, failure domains, recovery behavior and how systems break under load.
When storage is designed correctly, everything above it becomes easier.
When it is designed wrong, no amount of automation or tuning will save you.

In production, storage is not just another component.
It is the foundation that decides whether everything else works or not.
