
Ceph is amazing. Just don’t ask it to be Lustre

(And Yes, It Can Still Power AI and SAN Workloads)

Ceph is one of the most powerful open-source storage platforms available today.

It offers object, block, and file storage in a single distributed system, with high availability, strong durability, and the ability to scale on standard hardware.
That alone makes Ceph exceptional.

But here is the uncomfortable truth:

Ceph can scale massively and still be the wrong storage for a specific workload.

Understanding why is the difference between a great architecture and a frustrating one.


Ceph scales. The benchmarks are real.

Recent official benchmarks published by the Ceph project show very strong results, especially for object storage:

  • near-linear scaling as nodes are added
  • more than 100 GiB/s aggregate read throughput
  • tens of GiB/s write throughput
  • hundreds of thousands of parallel operations

These are not synthetic numbers. They come from real clusters, using fast NVMe, modern CPUs, and high-speed networking.

Reference:
https://ceph.io/en/news/blog/2025/benchmarking-object-part1/

So let’s be clear:

Ceph is not slow.
A well-designed Ceph cluster can handle massive parallel I/O.

Ceph performance is architecture-dependent

Ceph works thanks to a few key ideas:

  • a fully distributed core (RADOS)
  • automatic data placement with CRUSH
  • no single point of failure
  • strong consistency and self-healing

Reference:
https://docs.ceph.com/en/latest/architecture/
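
To make the placement idea concrete, here is a minimal sketch of deterministic, hash-based placement in Python. It is not the real CRUSH algorithm (which uses weighted hierarchies, failure domains, and a different hash construction); it only illustrates that any client can compute where an object lives from the cluster map, without asking a central server. The OSD names and PG count are made up for the example.

```python
import hashlib

# Hypothetical cluster map: a flat list of OSDs. Real CRUSH walks a weighted
# hierarchy (host, rack, room, ...) and respects failure domains.
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4", "osd.5"]
REPLICAS = 3

def place(object_name: str, pg_count: int = 128) -> list[str]:
    """Toy placement: object name -> placement group -> ordered OSD set.

    Every client computes the same answer from the same map, so no central
    lookup service is needed. This mirrors the *idea* behind CRUSH, not its math.
    """
    # 1. Hash the object name into a placement group (PG).
    h = int.from_bytes(hashlib.md5(object_name.encode()).digest()[:4], "little")
    pg = h % pg_count

    # 2. Derive REPLICAS distinct OSDs from the PG id.
    chosen = []
    i = 0
    while len(chosen) < REPLICAS:
        osd = OSDS[(pg + i) % len(OSDS)]
        if osd not in chosen:
            chosen.append(osd)
        i += 1
    return chosen

print(place("training-shard-0001"))   # deterministic: same result on every client
```

The point is only that placement is a pure function of the cluster map, which is why adding nodes rebalances data instead of funneling every request through a lookup service.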

This design gives Ceph huge flexibility but also means that hardware, network, and configuration matter a lot.

Two Ceph clusters running the same version can behave very differently when:

  • network bandwidth is limited
  • CPU is shared with too many services
  • pools mix different workloads
  • recovery traffic is not controlled

Ceph does not hide complexity.
It exposes it.
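
Recovery traffic is a good example of that exposed complexity. Below is a hedged sketch of how an operator might cap backfill and recovery from an admin node so client I/O keeps priority. It assumes the ceph CLI is available, and the option names and values are illustrative; check the documentation for your Ceph release before applying anything to a production cluster.

```python
import subprocess

# Illustrative recovery/backfill limits; validate names and values against
# the documentation for your Ceph release before using them.
recovery_limits = {
    "osd_max_backfills": "1",          # concurrent backfills per OSD
    "osd_recovery_max_active": "1",    # concurrent recovery ops per OSD
    "osd_recovery_sleep": "0.1",       # pause (seconds) between recovery ops
}

for option, value in recovery_limits.items():
    # Equivalent to running: ceph config set osd <option> <value>
    subprocess.run(["ceph", "config", "set", "osd", option, value], check=True)
```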

Ceph for SAN and AI: Yes, it can work

Ceph is often labeled as “object storage only”.
That is simply not true.

Ceph also provides:

  • RBD (block storage), used as SAN-like storage
  • CephFS, a shared POSIX file system

References:
https://docs.ceph.com/en/latest/rbd/
https://docs.ceph.com/en/latest/cephfs/
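
As a concrete illustration of the block side, here is a small sketch using the official Python bindings (python3-rados and python3-rbd). The config path, pool name, and image name are assumptions for the example; adapt them to your cluster.

```python
import rados
import rbd

# Connect using the local ceph.conf and default keyring
# (paths and pool name are assumptions for this sketch).
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("rbd")                      # an existing RBD pool
    try:
        rbd.RBD().create(ioctx, "demo-disk", 1 * 1024**3)  # 1 GiB image
        with rbd.Image(ioctx, "demo-disk") as image:
            image.write(b"hello from RBD", 0)              # write at offset 0
            print(image.read(0, 14))                       # read it back
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```

In practice the same image is usually consumed through librbd by a hypervisor or mapped by the kernel rbd driver, which is exactly what makes Ceph usable as SAN-like storage.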

In real deployments, Ceph is successfully used for:

  • virtualization platforms
  • Kubernetes persistent volumes
  • databases
  • AI pipelines
  • shared storage for many clients

With proper design, a fast network, enough CPU per OSD, and clean pool separation, Ceph can deliver stable and predictable performance, even for demanding workloads.

Including SAN-like ones.

Where Ceph needs more care

Problems usually appear with workloads that are:

  • heavy on metadata
  • full of small files
  • tightly synchronized
  • very sensitive to latency variation

This is common in AI training.
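
To make "heavy on metadata" concrete, here is a hedged sketch that creates and stats many tiny files on a mounted CephFS path (the mount point and counts are assumptions). With payloads this small, the cost is dominated by metadata operations against the MDS, not by bandwidth.

```python
import os
import time

MOUNT = "/mnt/cephfs/dataset"   # assumed CephFS mount point
N_FILES = 10_000                # many tiny files, typical of raw AI datasets

os.makedirs(MOUNT, exist_ok=True)

start = time.perf_counter()
for i in range(N_FILES):
    path = os.path.join(MOUNT, f"sample_{i:05d}.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 512)   # 512 B payload: bandwidth is irrelevant here
    os.stat(path)                # every create/open/stat is a metadata operation
elapsed = time.perf_counter() - start

print(f"{N_FILES} small files in {elapsed:.1f}s "
      f"({N_FILES / elapsed:.0f} metadata-bound ops/s)")
```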

And this is where comparisons with Lustre become important.


Ceph vs Lustre: parallel is not the same as synchronized

From the outside, AI workloads look “parallel”.
In practice, many of them are synchronized.

Ceph: excellent at parallel, independent access

Ceph shines when:

  • many clients access different data
  • operations are independent
  • throughput matters more than latency consistency

This makes Ceph ideal for:

  • AI datasets
  • preprocessing and feature extraction
  • checkpoints
  • data lakes

This behavior is clearly shown in the official benchmarks.
https://ceph.io/en/news/blog/2025/benchmarking-object-part1/
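
Here is what "parallel, independent" looks like in code: a sketch using the python3-rados bindings and a thread pool, where each worker writes its own objects and never waits on another worker. The config path, pool name, object sizes, and counts are assumptions.

```python
import rados
from concurrent.futures import ThreadPoolExecutor

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # assumed config path
cluster.connect()
ioctx = cluster.open_ioctx("ai-datasets")               # assumed pool name

payload = b"x" * (4 * 1024 * 1024)                      # 4 MiB chunk

def write_shard(i: int) -> None:
    # One independent object per call: no shared locks, no ordering
    # between workers, so throughput scales with parallelism.
    ioctx.write_full(f"shard-{i:06d}", payload)

# The IoCtx is shared here and its settings are never changed mid-flight;
# opening one context per worker is an equally valid pattern.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(write_shard, range(1000)))

ioctx.close()
cluster.shutdown()
```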

Lustre: designed for synchronized workloads

Lustre was built for HPC and large training clusters, where:

  • thousands of processes run together
  • jobs move forward in lock-step
  • metadata operations happen in bursts
  • one slow I/O can block many GPUs

Lustre handles this well because:

  • metadata is central to the design
  • data and metadata paths are optimized separately
  • performance degrades more predictably under load

This is why Lustre is widely used in:

  • supercomputers
  • GPU farms
  • large AI training environments

Why this difference matters for AI

When GPUs wait, raw throughput is not enough.

In synchronized training:

  • tail latency matters more than peak bandwidth
  • metadata storms slow down entire jobs
  • small delays multiply across thousands of workers

In these scenarios:

  • Ceph can work
  • Lustre often works better

Not because Ceph is weak, but because Lustre was designed for exactly this pattern.
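
A back-of-the-envelope way to see the effect: in a synchronized step, every worker waits for the slowest I/O, so the step time is the maximum of N latencies, not the average. The numbers below are purely illustrative, not measurements of either system.

```python
import random

random.seed(0)

def sample_latency_ms() -> float:
    # Illustrative distribution: 99% of I/Os take ~2 ms, 1% take ~50 ms.
    return 50.0 if random.random() < 0.01 else 2.0

def sync_step_time(n_workers: int) -> float:
    # In lock-step training, the whole step waits for the slowest worker.
    return max(sample_latency_ms() for _ in range(n_workers))

for n in (1, 10, 100, 1000):
    avg = sum(sync_step_time(n) for _ in range(200)) / 200
    print(f"{n:>5} workers -> average step time ~{avg:.1f} ms")
```

With one worker, the rare slow I/O barely moves the average; with a thousand synchronized workers, almost every step pays the full tail-latency penalty. That is why tail latency, not peak bandwidth, dominates.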

Ceph and Lustre are not competitors; they are complements

The real mistake is trying to force one system to do everything.

Many modern AI architectures use:

  • Ceph for data lakes, object storage, checkpoints
  • Lustre for active training datasets

Each system plays to its strengths.

Ceph remains a spectacular platform:

  • scalable
  • flexible
  • resilient
  • cost-effective

Lustre remains the reference when:

  • training is tightly synchronized
  • metadata pressure is high
  • predictable performance is critical

Final Takeaway

Ceph can sustain SAN-like and AI workloads if it is well designed and well configured.

The benchmarks prove that Ceph can scale very far.
But scaling numbers alone do not replace architecture.

Ceph is powerful.
Ceph is flexible.
Ceph is not a shortcut.

Used correctly, it is not just “good enough”. It is one of the best storage platforms available today.
