When it comes to AI and machine learning applications, data throughput can become the deciding factor between success and failure. Lustre’s architecture is designed to deliver high, scalable throughput across large clusters, and the throughput you can actually achieve depends heavily on the hardware configuration, specifically the type of storage drives used and the network infrastructure in place.
Throughput on Different Storage Drives
1. HDDs (Hard Disk Drives)
For installations that use traditional HDDs, throughput generally peaks between 100 and 200 MB/s per drive. While HDDs are cost-effective and offer ample storage capacity, their speed can limit Lustre’s overall performance, especially in data-intensive applications. Clusters with multiple HDDs configured in RAID can increase throughput, but the gains remain modest compared to solid-state options. In high-throughput applications, using only HDDs can cause bottlenecks, especially in AI tasks requiring rapid sequential read/write capabilities.
2. SSDs (Solid-State Drives)
SSDs significantly increase Lustre’s throughput, delivering between 400 and 500 MB/s per drive. With SSDs, Lustre’s I/O operations become much faster, enhancing performance for AI and ML tasks that need high-speed data access, such as model training and large-scale simulations. SSDs’ low latency also reduces wait times between data requests, providing a smoother data pipeline. SSDs built on the NVMe (Non-Volatile Memory Express) interface can reach 3 GB/s or more per drive, making them ideal for demanding applications in deep learning or real-time analytics.
3. NVMe Drives
NVMe drives further elevate Lustre’s throughput, achieving speeds up to 6 GB/s per drive in optimal configurations. For AI workloads involving huge datasets, NVMe drives enable the highest level of performance, minimizing delays and accelerating data transfer. In addition, NVMe’s superior parallelism complements Lustre’s design, which supports high concurrency across distributed nodes. For example, in a deep learning environment with large image datasets, NVMe-backed Lustre can manage multiple simultaneous data accesses without lag, ensuring continuous model training without bottlenecks.
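To make these figures concrete, here is a minimal back-of-the-envelope sketch in Python that sums per-drive bandwidth into a raw per-OST estimate. The per-drive values and the 12-drive count are assumptions drawn from the ranges above, not measurements; real throughput also depends on RAID layout, controllers, and the rest of the Lustre configuration.

# Rough estimate of the raw read bandwidth one OST might expose,
# based on the per-drive figures discussed above. All values are
# illustrative assumptions, not benchmarks.

PER_DRIVE_MBPS = {       # approximate sequential throughput per drive
    "hdd": 150,          # midpoint of the 100-200 MB/s range
    "sata_ssd": 450,     # midpoint of the 400-500 MB/s range
    "nvme": 5000,        # within the 3-6 GB/s range
}

def ost_raw_bandwidth_gbps(drive_type: str, drive_count: int) -> float:
    """Return the summed drive bandwidth of one OST in GB/s."""
    return PER_DRIVE_MBPS[drive_type] * drive_count / 1000

if __name__ == "__main__":
    for dtype in PER_DRIVE_MBPS:
        # Assume 12 drives per OST purely for illustration.
        print(f"{dtype:>8}: ~{ost_raw_bandwidth_gbps(dtype, 12):.1f} GB/s raw per OST")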
Network Infrastructure and Lustre Throughput
Beyond storage drives, network infrastructure significantly impacts Lustre’s performance. Lustre relies on a fast and stable network to maintain its high throughput across nodes:
1. 1 Gigabit Ethernet (1GbE)
Although 1GbE is economical, it quickly becomes a limiting factor in Lustre deployments. With a maximum theoretical throughput of 125 MB/s, a 1GbE network can bottleneck performance, even with high-speed SSDs or NVMe drives. This setup is typically insufficient for AI workloads, where the data demands outpace what 1GbE can offer.
2. 10 Gigabit Ethernet (10GbE)
With 10GbE, Lustre can achieve up to 1.25 GB/s in throughput. This bandwidth allows for more effective use of SSDs in Lustre nodes, providing sufficient throughput for mid-sized AI workloads and complex simulations. AI teams working on medium-sized datasets can benefit from 10GbE, as it supports rapid data ingestion and processing at a much higher rate than 1GbE.
3. InfiniBand (IB)
For the most demanding workloads, InfiniBand offers link speeds up to 200 Gbps, with real-world throughput often exceeding 20 GB/s. When used with NVMe-backed Lustre nodes, InfiniBand unlocks the full potential of both the storage and the Lustre file system. AI applications with extreme throughput requirements, such as deep learning on massive datasets, benefit greatly from InfiniBand. For example, a large research organization could handle high-resolution image data streams in real time, allowing multiple researchers to access and process data simultaneously without performance degradation.
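The conversion from link speed to usable bandwidth is simple arithmetic, shown in the short sketch below: divide the line rate by 8 to get bytes per second, then apply an efficiency factor for protocol overhead. The 85% efficiency figure is an assumption for illustration; real numbers depend on LNet configuration, RDMA vs. TCP, and tuning.

# Convert nominal link speeds to theoretical and rough "usable" throughput.
# The efficiency factor is an assumed allowance for protocol overhead.

LINKS_GBPS = {
    "1GbE": 1,
    "10GbE": 10,
    "InfiniBand HDR": 200,
}

EFFICIENCY = 0.85  # assumed fraction of line rate actually usable

for name, gbps in LINKS_GBPS.items():
    theoretical_mb_s = gbps * 1000 / 8          # bits -> bytes
    usable_mb_s = theoretical_mb_s * EFFICIENCY
    print(f"{name:>15}: {theoretical_mb_s:8.0f} MB/s theoretical, "
          f"~{usable_mb_s:8.0f} MB/s usable (assumed)")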
Lustre’s Aggregate Throughput: Cluster-Level Performance for HPC Workloads
One of the standout benefits of Lustre is its ability to scale throughput across an entire cluster, allowing performance gains far beyond what single-server storage solutions like NFS can provide. In high-performance computing (HPC) environments, Lustre’s architecture enables the aggregation of throughput from multiple Object Storage Targets (OSTs), forming a single, high-speed parallel file system. This approach not only enhances data access but also effectively minimizes the bottlenecks that often arise when relying on single-endpoint storage like NFS.
Aggregate Throughput in Lustre Clusters
In a Lustre deployment, each node contributes to the cluster’s overall throughput, allowing cumulative data access speeds that can reach hundreds of gigabytes per second. For example:
– HDD-Based Clusters
A Lustre cluster using traditional HDDs across multiple OSTs might achieve an aggregate throughput of 5-10 GB/s, depending on the number of drives and network configuration. While slower than SSD or NVMe-based clusters, this setup still provides substantially more throughput than a single-node NFS configuration, as it combines the read/write speeds of all nodes.
– SSD-Based Clusters
With SSDs, Lustre’s aggregate throughput can reach tens of GB/s across multiple OSTs. For AI workflows, this increased throughput translates into faster data loading and processing, crucial for training models that require continuous data access.
– NVMe-Based Clusters
In top-tier HPC configurations, NVMe-backed Lustre clusters can achieve aggregate throughput of 100 GB/s or more, with some setups reaching hundreds of GB/s in cutting-edge research facilities. This ultra-high throughput enables rapid processing of massive datasets, ensuring that even the most data-intensive AI applications run without delay. For instance, organizations running simulations on genomic data or climate models can process massive amounts of data in real time, making NVMe and Lustre a formidable combination for HPC environments.
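A rough way to reason about these cluster-level numbers is to sum per-OSS bandwidth and cap it by the network bandwidth available to clients, as in the sketch below. The OSS count, per-OSS storage figures, and per-OSS network limit are all assumptions chosen to roughly match the ranges above.

# Estimate aggregate cluster throughput as the sum of per-OSS bandwidth,
# capped by the total network bandwidth reachable by clients.
# All inputs are illustrative assumptions.

def aggregate_throughput_gbps(per_oss_gbps: float,
                              oss_count: int,
                              network_gbps_per_oss: float) -> float:
    """Aggregate GB/s, limited either by storage or by the fabric."""
    storage_limit = per_oss_gbps * oss_count
    network_limit = network_gbps_per_oss * oss_count
    return min(storage_limit, network_limit)

# Example: HDD-, SSD-, and NVMe-backed clusters with 16 OSS nodes each,
# assuming ~12.5 GB/s of usable network bandwidth per OSS (100 Gbps links).
for label, per_oss in [("HDD", 0.5), ("SSD", 3.0), ("NVMe", 10.0)]:
    agg = aggregate_throughput_gbps(per_oss, 16, 12.5)
    print(f"{label}-based cluster: ~{agg:.0f} GB/s aggregate")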
Single-Node Throughput vs. Cluster-Level Aggregation
In a traditional storage setup using NFS, throughput is limited to the capabilities of a single server. This creates a bottleneck, as all clients accessing data are constrained by the maximum speed of that endpoint. In contrast, each Lustre node contributes to the overall data pipeline, allowing clients to access data in parallel from multiple OSTs. As a result:
– Single Node Throughput
While individual nodes in a Lustre setup (e.g., an OSS with SSDs) might deliver 400-500 MB/s per drive, multiple OSS nodes can be combined so that throughput scales roughly linearly with the number of nodes. In an AI workload scenario, each OSS provides independent access paths, meaning clients don’t have to compete for data access as they would in an NFS setup.
– Cluster-Level Throughput
By leveraging the combined throughput of numerous OSS nodes, a Lustre cluster can handle tens to hundreds of gigabytes per second. For example, if each OSS node in a Lustre setup can independently provide 5 GB/s, a 20-node cluster could theoretically deliver an aggregate throughput of 100 GB/s. This kind of scalability is essential for HPC and AI tasks that need to read and write large files quickly.
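The contrast with a single NFS endpoint can be written out directly. In the sketch below, the NFS server limit is an assumed figure, while the per-OSS bandwidth and node count come from the example above.

# Contrast a single-endpoint NFS server with a Lustre cluster that
# aggregates OSS throughput. The NFS limit is an assumption; the Lustre
# figures reuse the 5 GB/s per OSS, 20-node example from the text.

NFS_SERVER_LIMIT_GBPS = 5.0   # assumed: one server, one network path
PER_OSS_GBPS = 5.0            # from the example above
OSS_NODES = 20

lustre_aggregate = PER_OSS_GBPS * OSS_NODES   # clients read in parallel
print(f"NFS (single endpoint): ~{NFS_SERVER_LIMIT_GBPS:.0f} GB/s shared by all clients")
print(f"Lustre ({OSS_NODES} OSS nodes): ~{lustre_aggregate:.0f} GB/s aggregate")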
How Lustre Avoids Bottlenecks Compared to NFS
1. Parallel Data Access
Unlike NFS, where data flows through a single endpoint, Lustre distributes files across multiple OSTs. Clients can access data from these OSTs in parallel, avoiding the central bottleneck inherent in NFS architectures. This parallelism is critical in AI and machine learning environments, where multiple nodes may need to access large datasets simultaneously.
2. High Concurrency and Scalability
Lustre’s scalability allows it to support thousands of nodes, each capable of contributing to the data flow. This high level of concurrency is especially beneficial in HPC, where simulations or deep learning models may need to access large amounts of data at the same time. An NFS server, limited by single-point throughput, would struggle to keep up with these demands, leading to latency and throttled performance.
3. Resiliency and Redundancy
With Lustre, data is striped across multiple storage nodes for high-speed access, and resiliency is typically layered on top of that striping through RAID on each OST, high-availability OSS failover pairs, or Lustre’s file-level replication. When one node is overloaded, clients whose files are striped across other OSTs are largely unaffected, and failover configurations keep data accessible if a server becomes temporarily unavailable. This resiliency further mitigates bottlenecks and helps maintain continuous, high-speed data access even during peak loads.
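The parallel-access point above can be illustrated with a small, file-system-agnostic sketch: the code below reads one large file in fixed-size chunks from several threads, mimicking how a Lustre client pulls different stripes of a file from different OSTs concurrently. On an actual Lustre file system, the layout itself is controlled with the lfs utility (for example, lfs setstripe -c -1 -S 4M on a directory stripes new files across all available OSTs with a 4 MiB stripe size). The path and stripe size in the sketch are assumptions.

import os
from concurrent.futures import ThreadPoolExecutor

# Read one file in fixed-size chunks ("stripes") from several threads,
# analogous to a Lustre client fetching different stripes of a file
# from different OSTs at the same time. Path and stripe size are
# hypothetical values for illustration.

FILE_PATH = "/mnt/lustre/dataset/images.bin"   # hypothetical Lustre-mounted file
STRIPE_SIZE = 4 * 1024 * 1024                  # 4 MiB, a common stripe size

def read_stripe(offset: int) -> int:
    """Read one stripe starting at `offset` and return the bytes read."""
    with open(FILE_PATH, "rb") as f:
        f.seek(offset)
        return len(f.read(STRIPE_SIZE))

def parallel_read(workers: int = 8) -> int:
    """Read the whole file as parallel stripe-sized requests."""
    size = os.path.getsize(FILE_PATH)
    offsets = range(0, size, STRIPE_SIZE)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_stripe, offsets))

if __name__ == "__main__":
    total = parallel_read()
    print(f"Read {total / 1e9:.2f} GB in parallel stripes")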
Benefits of Using Lustre for AI and HPC Workloads
In AI environments, a high-throughput Lustre setup ensures that data is readily available, reducing time spent waiting for data loads and increasing time spent processing and analyzing data. With NVMe and InfiniBand, Lustre can provide the speed and efficiency that drive faster model training, streamline data processing, and enhance the performance of even the most complex AI workflows.
For AI training, data analytics, and HPC simulations, Lustre’s performance advantages over NFS are considerable. By removing single-server limitations, Lustre ensures a consistent data throughput that scales with the demands of the workload. AI models that require vast datasets for training, or HPC simulations running complex computations, benefit from Lustre’s ability to streamline data access, reduce latency, and deliver aggregate throughput that far surpasses traditional storage solutions.
In essence, Lustre provides the high throughput and scalability needed to support cutting-edge AI and HPC environments, minimizing bottlenecks and ensuring that data access can keep pace with even the most demanding computational tasks.
Continue reading the series of posts on Lustre: Lustre Design Strategies on Oracle OCI
At the time of publishing this post, I am an Oracle employee, but the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.