Maximizing Storage I/O Performance for Microsoft SQL Server workloads on Oracle OCI IaaS

Running Microsoft SQL Server on Oracle OCI IaaS can deliver excellent performance, but only if the storage layer is designed and validated correctly. Too often, performance issues are attributed to the database engine while the real bottleneck sits in an improperly configured or untested storage setup.

In this article, I walk through a storage configuration specifically designed for SQL Server workloads on Oracle OCI, focusing on block volume layout, attachment type, and I/O characteristics. The setup is not theoretical: it has been tested using FIO benchmarks built to realistically simulate SQL Server I/O patterns, including random reads and writes, queue depth behavior, and concurrency.

The goal is simple and practical: show which storage choices actually matter, quantify their impact with measurable data, and provide a repeatable approach to validate I/O performance before going into production. All benchmark results are included and discussed, so you can directly compare configurations and understand the trade-offs instead of relying on assumptions.

This article builds on concepts already discussed in previous posts, starting from a critical but often misunderstood topic: what "guaranteed IOPS" actually means for enterprise workloads. Guaranteed IOPS represent a minimum capability of the storage service under ideal conditions, not a promise that real-world applications will constantly reach those numbers. As explained in the first article, database workloads like SQL Server introduce randomness, mixed read/write patterns, variable block sizes, and queue depth behavior that can significantly diverge from synthetic benchmarks and marketing figures.

In addition, achieving high and consistent storage performance on Oracle OCI is not just a matter of selecting a high-performance block volume tier. As detailed in the second article, maximum disk I/O throughput on Windows instances depends on a combination of factors: instance shape network bandwidth, attachment model, number of block volumes, striping strategy, and how well the operating system can drive parallel I/O. In other words, performance is an architectural outcome, not a single configuration switch.

These two topics form the foundation for the analysis presented here. Understanding the limits of guaranteed IOPS and knowing how to unlock the available throughput at the OS and platform level are prerequisites before evaluating SQL Server storage performance. The FIO benchmarks shown later in this article are designed precisely to validate these principles under realistic SQL Server–like workloads, using measurable data instead of assumptions.

Enough theory. Let’s move straight to the numbers and compare storage performance using FIO simulation tests across different storage configurations, each aligned with the Oracle OCI IaaS shapes evaluated in this analysis.


The FIO command used to generate the workload is designed to simulate a realistic stress test for a large enterprise SQL Server database. For simplicity, I did not separate the log files onto a different volume, but I created multiple files with different block sizes to emulate concurrent and parallel I/O requests.

  • Uses windowsaio with direct I/O to bypass cache.
  • Generates time-based I/O for 300 seconds.
  • Simulates multiple files:
    • 6 x SQL data files: random read/write, 70% reads, 8K blocks, moderate queue depth and parallel jobs.
    • 2 x tempdb files: random read/write, 50% reads, 8K blocks, lower concurrency.
    • 1 x SQL log file: sequential writes only, 64K blocks, lower depth.
  • Multiple jobs and threads emulate concurrent database workloads.

Basically a realistic, mixed SQL Server stress test covering data, tempdb, and log I/O.


fio --name=sql-data1 --filename=datafile1.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=70 --bs=8K --numjobs=6 --iodepth=8 --size=40G --time_based --runtime=300 --group_reporting --thread ^
    --name=sql-data2 --filename=datafile2.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=70 --bs=8K --numjobs=6 --iodepth=8 --size=40G --time_based --runtime=300 --group_reporting --thread ^
    --name=sql-data3 --filename=datafile3.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=70 --bs=8K --numjobs=4 --iodepth=8 --size=30G --time_based --runtime=300 --group_reporting --thread ^
    --name=tempdb1 --filename=tempdb1.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=50 --bs=8K --numjobs=4 --iodepth=8 --size=20G --time_based --runtime=300 --group_reporting --thread ^
    --name=tempdb2 --filename=tempdb2.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=50 --bs=8K --numjobs=2 --iodepth=4 --size=10G --time_based --runtime=300 --group_reporting --thread ^
    --name=sql-log --filename=sql-log.ldf --ioengine=windowsaio --direct=1 --rw=write --bs=64K --numjobs=2 --iodepth=4 --size=20G --time_based --runtime=300 --group_reporting --thread
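The same workload can also be expressed in fio's native job-file format, which is easier to read, diff, and keep in version control than the one-line command. This is a sketch reconstructed from the parameters above; the sql-data2 job is assumed to use the same profile as sql-data1, so adjust filenames, sizes, and paths to your environment before running it with `fio sqlserver.fio`.

```ini
; sqlserver.fio - job-file equivalent of the command line above.
; Global options apply to every job below.
[global]
ioengine=windowsaio
direct=1
time_based=1
runtime=300
group_reporting=1
thread=1

; SQL data files: random R/W, 70% reads, 8K blocks
[sql-data1]
filename=datafile1.dat
rw=randrw
rwmixread=70
bs=8K
numjobs=6
iodepth=8
size=40G

[sql-data2]
filename=datafile2.dat
rw=randrw
rwmixread=70
bs=8K
numjobs=6
iodepth=8
size=40G

[sql-data3]
filename=datafile3.dat
rw=randrw
rwmixread=70
bs=8K
numjobs=4
iodepth=8
size=30G

; tempdb files: random R/W, 50% reads, lower concurrency
[tempdb1]
filename=tempdb1.dat
rw=randrw
rwmixread=50
bs=8K
numjobs=4
iodepth=8
size=20G

[tempdb2]
filename=tempdb2.dat
rw=randrw
rwmixread=50
bs=8K
numjobs=2
iodepth=4
size=10G

; transaction log: sequential 64K writes, lower depth
[sql-log]
filename=sql-log.ldf
rw=write
bs=64K
numjobs=2
iodepth=4
size=20G
```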

Test Environment:

Test 1:
Shape: VM.Optimized3.Flex with 16 OCPUs and 48 GB RAM, with 1 Block Volume (120 VPUs) attached, tested both paravirtualized and with SR-IOV. OS: Windows Server 2022 Standard.

Test 2:
Shape: VM.Optimized3.Flex with 16 OCPUs and 48 GB RAM, with 3 Block Volumes (120 VPUs each) attached and striped in a pool, tested both paravirtualized and with SR-IOV. OS: Windows Server 2022 Standard.

Storage setup:
A) 1 Block Volume – Paravirtualized (VPU120)
B) 1 Block Volume – SR-IOV (VFIO) (VPU120)
C) 3 Block Volumes striped – Paravirtualized (VPU120)
D) 3 Block Volumes striped – SR-IOV (VFIO) (VPU120)

Recommended max latencies for SQL Server

| Average Latency (ms) | Category | Impact on SQL Server Load | Notes |
| --- | --- | --- | --- |
| < 1 ms | Excellent | No impact, extremely responsive | Ideal for NVMe or enterprise SSD volumes in high-intensity OLTP setups |
| 1 – 2 ms | Good | SQL Server load handled smoothly | Suitable for medium OLTP/OLAP, perfect for VMs with SSD-optimized storage |
| 2 – 5 ms | Acceptable | Possible minor delays on intensive queries | Suitable for light OLTP or test environments |
| 5 – 10 ms | Poor | Noticeable latency, may slow transactions | Only useful for batch jobs or non-critical reports |
| 10 – 20 ms | Critical | Performance severely degraded, risk of timeouts | Not suitable for OLTP production, only for backup or archival |
| > 20 ms | Unsustainable | Database could hang or encounter errors | Must be avoided in production |
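When reviewing monitoring data against these thresholds, the table can be encoded in a small helper. This is an illustrative snippet, not part of the benchmark tooling; boundary values are assigned to the lower (better) category.

```python
# Classify an average storage latency (in ms) against the SQL Server
# guidance table above. Thresholds are copied directly from the table.
def classify_latency(avg_ms: float) -> str:
    if avg_ms < 1:
        return "Excellent"
    if avg_ms <= 2:
        return "Good"
    if avg_ms <= 5:
        return "Acceptable"
    if avg_ms <= 10:
        return "Poor"
    if avg_ms <= 20:
        return "Critical"
    return "Unsustainable"

# Example: an average read latency of 779.4 us is ~0.78 ms
print(classify_latency(779.4 / 1000))  # -> Excellent
```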

TABLE 1 – DIRECT COMPARISON OF ALL 4 SCENARIOS

| Metric | A) 1BV PARAVIRT | B) 1BV SR-IOV | C) 3BV PARAVIRT | D) 3BV SR-IOV |
| --- | --- | --- | --- | --- |
| READ BW (MiB/s) | 770 | 714 | 1055 | 1025 |
| READ IOPS (k) | 98.6 | 91.4 | 135 | 131 |
| WRITE BW (MiB/s) | 501 | 464 | 675 | 656 |
| WRITE IOPS (k) | 53.9 | 49.9 | 73.5 | 71.4 |
| Latency READ avg (µs) | 1072.5 | 1164.0 | 779.4 | 803.1 |
| Latency READ p99 (µs) | 1926 | 2245 | 1532 | 2024 |
| Latency READ p99.9 (µs) | 2671 | 2704 | 2376 | 6259 |
| Latency WRITE avg (µs) | 1201.4 | 1298.8 | 886.5 | 912.1 |
| Latency WRITE p99 (µs) | 3294 | 3458 | 2507 | 3097 |
| Latency WRITE p99.9 (µs) | 5276 | 5735 | 4146 | 6456 |
| CPU usr % | 1.69 | 1.67 | 3.40 | 3.25 |
| CPU sys % | 8.14 | 8.10 | 17.93 | 17.24 |
| Total IO R (GiB) | 226 | 209 | 309 | 300 |
| Total IO W (GiB) | 147 | 136 | 198 | 192 |

Even in the worst configuration (Config D, 3 BV striped + SR-IOV), the server maintains excellent average latencies for SQL Server, so normal load will be handled without issues. Only the p99.9 peaks approach critical territory, which could cause brief slowdowns on very intensive queries or under maximum tempdb stress. Overall, the performance is solid and suitable for OLTP/OLAP production environments.
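The delta tables below are derived from Table 1 with a simple percentage formula: delta% = (candidate - baseline) / baseline * 100. A quick sketch of the calculation, using values copied from Table 1 (`delta_pct` is an illustrative helper, not part of the benchmark tooling):

```python
# Reproduce the delta percentages in Tables 2 and 3 from raw Table 1 numbers.
def delta_pct(baseline: float, candidate: float) -> float:
    """Percentage change of candidate vs baseline, rounded to 1 decimal."""
    return round((candidate - baseline) / baseline * 100, 1)

# READ BW (MiB/s) per configuration, from Table 1
read_bw = {"A": 770, "B": 714, "C": 1055, "D": 1025}

# SR-IOV vs paravirtualized, 1 BV (Table 2 style)
print(delta_pct(read_bw["A"], read_bw["B"]))  # -> -7.3

# 3 BV striped vs 1 BV, paravirtualized (Table 3 style)
print(delta_pct(read_bw["A"], read_bw["C"]))  # -> 37.0
```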

TABLE 2 – DELTA: SR-IOV vs PARAVIRTUALIZED (same number of block volumes)
Baseline = Paravirtualized
BW/IOPS: positive = better | Latency: positive = worse

| Metric | 1 BV SR-IOV vs 1 BV Paravirtualized | 3 BV striped SR-IOV vs 3 BV striped Paravirtualized |
| --- | --- | --- |
| READ BW | -7.3% | -2.8% |
| READ IOPS | -7.3% | -3.0% |
| WRITE BW | -7.4% | -2.8% |
| WRITE IOPS | -7.4% | -2.9% |
| Lat. READ avg | +8.5% (worse) | +3.0% (worse) |
| Lat. READ p99 | +16.6% (worse) | +32.1% (worse) |
| Lat. READ p99.9 | +1.2% (≈ same) | +163.4% (MUCH worse) |
| Lat. WRITE avg | +8.1% (worse) | +2.9% (worse) |
| Lat. WRITE p99 | +5.0% (worse) | +23.5% (worse) |
| Lat. WRITE p99.9 | +8.7% (worse) | +55.7% (worse) |

Note: SR-IOV is ALWAYS worse than Paravirtualized across all tests.


TABLE 3 – DELTA: 3 BV STRIPED vs 1 BV (same driver)
Baseline = 1 Block Volume
BW/IOPS: positive = better | Latency: negative = better

| Metric | 3BV Paravirtualized vs 1BV Paravirtualized | 3BV SR-IOV vs 1BV SR-IOV |
| --- | --- | --- |
| READ BW | +37.0% | +43.6% |
| READ IOPS | +36.9% | +43.3% |
| WRITE BW | +34.7% | +41.4% |
| WRITE IOPS | +36.4% | +43.1% |
| Lat. READ avg | -27.3% (better) | -31.0% (better) |
| Lat. READ p99 | -20.5% (better) | -9.8% (better) |
| Lat. READ p99.9 | -11.0% (better) | +131.5% (MUCH worse) |
| Lat. WRITE avg | -26.2% (better) | -29.8% (better) |
| Lat. WRITE p99 | -23.9% (better) | -10.4% (better) |
| Lat. WRITE p99.9 | -21.4% (better) | +12.6% (worse) |

Note: striping always helps in the mid range (avg/p99), but under SR-IOV the p99.9 tails explode.


TABLE 4 – FINAL RANKING (best → worst per key metric)

| Metric | 1st (best) | 2nd | 3rd | 4th |
| --- | --- | --- | --- | --- |
| READ BW | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| WRITE BW | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| READ IOPS | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| WRITE IOPS | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. READ avg | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. READ p99 | C 3BV PARA | A 1BV PARA | D 3BV SRIOV | B 1BV SRIOV |
| Lat. READ p99.9 | C 3BV PARA | A 1BV PARA | B 1BV SRIOV | D 3BV SRIOV |
| Lat. WRITE avg | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. WRITE p99 | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. WRITE p99.9 | C 3BV PARA | A 1BV PARA | B 1BV SRIOV | D 3BV SRIOV |

Configuration C (3 BV striped + Paravirtualized) wins across all ten metrics.

Additional Considerations:

  • The latency perceived by SQL Server also depends on IOPS and workload type (random read/write vs sequential).
  • For mixed workloads (e.g., highly active tempdb), a stable latency below 2 ms is strongly recommended.
  • OLAP batch workloads can tolerate higher latencies, while OLTP with many concurrent transactions requires minimal latency.

ANALYSIS AND RECOMMENDATIONS

1) STRIPING: always worth it
Moving from 1 to 3 Block Volumes in a striped configuration delivers a consistent gain:

  • BW and IOPS increase by 35-44 % in both read and write
  • Average latency drops by 26-31 %
  • p99 percentiles improve by 10-24 %
    Exception: tail latency (p99.9) under SR-IOV degrades drastically.

2) SR-IOV vs Paravirtualized: Paravirtualized is better

  • SR-IOV bandwidth and IOPS lower by 3-7 %
  • Average latency slightly higher (+3-8 %)
  • p99 and p99.9 noticeably worse, especially with 3 BV striped
    Possible causes: driver optimization, VFIO overhead, or Windows-level IO complexity.

3) The p99.9 “bomb” under SR-IOV + 3 BV striped

  • 3BV SR-IOV R p99.9 = 6259 µs (+163% vs 3BV PARA 2376 µs)
  • 3BV SR-IOV W p99.9 = 6456 µs (+55% vs 3BV PARA 4146 µs)
    Critical for SQL Server transactional workloads; can stall query batches.

FINAL RECOMMENDATION

Recommended configuration to maximize performance and latency: 3 Block Volumes striped + Paravirtualized driver

Benefits:

  • Highest bandwidth (1055 R / 675 W MiB/s)
  • Highest IOPS (135k R / 73.5k W)
  • Lowest average latency
  • Most stable tail latency (p99.9)
  • No risk of exploded latency spikes as seen under SR-IOV.

Avoid:

  • SR-IOV + 3 BV striped
  • Single BV (any driver): limits bandwidth and IOPS with no latency benefits.

CPU note:

Striping doubles the sys CPU load (from ~8 % to ~18 %).
On a VM dedicated to the DB this is not an issue, but it should be monitored if the VM shares cores with other workloads.


Please check previous posts if you want all the background on this topic.
