Maximizing Storage I/O Performance for Microsoft SQL Server workloads on Oracle OCI IaaS

Running Microsoft SQL Server on Oracle OCI IaaS can deliver excellent performance, but only if the storage layer is designed and validated correctly. Too often, performance issues are attributed to the database engine while the real bottleneck sits in an improperly configured or untested storage setup.

In this article, I walk through a storage configuration specifically designed for SQL Server workloads on Oracle OCI, focusing on block volume layout, attachment type, and I/O characteristics. The setup is not theoretical: it has been tested using FIO benchmarks built to realistically simulate SQL Server I/O patterns, including random reads and writes, queue depth behavior, and concurrency.

The goal is simple and practical: show which storage choices actually matter, quantify their impact with measurable data, and provide a repeatable approach to validate I/O performance before going into production. All benchmark results are included and discussed, so you can directly compare configurations and understand the trade-offs instead of relying on assumptions.

This article builds on concepts already discussed in previous posts, starting from a critical but often misunderstood topic: what "guaranteed IOPS" actually means for enterprise workloads. Guaranteed IOPS represent a minimum capability of the storage service under ideal conditions, not a promise that real-world applications will constantly reach those numbers. As explained in the first article, database workloads like SQL Server introduce randomness, mixed read/write patterns, variable block sizes, and queue depth behavior that can significantly diverge from synthetic benchmarks and marketing figures.

In addition, achieving high and consistent storage performance on Oracle OCI is not just a matter of selecting a high-performance block volume tier. As detailed in the second article, maximum disk I/O throughput on Windows instances depends on a combination of factors: instance shape network bandwidth, attachment model, number of block volumes, striping strategy, and how well the operating system can drive parallel I/O. In other words, performance is an architectural outcome, not a single configuration switch.

These two topics form the foundation for the analysis presented here. Understanding the limits of guaranteed IOPS and knowing how to unlock the available throughput at the OS and platform level are prerequisites before evaluating SQL Server storage performance. The FIO benchmarks shown later in this article are designed precisely to validate these principles under realistic SQL Server–like workloads, using measurable data instead of assumptions.

Enough theory. Let’s move straight to the numbers and compare storage performance using FIO simulation tests across different storage configurations, each aligned with the Oracle OCI IaaS shapes evaluated in this analysis.


The FIO command used to generate the workload is designed to simulate a realistic stress test for a large enterprise SQL Server database. For simplicity, I did not separate the log files onto a different volume, but I created multiple files with different block sizes to emulate concurrent and parallel I/O requests.

  • Uses windowsaio with direct I/O to bypass cache.
  • Generates time-based I/O for 300 seconds.
  • Simulates multiple files:
    • 6 x SQL data files: random read/write, 70% reads, 8K blocks, moderate queue depth and parallel jobs.
    • 2 x tempdb files: random read/write, 50% reads, 8K blocks, lower concurrency.
    • 1 x SQL log file: sequential writes only, 64K blocks, lower depth.
  • Multiple jobs and threads emulate concurrent database workloads.

Basically a realistic, mixed SQL Server stress test covering data, tempdb, and log I/O.


fio --name=sql-data1 --filename=datafile1.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=70 --bs=8K --numjobs=6 --iodepth=8 --size=40G --time_based --runtime=300 --group_reporting --thread ^
    --name=sql-data2 --filename=datafile2.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=70 --bs=8K --numjobs=6 --iodepth=8 --size=40G --time_based --runtime=300 --group_reporting --thread ^
    --name=sql-data3 --filename=datafile3.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=70 --bs=8K --numjobs=4 --iodepth=8 --size=30G --time_based --runtime=300 --group_reporting --thread ^
    --name=tempdb1 --filename=tempdb1.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=50 --bs=8K --numjobs=4 --iodepth=8 --size=20G --time_based --runtime=300 --group_reporting --thread ^
    --name=tempdb2 --filename=tempdb2.dat --ioengine=windowsaio --direct=1 --rw=randrw --rwmixread=50 --bs=8K --numjobs=2 --iodepth=4 --size=10G --time_based --runtime=300 --group_reporting --thread ^
    --name=sql-log --filename=sql-log.ldf --ioengine=windowsaio --direct=1 --rw=write --bs=64K --numjobs=2 --iodepth=4 --size=20G --time_based --runtime=300 --group_reporting --thread
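The same workload can also be expressed in fio's native job-file format, which is easier to read, diff, and keep in version control than the one-line command. This is a sketch reconstructed from the parameters above; the sql-data2 job is assumed to use the same profile as sql-data1, so adjust filenames, sizes, and paths to your environment before running it with `fio sqlserver.fio`.

```ini
; sqlserver.fio - job-file equivalent of the command line above.
; Global options apply to every job below.
[global]
ioengine=windowsaio
direct=1
time_based=1
runtime=300
group_reporting=1
thread=1

; SQL data files: random R/W, 70% reads, 8K blocks
[sql-data1]
filename=datafile1.dat
rw=randrw
rwmixread=70
bs=8K
numjobs=6
iodepth=8
size=40G

[sql-data2]
filename=datafile2.dat
rw=randrw
rwmixread=70
bs=8K
numjobs=6
iodepth=8
size=40G

[sql-data3]
filename=datafile3.dat
rw=randrw
rwmixread=70
bs=8K
numjobs=4
iodepth=8
size=30G

; tempdb files: random R/W, 50% reads, lower concurrency
[tempdb1]
filename=tempdb1.dat
rw=randrw
rwmixread=50
bs=8K
numjobs=4
iodepth=8
size=20G

[tempdb2]
filename=tempdb2.dat
rw=randrw
rwmixread=50
bs=8K
numjobs=2
iodepth=4
size=10G

; transaction log: sequential 64K writes, lower depth
[sql-log]
filename=sql-log.ldf
rw=write
bs=64K
numjobs=2
iodepth=4
size=20G
```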

Test Environment:

Test 1:
Shape: VM.Optimized3.Flex with 16 OCPUs and 48 GB RAM, with 1 Block Volume (120 VPUs) attached, tested both paravirtualized and with SR-IOV. OS: Windows Server 2022 Standard.

Test 2:
Shape: VM.Optimized3.Flex with 16 OCPUs and 48 GB RAM, with 3 Block Volumes (120 VPUs each) attached and striped in a pool, tested both paravirtualized and with SR-IOV. OS: Windows Server 2022 Standard.

Storage setup:
A) 1 Block Volume – Paravirtualized (VPU120)
B) 1 Block Volume – SR-IOV (VFIO) (VPU120)
C) 3 Block Volumes striped – Paravirtualized (VPU120)
D) 3 Block Volumes striped – SR-IOV (VFIO) (VPU120)

Recommended max latencies for SQL Server

| Average Latency (ms) | Category | Impact on SQL Server Load | Notes |
| --- | --- | --- | --- |
| < 1 ms | Excellent | No impact, extremely responsive | Ideal for NVMe or enterprise SSD volumes in high-intensity OLTP setups |
| 1 – 2 ms | Good | SQL Server load handled smoothly | Suitable for medium OLTP/OLAP, perfect for VMs with SSD-optimized storage |
| 2 – 5 ms | Acceptable | Possible minor delays on intensive queries | Suitable for light OLTP or test environments |
| 5 – 10 ms | Poor | Noticeable latency, may slow transactions | Only useful for batch jobs or non-critical reports |
| 10 – 20 ms | Critical | Performance severely degraded, risk of timeouts | Not suitable for OLTP production, only for backup or archival |
| > 20 ms | Unsustainable | Database could hang or encounter errors | Must be avoided in production |
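When reviewing monitoring data against these thresholds, the table can be encoded in a small helper. This is an illustrative snippet, not part of the benchmark tooling; boundary values are assigned to the lower (better) category.

```python
# Classify an average storage latency (in ms) against the SQL Server
# guidance table above. Thresholds are copied directly from the table.
def classify_latency(avg_ms: float) -> str:
    if avg_ms < 1:
        return "Excellent"
    if avg_ms <= 2:
        return "Good"
    if avg_ms <= 5:
        return "Acceptable"
    if avg_ms <= 10:
        return "Poor"
    if avg_ms <= 20:
        return "Critical"
    return "Unsustainable"

# Example: an average read latency of 779.4 us is ~0.78 ms
print(classify_latency(779.4 / 1000))  # -> Excellent
```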

TABLE 1 – DIRECT COMPARISON OF ALL 4 SCENARIOS

| Metric | A) 1BV PARAVIRT | B) 1BV SR-IOV | C) 3BV PARAVIRT | D) 3BV SR-IOV |
| --- | --- | --- | --- | --- |
| READ BW (MiB/s) | 770 | 714 | 1055 | 1025 |
| READ IOPS (k) | 98.6 | 91.4 | 135 | 131 |
| WRITE BW (MiB/s) | 501 | 464 | 675 | 656 |
| WRITE IOPS (k) | 53.9 | 49.9 | 73.5 | 71.4 |
| Latency READ avg (µs) | 1072.5 | 1164.0 | 779.4 | 803.1 |
| Latency READ p99 (µs) | 1926 | 2245 | 1532 | 2024 |
| Latency READ p99.9 (µs) | 2671 | 2704 | 2376 | 6259 |
| Latency WRITE avg (µs) | 1201.4 | 1298.8 | 886.5 | 912.1 |
| Latency WRITE p99 (µs) | 3294 | 3458 | 2507 | 3097 |
| Latency WRITE p99.9 (µs) | 5276 | 5735 | 4146 | 6456 |
| CPU usr % | 1.69 | 1.67 | 3.40 | 3.25 |
| CPU sys % | 8.14 | 8.10 | 17.93 | 17.24 |
| Total IO R (GiB) | 226 | 209 | 309 | 300 |
| Total IO W (GiB) | 147 | 136 | 198 | 192 |

Even in the worst configuration (Config D, 3 BV striped + SR-IOV), the server maintains excellent average latencies for SQL Server, so normal load will be handled without issues. Only the p99.9 peaks approach critical territory, which could cause brief slowdowns on very intensive queries or under maximum tempdb stress. Overall, the performance is solid and suitable for OLTP/OLAP production environments.
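The delta tables below are derived from Table 1 with a simple percentage formula: delta% = (candidate - baseline) / baseline * 100. A quick sketch of the calculation, using values copied from Table 1 (`delta_pct` is an illustrative helper, not part of the benchmark tooling):

```python
# Reproduce the delta percentages in Tables 2 and 3 from raw Table 1 numbers.
def delta_pct(baseline: float, candidate: float) -> float:
    """Percentage change of candidate vs baseline, rounded to 1 decimal."""
    return round((candidate - baseline) / baseline * 100, 1)

# READ BW (MiB/s) per configuration, from Table 1
read_bw = {"A": 770, "B": 714, "C": 1055, "D": 1025}

# SR-IOV vs paravirtualized, 1 BV (Table 2 style)
print(delta_pct(read_bw["A"], read_bw["B"]))  # -> -7.3

# 3 BV striped vs 1 BV, paravirtualized (Table 3 style)
print(delta_pct(read_bw["A"], read_bw["C"]))  # -> 37.0
```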

TABLE 2 – DELTA: SR-IOV vs PARAVIRTUALIZED (same number of block volumes)
Baseline = Paravirtualized
BW/IOPS: positive = better | Latency: positive = worse

| Metric | 1 BV SR-IOV vs 1 BV Paravirtualized | 3 BV striped SR-IOV vs 3 BV striped Paravirtualized |
| --- | --- | --- |
| READ BW | -7.3% | -2.8% |
| READ IOPS | -7.3% | -3.0% |
| WRITE BW | -7.4% | -2.8% |
| WRITE IOPS | -7.4% | -2.9% |
| Lat. READ avg | +8.5% (worse) | +3.0% (worse) |
| Lat. READ p99 | +16.6% (worse) | +32.1% (worse) |
| Lat. READ p99.9 | +1.2% (≈ same) | +163.4% (MUCH worse) |
| Lat. WRITE avg | +8.1% (worse) | +2.9% (worse) |
| Lat. WRITE p99 | +5.0% (worse) | +23.5% (worse) |
| Lat. WRITE p99.9 | +8.7% (worse) | +55.7% (worse) |

Note: SR-IOV is ALWAYS worse than Paravirtualized across all tests.


TABLE 3 – DELTA: 3 BV STRIPED vs 1 BV (same driver)
Baseline = 1 Block Volume
BW/IOPS: positive = better | Latency: negative = better

| Metric | 3BV Paravirtualized vs 1BV Paravirtualized | 3BV SR-IOV vs 1BV SR-IOV |
| --- | --- | --- |
| READ BW | +37.0% | +43.6% |
| READ IOPS | +36.9% | +43.3% |
| WRITE BW | +34.7% | +41.4% |
| WRITE IOPS | +36.4% | +43.1% |
| Lat. READ avg | -27.3% (better) | -31.0% (better) |
| Lat. READ p99 | -20.5% (better) | -9.8% (better) |
| Lat. READ p99.9 | -11.0% (better) | +131.5% (MUCH worse) |
| Lat. WRITE avg | -26.2% (better) | -29.8% (better) |
| Lat. WRITE p99 | -23.9% (better) | -10.4% (better) |
| Lat. WRITE p99.9 | -21.4% (better) | +12.6% (worse) |

Note: striping always helps in the mid range (avg/p99), but under SR-IOV the p99.9 tails explode.


TABLE 4 – FINAL RANKING (best → worst per key metric)

| Metric | 1st (best) | 2nd | 3rd | 4th |
| --- | --- | --- | --- | --- |
| READ BW | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| WRITE BW | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| READ IOPS | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| WRITE IOPS | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. READ avg | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. READ p99 | C 3BV PARA | A 1BV PARA | D 3BV SRIOV | B 1BV SRIOV |
| Lat. READ p99.9 | C 3BV PARA | A 1BV PARA | B 1BV SRIOV | D 3BV SRIOV |
| Lat. WRITE avg | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. WRITE p99 | C 3BV PARA | D 3BV SRIOV | A 1BV PARA | B 1BV SRIOV |
| Lat. WRITE p99.9 | C 3BV PARA | A 1BV PARA | B 1BV SRIOV | D 3BV SRIOV |

Configuration C (3 BV striped + Paravirtualized) wins across all ten metrics.

Additional Considerations:

  • The latency perceived by SQL Server also depends on IOPS and workload type (random read/write vs sequential).
  • For mixed workloads (e.g., highly active tempdb), a stable latency below 2 ms is strongly recommended.
  • OLAP batch workloads can tolerate higher latencies, while OLTP with many concurrent transactions requires minimal latency.

ANALYSIS AND RECOMMENDATIONS

1) STRIPING: always worth it
Moving from 1 to 3 Block Volumes in a striped configuration delivers a consistent gain:

  • BW and IOPS increase by 35-44 % in both read and write
  • Average latency drops by 26-31 %
  • p99 percentiles improve by 10-24 %
    Exception: tail latency (p99.9) under SR-IOV degrades drastically.

2) SR-IOV vs Paravirtualized: Paravirtualized is better

  • SR-IOV bandwidth and IOPS lower by 3-7 %
  • Average latency slightly higher (+3-8 %)
  • p99 and p99.9 noticeably worse, especially with 3 BV striped
    Possible causes: driver optimization, VFIO overhead, or Windows-level IO complexity.

3) The p99.9 “bomb” under SR-IOV + 3 BV striped

  • 3BV SR-IOV R p99.9 = 6259 µs (+163% vs 3BV PARA 2376 µs)
  • 3BV SR-IOV W p99.9 = 6456 µs (+55% vs 3BV PARA 4146 µs)
    Critical for SQL Server transactional workloads; can stall query batches.

FINAL RECOMMENDATION

Recommended configuration to maximize performance and latency: 3 Block Volumes striped + Paravirtualized driver

Benefits:

  • Highest bandwidth (1055 R / 675 W MiB/s)
  • Highest IOPS (135k R / 73.5k W)
  • Lowest average latency
  • Most stable tail latency (p99.9)
  • No risk of exploded latency spikes as seen under SR-IOV.

Avoid:

  • SR-IOV + 3 BV striped
  • Single BV (any driver): limits bandwidth and IOPS with no latency benefits.

CPU note:

Striping doubles the sys CPU load (from ~8 % to ~18 %).
On a VM dedicated to the DB this is not an issue, but it should be monitored if the VM shares cores with other workloads.


Please check previous posts if you want all the background on this topic.
