An Azure VM instance type caps total disk throughput across all attached disks, so fast provisioned disks get silently throttled until you move to a VM with a higher throughput ceiling.
- The VM-level throughput cap overrides per-disk IOPS settings, and although Azure exposes disk throughput metrics and limits, nothing raises an explicit alert when your provisioned disks exceed the VM cap.
- Switching to an E5_BDS instance gave 4x the disk throughput for $400 to $500 more per month and beat the original $30,000/month RAID servers.
- Disk latency dropped from 300 milliseconds to around 37 milliseconds after the instance change.
- To check your own VMs, total the provisioned throughput across every attached disk and compare it to the instance type’s max uncached disk throughput.
The Starting Point: $30,000/Month in SQL Server Azure Cost With Zero Optimization
This client was running SQL Server on Azure with hosting costs of around $30,000 per month.
The infrastructure was built to handle peak workloads with no optimization applied afterward. The team specced out what they thought the environment needed, built the servers, and let them run for years.
The storage configuration was impressive. Custom RAID arrays built on the fastest disks available in Azure. From a database perspective, the setup was well-engineered. IOPS were strong. The RAID configuration increased overall throughput by 2 to 4x compared to standalone disks.
But the environment was over-provisioned. The servers had far more capacity than the workloads required. And nobody had reviewed the configuration since it was originally deployed.
Our goal was to cut SQL Server Azure cost by 50 to 70%. That meant moving off the expensive RAID arrays, right-sizing the instances, and optimizing the queries running on them.
PCI DSS compliance requirements meant every change needed approval chains that stretched weeks. This project started in August and was still moving through approvals months later. Azure SQL cost optimization in regulated industries takes patience. The ROI is there, but the compliance process does not move at the speed of technology.
The First Attempt: Standard Disks. It Broke.
The first move was replacing the custom RAID arrays with standard Azure disks. On paper, this was straightforward. The RAID arrays were expensive. Standard disks with high IOPS provisioning should handle the workload at a fraction of the cost.
It did not work.
The standard disks could not deliver results in time. Database responsiveness dropped. The system was hammering the disk subsystem and struggling to keep up. End users were waiting for query results that previously returned in acceptable timeframes.
The RAID arrays had been masking inefficiencies in the workload. Poorly optimized queries that generated excessive IO were running fine because the storage was fast enough to brute-force through them. Remove that raw storage power, and those same queries choked the system.
Storage configuration details like disk sector alignment also compound these issues — a misconfigured disk adds latency before you even touch VM-level caps.
The Second Attempt: Upgraded Disks. Hit a Different Wall.
We upgraded to a faster disk type. Performance improved, but not enough. Queries were still slower than expected. The disk IOPS should have been sufficient based on the provisioned configuration.
This is where we found the problem that most teams miss.
Azure VM instance types have a maximum aggregate disk throughput. This is a hard cap on the total bandwidth available across all attached disks combined.
Here is what that means in practice.
You configure a single disk with 5 gigabits per second of throughput. Looks good in the Azure portal. But your VM instance type has a total throughput cap of 1.5 gigabits per second across all disks. And you have other disks attached that are already consuming part of that bandwidth.
Your disk will never reach the promised performance. The VM-level cap silently throttles everything.
This is not surfaced prominently in the Azure portal. You configure each disk individually. The settings look correct. The provisioned IOPS look right. But the VM instance type is the bottleneck, and nothing in the standard monitoring dashboards flags it.
You can provision 4 gigabits on one disk, but if the VM throughput cap is 4 gigabits total and your other disks are already consuming bandwidth, that single disk will never get what you paid for.
The Fix: Change the VM Instance Type
We changed the VM to an E5_BDS instance type. This provided four times the disk throughput capacity compared to the previous instance type.
The cost increase was $400 to $500 per month.
The resulting Azure SQL database performance exceeded the original expensive RAID-based servers.
Read that again.
$400-500 per month more on the VM instance type, and the new environment outperformed the setup that cost tens of thousands per month in SQL Server Azure pricing.
Disk latency, which had been spiking to 300 milliseconds on the worst-performing disks, dropped to around 37 milliseconds. Queue length, previously growing to 250, stabilized. CPU utilization showed significant headroom on the new instances.
One of the two original servers was decommissioned. The client is now running two new servers with one old server kept as a backup during the transition period. The final cutover is expected within weeks.
Performance Tuning on the Smaller Footprint
With the infrastructure stabilized, we moved into the next phase: query optimization on the right-sized environment.
The results from index optimization were immediate and significant.
On one query, a single index reduced IO reads from approximately 8.5 million to 319,000. That index took about 20 minutes to create.
On another query, a single index dropped IO reads from 38 million to 5,500. This fix also required changing a WHERE condition in the application code. Average query duration dropped instantly after implementation.
These are not marginal improvements. IO reads dropping from 38 million to 5,500 is a 99.98% reduction. The Azure SQL database performance impact was immediate — the database is doing almost no unnecessary work on that query path.
And this is where Azure SQL cost optimization compounds. The right-sized infrastructure costs less. The optimized queries need fewer resources. Which means the infrastructure can be right-sized again. Each cycle delivers additional savings on a smaller base cost.
We’ve seen similar compounding effects in other cases — for example, fixing a single data type issue saved $50K+ in hardware costs by eliminating unnecessary IO at the query level.
We estimate the next round of downsizing will save this client an additional $30,000 per year once the current performance tuning work is fully implemented.
Why This Pattern Repeats Across Azure SQL Environments
This is not an isolated case. The pattern shows up in almost every SQL Server Azure cost project we run.
- Step one: the environment is over-provisioned because it was built for peak capacity and never reviewed.
- Step two: someone tries to downsize by swapping to cheaper storage or a smaller instance. Performance breaks because the optimization was never done to make the workload efficient enough for a smaller infrastructure.
- Step three: the team assumes the workload needs the expensive infrastructure and gives up.
The missing step is always the same.
Tune the workload first, or at least understand the infrastructure constraints before making changes. The Azure VM throughput limit is one example of a constraint that is invisible until you hit it. Disk latency, memory pressure, and CPU throttling all have similar gotchas that don’t surface in basic monitoring. In another engagement, a licensing and architecture review saved a client $117K/year on Azure SQL — a different root cause, the same pattern of unchecked infrastructure spend.
Workload validation before and after changes is critical. We use distributed replay and workload testing on Azure Hyperscale migrations to catch exactly these kinds of performance regressions before they reach production.
How to Check Your Own Environment
If you’re running SQL Server on Azure VMs, this is a 15-minute check worth doing today.
- Look up your VM instance type in the Azure VM sizes documentation. Find the “Max uncached disk throughput” and “Max uncached disk IOPS” columns.
- Add up the provisioned throughput across every attached disk — OS, data, log,
tempdb. - If the total provisioned throughput exceeds the VM-level cap, your disks are being throttled. You’re paying for IOPS you can’t use.
- Check whether a different VM instance type with higher throughput would cost less than the over-provisioned disk IOPS you’re currently running.
If you’d like us to run this check on your environment, our SQL performance tuning services include a full infrastructure and query review — we identify exactly where your Azure SQL spend is going and what to cut first.
Frequently Asked Questions
How do I know if my Azure SQL environment is over-provisioned?
Can I downsize Azure SQL without performance testing first?
What’s the difference between disk-level IOPS and VM-level throughput limits?
How long does a SQL Server Azure cost optimization project take in a regulated industry?
Should I optimize queries before or after right-sizing the infrastructure?
What is a realistic SQL Server Azure cost reduction target?
Speak with a SQL Expert
In just 30 minutes, we will show you how we can eliminate your SQL Server headaches and provide operational peace of mind