Problem Summary
One of the biggest challenges in migrating to Azure Managed Instance is determining the resource limits.
We successfully used Microsoft SQL Server Distributed Replay feature to assess a SQL Server migration and choose the right service tier. Which allowed us to size it properly and not overspend on the instance.
Test
Server 1 – Current Production (or Source Server)
- 48 CPU cores;
- 200GB RAM Memory;
- SQL Server 2008 R2 (SP1) – Enterprise Edition
Server 2 – Azure Managed Instance (or Target Server)
- Gen 5, 4 vCores;
- 20GB RAM Memory;
- A SQL Managed Instances always run the latest stable SQL engine version, equal to or higher than the latest available RTM version of SQL Server.
Test Workload Used in Replay
- Collector: Profiler
- Duration: 8 hours
- #of commands: 1,981
- #of executions: 10,741,98
We executed a workload that was captured on Server1. We replayed the workload every 30 minutes, each session lasting 10 to 15 minutes.
Waits Observed During Replay on Server2 (or a Target Server)
67% of SQL Server waits are due to insufficient memory for caching.
Given that the current production server (Server1) has 200GB of RAM and the Azure Managed Instance (Server2) only has 20GB, the poor CPU and duration performance makes sense.
The Azure Managed Instance requires more memory and CPU to match the performance of the current production server.
Replay Results on Server2 (or a Target Server)
Duration (ms) | CPU (ms) | Disk (8k page reads) | ||||
---|---|---|---|---|---|---|
Prod | New | Prod | New | Prod | New | |
Overall | 401 | 2,891 | 623,722 | 1,108,220 | 83,002,898 | 51,446,643 |
AVG | 400 | 2,891 | 623,722 | 1,108,220 | 83,002,898 | 51,446,643 |
CPU usage
Overall duration
Disk
Before drawing any conclusions, remember we are comparing a server with 48 cores and 200GB RAM to an Azure Managed Instance with only 4 cores and 20GB RAM.
As shown in the table above, the new server had worse CPU performance and longer duration despite having better reads. The improved reads are due to a decade of query engine advancements compared to the current production version.
The poor CPU performance and longer query durations are expected due to the Azure Managed Instance having significantly fewer hardware resources.
Conclusion
The replay test was crucial for identifying the main bottlenecks were and identifying new server hardware sizing.
After additional DReplay testing, we chose to go with Azure Managed Instance tier of 16 vCores and 82GB of RAM. This configuration allowed us to achieve the same performance on the new server without overspending on hardware resources as on Server1 (Source server).