[Case Study] Azure Managed Instance Distributed Replay test

Problem Summary:

One of the biggest challenges in creating an Azure Managed Instance is determining the resource limits.
In this case study, we use the Microsoft guidelines and the SQL Server Distributed Replay feature to assess a SQL Server migration and choose the right service tier.

The Distributed Replay Test

Current Production (Source Server):

  •  48 CPU cores;
  • 200GB RAM Memory;
  • SQL Server 2008 R2 (SP1) – Enterprise Edition

Azure Managed Instance (Target Server)

  • Gen 5, 4 vCores;
  • 20,4GB RAM Memory;
  • A SQL Managed Instance always runs the latest stable SQL engine version, equal to or higher than the latest available RTM version of SQL Server.

Workload Replay

  • Collector: Profiler
  • Duration: 8 hours
  • Collection date: 02/07/2020
  • #of commands:  1,981
  • #of executions:  10,741,98

We executed a sampled workload.  The criteria used were replaying the workload that happened every 30 minutes for 10 to 15minutes.

Wait observed during Replay (Target)

Table 1 – Distributed replay wait types on target.

As we can see in the table above, 67% of waits faced by SQL Server are related to lack of Memory to accommodate the cache needed.  Considering that the current prod has 200GB of Memory and the Azure Managed instance only has 20GB, this explains the bad performance result below for CPU and duration.

The Azure managed instance needs more Memory and CPU to present the safe performance visible in the current prod.

Replay Results (Target)

Table 2 – Azure MI results (target).

Before we draw any conclusion, it is essential to keep in mind that we are comparing a 48 core – 200GB RAM Server with an Azure managed instance that under the hood has only 4 cores and 20GB.

As per the table above, the new server presented a worse CPU and duration even though it had much better reads.  The better reads are justified by the query engine with a decade of improvements compared to the current prod version.

The poor CPU and duration are expected because the number of CPU cores is much lower in the Azure managed instance.

Conclusion

The replay test was essential to figure out the main bottlenecks. In short, after some tests, we precisely recommended upgrading the Azure managed instance server (MI) tier, increasing the number of vCores to 16 and the Memory to 82GB.

Finally the migrations went very well! Without problems. And most importantly, without nasty surprises.

See more SQL Server Performance Optimization examples here.

We’d love to talk to you about any questions you may have about your SQL Upgrade or Migration project specifics. 

We tune slow SQL Servers every day. For over a DECADE now. Contact us!

Mark Varnas

Mark Varnas

Hey I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable. I channel my SQL into our SQL Managed Services, SQL Consulting and our internal database products.

Leave a Reply

Your email address will not be published. Required fields are marked *