Using the least used SQL feature: Distributed Replay (DReplay)
There are a lot of things that can go wrong during SQL Server migration.
One of them is random T-SQL calls, which become slower.
And they do.
Here is an actual example where we tested two SQL Server configurations before SQL Upgrade and Migration. In this case, the tested settings did worse than the old SQL Server.
So SQL Server Distributed Replay is very valuable at catching high-risk scenarios like the ones shown in the video below.
Why does SQL performance become worse after SQL Server migrations?
Why does it happen even if the NEW server has faster storage, more RAM, more CPU cores, and better everything?
It’s because changes in hardware and SQL version can lead SQL Server to choose different execution plans. And that “different execution plan”, can cause your T-SQL to run insanely slower.
Granted, that doesn’t happen to a lot of T-SQL calls.
But we see about 90% of T-SQL calls running faster on NEW server, and 10% going slower.
I AM LYING.
On large enterprise-level SQL Server, we see 20-40 SQL calls that will become A LOT slower.
Why does SQL performance improve without any code changes after migrating to the latest version of SQL?
- It could be that hardware or Win OS or both were set up with all best practices and proper settings on the NEW server.
- It could be that SQL Server was set up with all the best practices on the NEW server.
- It could be that you’re jumping up a few SQL Server versions and simply, SQL has just gotten better.
- Or it can be any combination of the above.
How much does SQL Server performance improve by jumping several versions?
When upgrading from SQL 2008 to SQL 2016-2017, we often see, SQL performance improving by about 30-40%.
Without changing a single line of code. We set up hardware, Windows, and SQL for best performance – of course, and often old server wasn’t.
Now, I know those best practices do, improve performance, but I can’t take credit for the full 30-40%.
The majority of the improvement is because SQL Server just got better.
Better statistics, smarter code, and better “magic dust” are areas where Microsoft still has a few tricks up its sleeve, all of which contribute to improved execution plans and, consequently, faster SQL performance.
But it’s not all good.
During migrations, we see some T-SQL code becoming slower!
Wait… What???
Yes, some T-SQL calls become noticeably slower.
Discovering random slowness is not something you want to experience after migrating. It’s particularly troubling to find out that an important query that used to take half a second now takes 30 seconds.
Ummh, not cool, buddy. OK?! Not on my critical DB systems.
You probably thinking, “I need to know what may run slower”, right?
Absolutely.
Then the question is:
How do you know prior to migration what SQL will run slower?
The answer: Distributed Replay (or DReplay).
Which must be one of the least used SQL features under the sun.
How can Distributed Replay help SQL Server migrations?
DReplay allows to capture YOUR (notice YOUR) production workload from the existing server (I like to call OLD server). And then re-run it on the NEW server.
Sounds simple, right? Not exactly.
There are a ton of steps involved. And I am not going to bore you with the steps here. But I will show the result you can expect.
The important thing is that you will be able to say things like:
- The NEW server is faster by 27.8% compared to OLD.
- Here are the queries that became faster on the NEW vs. OLD.
- Here are the queries that became slower on the NEW vs. OLD.
So this is how I make slower queries run faster.
And we don’t have to deal with performance emergencies after the migration to the NEW server is over. We can all go to bed after the migration window is over. And not look like a dummy who did not plan things right for the next week in the office. 😊