Guide on how to compare performance of two SQL Servers – Part 2

(and the end result we want out of DReplay testing)

In Part1, I talked about:

  1. Why would you compare performance of two SQL Servers.
  2. And end result that we want to get by doing this comparison.

In big picture, here is what I am doing:

  1. Capturing SQL workload on OLD. I will do this on the busiest day of the week and will capture 4 hours worth of workload.
  2. Re-run that exact same workload on the NEW.

There will be a bunch of steps in between. But that’s what I am doing.

Here is the end result that I will get.

The End result of Distributed Replay Testing
The End result of Distributed Replay Testing

It shows:

  1. SQL call made.
  2. How many times the sucker was run on the OLD and NEW.
  3. And then shows CPU, Storage and Duration numbers. OLD vs. NEW and the difference.

In reality all these numbers are for me. Not for the customer. Customer only cares about the Yellow part – how long did SQL call take to run.

But if I can reduce down CPU and Storage numbers, that means server will have more capacity, Storage will not be under so much pressure and there will be room to breathe.

Let’s define what these columns mean:

  1. Column A: “SQL Text”. This SQL that was executed. There are few things done to SQL Text, like removing parameters, all capitalized, spaces adjusted, etc. – I will be aggregating (group by) all numbers based on that. I need SP call “EXEC SP1 @lastname = ‘jones’”, be same as “EXEC SP1 @lastname = ‘smith’”.
  2. Column B: “OLD – number of executions”. Since we rerunning workload from OLD to NEW, this show how many times this SQL call was made. On OLD server.
  3. Column C: “NEW – number of executions”. How many times this call was made on NEW server. Column B & C are sanity checks – they show number of executions and if they are not the same or very close, it means I did something wrong and these numbers can not be trusted. Re-do.
  1. Column D: “OLD – Avg CPU”. CPU Cycles it took on average to process this SQL Text. On OLD Server.
  2. Column E: “NEW – Avg CPU”. CPU Cycles it took on average to process this SQL Text. On NEW Server.
  3. Column F: “DELTA – Avg CPU”. The difference between OLD and NEW CPU cycle consumption.
  1. Column G: “OLD – Avg Storage Reads”. Storage operations it took to process this SQL Text. On OLD.
  2. Column H: “NEW – Avg Storage Reads”. Storage operations it took to process this SQL Text. On NEW.
  3. Column I: “DELTA – Avg Storage Reads”. The difference in storage operations between OLD vs. NEW.
  1. Column J: “OLD – Avg Duration”. The Duration it took for this SQL Text to run. On OLD.
  2. Column K: “NEW – Avg Duration”. The Duration it took for this SQL Text to run. On NEW.
  3. Column L: “DELTA – Avg Duration”. The difference in Duration OLD vs. NEW.

This is what the bottom of the Excel looks like:

DReplay - Summary
DReplay – Summary

Here we have TOTAL results of running workload on NEW vs. OLD.

Not bad for not changing a single line of code!

These are typical numbers we are seeing by the way. Upgrade your SQL Server to the newest version (anything above SQL 2016) and that’s what you get.

Totals are for the big picture. Will my SQL Server run faster or slower on the NEW server? Here, it’s a “Hell Yes!”

As analytical DBA, like the detail level. I want to sort this Excel based on the last column “DELTA – Avg Duration”. And find SQL calls that became worse. Prevent the disaster. Then I take them, and figure out what can be done to improve them. Maybe it’s simply an index change. Or stored procedure re-write. Either way, it needs to be addressed.

I can see database backups are a lot slower on the NEW. I will drill down and look into why.

Also (line 8) one stored proc is considerably slower and it runs 481 times in 4h. So it’s a big deal. Will look into that next.

DReplay - the most important column
DReplay – the most important column

And this is how it’s done.

If you want to migrate the SQL Server and hope that the things will just work on the NEW one. Sure, go for it.

If you have databases that are mission critical, where errors or slow downs affect a lot of things, spending a bit of time on thoroughly testing may be worth it.

This method removes risk from SQL Server migrations and upgrades.

This is not for everybody though. Simpler SQL Servers don’t need this.

There you have it. Now you know what type of result you can get out of Distributed Replay.

Agree? Disagree? Write a comment.

Mark Varnas

Mark Varnas

Hey I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable. I channel my SQL into our SQL Managed Services, SQL Consulting and our internal database products.

Leave a Reply

Your email address will not be published. Required fields are marked *