How to troubleshoot a slow running query using wait stats in the MS SQL server?

SQL Server Performance Tuning is a vast topic. Tuning can can get complicated and struggle is real.

Fixing a slow query and solving performance problems can involve hours of research and testing.

Today I will show you how you can use the query wait statistics to troubleshoot a slow queries.

Query wait statistics

Microsoft added a new DMV, sys.dm_exec_session_wait_stats, in SQL Server 2016. It provides information about waits for an active session.

This ability of SQL Server to monitor wait statistics from a query execution (what query waits for and how long at each step of execution) is a powerful and precise tool for determining the root causes of the slow query execution.

How to find out what your query is waiting on?

Query Wait Statistics in Execution Plan

One significant enhancement in the SQL server 2016 SP1 was to query execution plans.

Thenceforth,  you can see the wait statistics information of the query inside the actual execution plan of that query.

To be able to take benefits of the SQL execution plan, we should be grated SHOWPLAN permission to generate the execution plan.
Members of the SYSADMIN, DBCREATOR, or DB_OWNER will be able to create the Execution Plans without troubles.

Runtime Metrics

To see these runtime metrics in action, let’s use the query below, which usually takes a few seconds to complete.

SELECT TOP (10000) a.*
From master..spt_values a,
master..spt_values b
Order by a.Number DESC, b.number DESC
Figure 1 – Query plan for the query sample.

This time, the query completed the execution in 31 seconds. There are several points of attention here, which will be disregarded to simplify the example.

As you can see above, the Sort operation has 97% of the total cost of this operation, but it takes only 2.331 seconds to complete.

Where was spent the rest of the time execution time? Let’s figure out!

If I choose the “Select” and go into the properties of the operator, we will find the WaitStats.

Figure 2 – SELECT Operator’s properties.

In the screenshot above, after expanding all nodes,  we can quickly figure out that there is a problem.

At the bottom, the warning node within WaitType “MemoryGrant” is 28 seconds (Note that the other wait types are in milliseconds).

Over the years, I have seen this to be one of the most common memory-related issues.

Applications often execute seemingly simple queries that end up wreaking tons of performance havoc on the SQL Server side because of huge sort or hash operations.

These not only end up consuming a lot of SQL Server memory during execution but also cause other queries to have to wait for the memory to become available (thus the performance bottleneck).

In short, you can use this valuable information to guide the depth-in investigation in the right direction. Using DMVs, Perfmon counters, and other tools, you can investigate which queries are large-grant consumers and can have those tuned/re-written where possible.

Query Wait Statistics in Extended Events

Previous to SQL Server 2016, it is slightly more challenging to get this kind of information. If you are using SQL 2012 or 2014, you can set an extended event capture.

Unfortunately, you have to know the session_id for the query, and you have to set up data capture in advance – which isn’t trivial when you’re in the middle of a high priority issue.

So, let me show you how to set up an extended event to capture the same information:

  1. On SSMS, go to extend events and create a new session (wizard).
    Figure 3 – Creating a new extend event session wizard.
  2. Capture an event called wait_info. It will show the waits and their duration.
    Figure 4 – Choosing the Event to capture.
  3. You need to ensure that you filter this information. Thus, you will want to add Global Fields to filter and reduce the overhead on the server and the amount of data collected.
    Figure 5 – Choosing Global Fields to capture.
  4. In this case, I am setting the value 52 (my session_ID) on events filters.
    Figure 6 – Setting the session Event filter.
  5. Right-click on the result and remove the columns that you don’t need (keep only the name, wait_type, and duration).
    Figure 7 – Adding columns to the result set.
  6. On the next step, group the events by wait_type (on Grouping tab).
    Figure 8 – Grouping by wait_type.
  7. And finally, aggregate them by SUM (Aggregation tab).
    Figure 9 – SUM Aggregation.

Alternatively, you can also create an extended event capture using the script below.
Just remember to replace the value for the variable [sqlserver].[session_id].

CREATE EVENT SESSION [Query_wait_stats] ON SERVER 
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.session_id,sqlserver.sql_text)
    WHERE ([sqlserver].[session_id]=(52)))
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE=ON)

Another recommended way to deal with query wait stats is to use the wait statistics captured by the Query Store. You can access more information here.
The only drawback to this method is that you have to be running Azure SQL Database or SQL Server 2017 onwards.

There was a time when this was actually kind of difficult. However, now we have a lot of different tools to capture query wait statistics.
As you can see, the Waits are one of the most powerful tools in your troubleshooting toolbox.

The learning curve is short and steep, so if you’re not already using them, now is the time to start.

Make SQL Performance Tuning someone else’s problem. 
We’d love to help!

Mark Varnas

Mark Varnas

Hey I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable. I channel my SQL into our SQL Managed Services, SQL Consulting and our internal database products.

Leave a Reply

Your email address will not be published. Required fields are marked *