Introduction
SQL query optimization is one of the first problems you’ll face as soon as you start working with databases, and it is often the main one.
You may not notice the issue at first, but as the data grows and the workload increases, queries start performing poorly and taking longer to complete because they’re unable to scale. Why is that? There can be many reasons, and through this guide you will be presented with different problems and solutions that you can check and implement in your own code.
Keep in mind that reducing query execution resource usage to the appropriate level leads to lower infrastructure costs, either directly or indirectly—this is especially important in cloud environments.
You must remember that there is no magic bullet; each query requires a different approach involving one or multiple solutions. However, you don’t need to make a lot of changes at once—improving the top worst performers with an effective approach usually frees enough server resources to handle the remaining queries.
Fundamentals of SQL Query Performance
There are many components utilized during query execution, but they can be summarized in only a few: disk/storage, network, memory, motherboard, and CPU.
If we assign typical speeds to each of them, the list goes as follows:
- Disk/storage: between 200-550 MB/s (SSD), 80-160 MB/s (HDD)
- Network: 125 MB/s (gigabit fiber optic), 32 MB/s (ethernet)
- Memory: 17-25 GB/s
- Motherboard: 984 MB/s – 15 GB/s
- CPU: 175 GB/s
You also need to consider the entire process of retrieving data:
- The database engine first searches the memory.
- If the data isn’t found, it requests the data from storage.
- If there’s not enough memory space, the data in memory must be saved to storage.
- The new data is then loaded into memory.
The slowest part of a query is typically reading data from the disk and transmitting it over the network.
Here, you must account for the physical limitations of storage: delays from mechanical components, lack of a RAID 10 configuration, bandwidth limitations, signal interference, error checking, packet redundancy, and more.
This means the main goal in query optimization is to avoid going down to the storage. In simple terms, it can be achieved as described below.
Help the database engine choose the best execution plan
Perform a table scan (reading the whole table), which has a time complexity of O(n), only for small tables. For large tables it’s faster to seek into a clustered or non-clustered index (a B-tree), which has a time complexity of O(log n).
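For example, here is a minimal sketch (the table and column names are hypothetical): without an index, the filter below forces a full table scan, while the non-clustered index lets the engine seek directly to the matching rows.

-- Hypothetical index so the filter below becomes an index seek instead of a scan
CREATE NONCLUSTERED INDEX IX_Products_SubcategoryId
ON dbo.Products (SubcategoryId)
INCLUDE (ProductName)

SELECT ProductName
FROM dbo.Products
WHERE SubcategoryId = 42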
Table partitioning allows reading data from multiple files at once or from multiple disks at once if configured separately.
Partition elimination skips unnecessary data and only scans the files and disks where the data is to be found.
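As a rough sketch of partition elimination (all names are hypothetical, and the boundary values would depend on your data), a table partitioned by year lets a date filter touch only the matching partition:

-- Hypothetical yearly partitioning; only the 2024 partition is read by the query below
CREATE PARTITION FUNCTION pfWorkOrderYear (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01')

CREATE PARTITION SCHEME psWorkOrderYear
AS PARTITION pfWorkOrderYear ALL TO ([PRIMARY])

CREATE TABLE dbo.WorkOrdersPartitioned (
WorkOrderId INT NOT NULL
,DateCreated DATE NOT NULL
,CONSTRAINT PK_WorkOrdersPartitioned PRIMARY KEY (WorkOrderId, DateCreated)
) ON psWorkOrderYear (DateCreated)

SELECT COUNT(*)
FROM dbo.WorkOrdersPartitioned
WHERE DateCreated >= '2024-01-01' AND DateCreated < '2025-01-01'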
Reduce the amount of information transmitted if it’s not needed for processing
If you don’t know your data distribution, this article can help you find that out, which allows you to modify your tables and queries to use appropriate data types and lengths.
Using SELECT * is discouraged, since usually only a few columns need to be displayed or updated. Using OFFSET and FETCH is encouraged unless you cache data at the application layer. Linked servers retrieve the whole table before filtering it, so it’s better to use a filtered OPENQUERY / OPENROWSET or PolyBase with pushdown. Partitioning very large or very busy tables helps to only process the needed data.
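For example, a minimal sketch of a filtered OPENQUERY (the linked server and remote table names are hypothetical); the filter runs on the remote server instead of pulling the whole table across the network:

-- Hypothetical linked server LNK_REMOTE; the WHERE clause executes remotely
SELECT *
FROM OPENQUERY(LNK_REMOTE, 'SELECT ProductId, ProductName FROM dbo.Products WHERE SubcategoryId = 1')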
Another problem is outdated statistics, which cause the engine to misestimate the number of rows it needs to process. You can see this in the execution plan’s estimated number of rows and the accompanying warnings. The less data you retrieve from the disk, the more memory will be available for all queries, reducing the need to access the disk unless absolutely necessary.
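If statistics are stale, a hedged sketch of the fix looks like this (the table name is hypothetical); you can also rely on the automatic option:

-- Refresh statistics on a hypothetical table so row estimates stay close to reality
UPDATE STATISTICS dbo.WorkOrders WITH FULLSCAN

-- Or let the database keep statistics current automatically
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON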
Keep the most used data in memory
If you know which data is used the most, that is, the data with the highest number of transactions and where most of the computations occur, then it’s a good idea to keep it in memory.
Relying on the engine will only slow down the process and increase the number of operations moving data in and out of storage, when you already know it should be in memory all the time. This is achieved through memory-optimized tables, memory-optimized indexes, and columnstore indexes.
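As a minimal sketch (SQL Server 2016 or later, hypothetical names, and the database needs a MEMORY_OPTIMIZED_DATA filegroup first), a memory-optimized table looks like this:

-- Hypothetical hot table kept entirely in memory, with data still durable on disk
CREATE TABLE dbo.HotProducts (
ProductId INT NOT NULL PRIMARY KEY NONCLUSTERED
,ProductName NVARCHAR(100) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)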
Another setting that keeps the most used data in memory is Lock Pages in Memory, which can improve performance by avoiding paging data to disk.
Configure the underlying data files appropriately
Many operators create objects in tempdb: Index Spool, Eager Spool, Lazy Spool, Spool, Table Spool, and Table-Valued Function. The tempdb database is volatile, but it works like any other database, moving data in and out of disk. That’s why having multiple files improves performance, allowing more data to flow and more operations to occur at the same time. Newer versions of the SQL Server installer already account for this.
Instant File Initialization is another data file configuration that improves performance but carries a security warning; if the environment is safe enough, you can enable this setting. Data files, log files, and tempdb files must be separated for optimal performance, and data file growth must be configured properly.
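A hedged sketch of the growth configuration (the database and logical file names are hypothetical); fixed increments are generally preferable to the old percentage-based defaults:

-- Hypothetical database: pre-size the data file and use fixed growth increments
ALTER DATABASE SalesDb
MODIFY FILE (NAME = SalesDb_Data, SIZE = 10240MB, FILEGROWTH = 512MB)

ALTER DATABASE SalesDb
MODIFY FILE (NAME = SalesDb_Log, FILEGROWTH = 256MB)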
Use the right number of CPUs
Parallelism is important, as it allows performing multiple operations at the same time and producing results faster. But it is an extremely expensive operation, and you don’t want to waste valuable resources on the first query that requests it, especially if it’s unneeded.
Other queries run multiple times per second, and the CPUs they reserve are unavailable to expensive queries that can’t run with fewer CPUs. In all these cases, there is so much overhead involved that it slows down your server’s response. The maximum degree of parallelism is another setting that newer versions of the SQL Server installer account for, but the cost threshold for parallelism must still be changed manually from the default of 5.
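As a sketch (the values shown are placeholders, not recommendations; the right numbers depend on your workload), both settings can be changed with sp_configure:

-- Placeholder values; tune the threshold and MAXDOP for your own workload
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'cost threshold for parallelism', 50
EXEC sp_configure 'max degree of parallelism', 8
RECONFIGURE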
Filter data using JOINs without making a Cartesian product
Let’s start with the simplest model: you have tables for categories, subcategories, and products. If you join the products to the subcategories and categories tables, you must use both keys in the join, as follows:
SELECT CategoryName
,SubcategoryName
,ProductName
FROM Products p
INNER JOIN Subcategories s ON s.SubcategoryId = p.SubcategoryId
AND s.CategoryId = p.CategoryId
INNER JOIN Categories c ON c.CategoryId = s.CategoryId
If you forget either the subcategory or the category in the join, a Cartesian product is produced, matching all products with all subcategories or categories: if there are 1,000 subcategories and 10,000 products, the resulting match will have 10 million rows.
Not all models are this simple, and sometimes it’s hard to pinpoint this issue, so you must rely on execution plan analysis to determine how many rows go into and come out of each operation, or look at the overall I/O used.
Avoid the need to iterate results, think in sets instead
Programmers are often taught early on to write pseudo-code, detailing step by step how a computer should act. However, the sooner you learn how the database engine works, the better. A database isn’t just a file or a repository; you don’t instruct it on how to find the data, you simply tell it what you need, and the engine determines the best approach to retrieve it.
This is accomplished by using the Structured Query Language (SQL) and here you can find a very good introduction. Whenever you feel the urge to use a subquery, cursor, loop, etc., remember that you’re not thinking in terms of sets. Take a step back and try to express the same logic using appropriate set-based constructs from the language. It won’t be easy at first, but it will be rewarding. Over time, you’ll get used to this approach, which will set you apart from other programmers.
Avoid type conversions, as they prevent the use of indexes
When you’re designing a database, it’s common to forget what type or size you used in a different table, but you will notice it during query execution because:
- The index created to improve performance is not used.
- The execution plan indicates an implicit type conversion warning.
- You will see a function applied to one of the sides of an expression.
This also occurs when you use Language-Integrated Query (LINQ) or an object-relational mapping (ORM) because it constructs the queries for you, and sometimes you don’t specify well enough the underlying data structure.
This also happens if you apply any kind of cast or function to a field in a table; to avoid it, move the cast or function to the other side of the comparison. If you notice these issues, adjust the data types and sizes on the appropriate side of the expression (rather than on the table column) so that they match correctly.
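For example, assuming a hypothetical OrderNumber column of type NVARCHAR, the first query forces an implicit conversion on the column and bypasses its index, while the second matches the column’s type:

-- Implicit conversion applied to the column: the index on OrderNumber is not used
SELECT OrderName
FROM WorkOrders
WHERE OrderNumber = 12345

-- Literal matches the column type: the index can be used
SELECT OrderName
FROM WorkOrders
WHERE OrderNumber = N'12345'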
Reduce the number of records blocked during an update
This usually happens when you have heap tables, you don’t have an index on which rows are to be updated, or the index is not selective enough. Again, the execution plan can tell you how many rows are being blocked, so you can act appropriately.
Reduce the amount of time being blocked during an update
Here are three key points:
- Heap tables and indexing: If you have heap tables (without a clustered index) or an index that is not selective enough, the engine may lock most or even the entire table while searching for rows to update. During this time, no other queries can access data from those tables, causing significant delays.
- Number of indexes: The more indexes you define on a table, the longer it takes the engine to update each one with new data. Therefore, it’s recommended to only have the minimum number of necessary indexes to optimize performance.
- Transaction duration: Typically, records are locked at the start of a procedure and unlocked at the end. However, if the system fails, those records may remain locked until the issue is resolved. To avoid this, focus on atomic operations. Lock records only while updating and reading results, then release the lock immediately. While this requires more design time and additional code to validate the row state, it’s the most reliable long-term approach.
Avoid the creation of another execution plan if there is one already available
Every time the database engine needs to create an execution plan, it consumes CPU and memory and takes a few milliseconds; multiplied by the number of executions per second of a query, this can become a problem and cause slowness in all your other queries.
A brief explanation is described in this article, and ways to find them are described in this article, both from Brent Ozar. If you’re using Query Store, how to find them is described in this article from Erik Darling.
After you’ve identified the reason for having multiple plans, you need to either fix your code or parameterize your SQL code properly; then the engine won’t have to redo its work every time.
Avoid extra delays and additional processing
In the previous topic, we’ve seen how to avoid re-doing the calculations to create an execution plan when there is one already available. Another way to avoid extra processing is referencing objects by their two-part name (for example, dbo.Products instead of just Products), so the engine doesn’t have to search the internal catalogs every time a query runs.
Also, returning the rows ordered is an expensive computation, as is removing duplicates using DISTINCT and formatting values for display: all of this is better done at the application layer.
Views that join multiple tables, contain a large number of rows or use linked servers can be resource-intensive, as they are constructed each time they are queried. In such cases, it’s more efficient to use a materialized view, which stores the data and doesn’t need to be rebuilt on each query. Materialized views can be updated at intervals you specify, reducing the overhead and improving performance.
Another source of extra computation looks like the following:
SELECT OrderName
FROM WorkOrders
WHERE DateCreated > GETDATE() - 1
In this case, the date subtraction operation is being performed for every row in the table.
However, if the result will be the same for all rows, why not calculate it in advance? This would prevent unnecessary computations.
The query can be rewritten as follows:
DECLARE @Date DATE
SET @Date = DATEADD(DAY, -1, GETDATE())
SELECT OrderName
FROM WorkOrders
WHERE DateCreated > @Date
Another common pattern I’ve seen is converting DATETIME to DATE, selecting DATEPARTs of a date and then joining them, converting CHAR types to VARCHAR, converting VARCHAR types to INTEGER, etc. Very often this is unnecessary, as you can rewrite the query using far fewer operations and still produce the same output. In this case, it truly applies that “less is more”.
Reduce the number of threads used
This query identifies cases where the number of used threads exceeds the number of reserved threads. This situation indicates potential blocking or malfunctioning hardware, which can lead to performance issues and must be addressed to prevent slowness.
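A hedged sketch of such a check, assuming the thread columns added to sys.dm_exec_query_stats in SQL Server 2016 and later:

-- Statements whose executions used more threads than were reserved (assumed columns, SQL Server 2016+)
SELECT TOP (20) qs.total_used_threads
,qs.total_reserved_threads
,st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE qs.total_used_threads > qs.total_reserved_threads
ORDER BY qs.total_used_threads DESC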
Reduce the amount of memory used
This query identifies instances where memory usage exceeds the planned amount. Scarcity of memory can occur during concurrent executions or due to the high frequency of query executions.
When this happens, it’s a clear sign that you should review the execution plan to determine if an issue is causing the query to consume more memory than expected.
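A hedged sketch of such a check using sys.dm_exec_query_memory_grants, listing requests that are still waiting for memory or received less than their ideal grant:

-- Currently executing requests with memory-grant pressure
SELECT session_id
,requested_memory_kb
,granted_memory_kb
,used_memory_kb
,ideal_memory_kb
FROM sys.dm_exec_query_memory_grants
WHERE grant_time IS NULL
OR granted_memory_kb < ideal_memory_kb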
Top optimization techniques and best practices
Now that we’ve outlined the objectives of query optimization, let’s dive deeper into each one with detailed analysis and practical examples.
Don’t rely solely on indexes
Often, you’ll think indexes fix bad queries, which is not true:
- If the code is bad, it’s not going to use the index no matter how hard you try.
- There are many different index types, and you need to choose the most appropriate one. The least used is full-text, even though it accelerates searches on character fields, because not everyone knows how it works.
- Each index you add will consume space and reduce performance for inserts, updates, and deletes.
- If the index is not selective (a selective index returns less than 70% of the underlying table’s rows), it’s not useful at all.
- If the WHERE condition doesn’t use all of the index’s leading columns, the index is not used.
- If the WHERE condition filters those columns with comparisons other than equality, the index may not be used.
Before deciding whether to create a new index, use the rest of this guide to address other performance issues. After resolving those, evaluate if adding an index is necessary and beneficial.
It’s also important to review the current index design to see if an existing index can be expanded or optimized instead of creating a new one.
ORDER BY
You should ask yourself: why am I returning the data sorted? Sorting is a resource-intensive operation that consumes both memory and CPU, and for large tables it also incurs significant I/O costs when using tempdb.
Since you’re paying for each CPU used in the SQL Server instance, those CPUs are better allocated to more critical tasks than sorting results.
Instead, consider sorting the data at the application layer, whether it’s in ASP.NET, PowerShell, PowerBI, Python, or any other platform. On the application server side, you can add as many CPUs and nodes as needed without additional licensing costs, and in cloud environments, you can scale up CPU resources during high workloads.
Appropriate data types
As mentioned earlier, this article can help you identify if you’re using the appropriate data types and lengths:
- If you only have numbers, don’t use a string data type. Even a BIGINT uses only 8 bytes, while a string uses one byte per character. Also, searching on numbers is faster and always possible, while searching on strings is costly and not always possible.
- If the max value is 290, SMALLINT uses only 2 bytes while INT uses 4 bytes. If it is 255 or less, TINYINT uses only 1 byte.
- If you only have dates, don’t use a DATETIME type (8 bytes); use a DATE type (3 bytes).
- If the number of distinct values is low, using a TINYINT as the key is more scalable than having a string one. You can document in your database what each number means, or you can have a catalog table with TINYINT keys (where searches are performed) and string values (to get the meaning), as sketched below.
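A minimal sketch of that catalog-table idea (names are hypothetical): a small TINYINT key keeps the fact tables narrow while preserving readable values.

-- Hypothetical catalog table: tiny surrogate key plus human-readable label
CREATE TABLE dbo.OrderStatusCatalog (
OrderStatusId TINYINT NOT NULL PRIMARY KEY
,OrderStatusName VARCHAR(30) NOT NULL
)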
Appropriate indexes
As mentioned earlier, this article can help you identify if your indexes will be selective or not:
- If the mode value is more than 10% of the total rows, a filtered index on the mode value is the smallest and most selective.
- If the mode value is more than 90% of the total rows, a filtered index excluding the mode value is the smallest and most selective.
- If the number of distinct rows is small, an index on that column won’t make queries go faster.
- If the number of missing values is high, a filtered index WHERE the column IS NOT NULL is the smallest and most selective.
- If the number of missing values is low, a filtered index WHERE the column IS NULL is the smallest and most selective.
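For example, a hedged sketch of a filtered index (the column is hypothetical): if most rows have a NULL DiscontinuedDate, the index below only contains the few non-NULL rows, so it stays small and selective.

-- Hypothetical column: index only the rows that actually carry a value
CREATE NONCLUSTERED INDEX IX_Products_DiscontinuedDate
ON dbo.Products (DiscontinuedDate)
WHERE DiscontinuedDate IS NOT NULL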
Avoid using OR
This is a common operator in programming languages, but it doesn’t behave the same way in SQL.
Instead of this query which is not going to use any index:
SELECT ProductName
FROM Products
WHERE SubcategoryId = 1
OR CategoryId = 2
Rewrite it to use the indexes without incurring overhead from removing duplicates:
SELECT ProductName
FROM Products
WHERE SubcategoryId = 1
UNION ALL
SELECT ProductName
FROM Products
WHERE CategoryId = 2
UNION ALL is faster than UNION
As mentioned earlier, UNION sorts the results and compares them against the second result set to remove any duplicates, so this overhead can be avoided when it’s not necessary.
Avoid correlated subqueries
A correlated subquery appears when you want to apply a filtering condition without using sets, instead telling the engine how to do its work (programmatically). See three distinct examples below:
SELECT ProductName
FROM Products p
WHERE ProductId NOT IN (
SELECT ProductId
FROM WorkOrders
WHERE CategoryId = 1
)
SELECT ProductId
,DatePurchased
,Quantity
FROM WorkOrders wo
WHERE DatePurchased = (
SELECT MAX(DatePurchased)
FROM WorkOrders wo2
WHERE wo2.ProductId = wo.ProductId
)
SELECT ProductName
,(
SELECT SubcategoryName
FROM Subcategories s
WHERE s.SubcategoryId = p.SubcategoryId
) SubcategoryName
FROM Products p
- In the first query, you’re trying to tell the engine to search all work orders that satisfy the condition, and then use that to filter the products.
- In the second query, you’re trying to tell the engine to get the latest date and show the product details on that date.
- In the third query, you’re explicitly requesting the engine to gather additional data row by row.
In every case, you’re thinking programmatically instead of in sets. All queries are slow because they execute the subquery part once for every row in the outer query.
Let’s convert the correlated subqueries to a set-based approach:
SELECT ProductName
FROM Products p
LEFT JOIN WorkOrders o ON o.ProductId = p.ProductId
AND o.CategoryId = 1
WHERE o.ProductId IS NULL
SELECT ProductId
,DatePurchased
,Quantity
FROM WorkOrders wo
INNER JOIN (
SELECT ProductId
,MAX(DatePurchased) DatePurchased
FROM WorkOrders
GROUP BY ProductId
) wo2 ON wo2.ProductId = wo.ProductId
AND wo2.DatePurchased = wo.DatePurchased
SELECT ProductName
,SubcategoryName
FROM Products p
INNER JOIN Subcategories s ON s.SubcategoryId = p.SubcategoryId
Note that the magic word is the JOIN between the tables: the engine is truly going to do what you intend, but without doing it row by row.
Another common issue is using TOP with ORDER BY, which is computationally expensive. Instead, use a set operation to filter the rows you want to display (e.g., using MIN and MAX), which is more efficient.
Avoid using cursors
This goes in line with the previous item: if you try to programmatically do things row by row, you’re putting a toll on CPU, memory, tempdb, etc. There’s always a set-based approach equivalent to what you’re trying to express, but it requires practice to get used to it.
The benefits in the long run are exceptional, and they’re worth it.
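As an illustration (the columns are hypothetical, and this is a sketch rather than a drop-in replacement), a cursor that updates one product at a time can usually be replaced by a single set-based UPDATE:

-- One set-based statement instead of a cursor looping over every product
UPDATE p
SET p.LastOrderDate = wo.LastDate
FROM Products p
INNER JOIN (
SELECT ProductId
,MAX(DatePurchased) LastDate
FROM WorkOrders
GROUP BY ProductId
) wo ON wo.ProductId = p.ProductId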
Avoid functions, procedures, views, and dynamic code
This, again, is the result of thinking programmatically.
- You aim to encapsulate logic to avoid having multiple versions of the same code (maintainability).
- You think that if the code is fast, it will perform well even when executed multiple times per second (scalability).
- You want to call the same code from multiple places (reusability).
However, in many cases, the cost of this approach can’t be determined at compile time, so a fixed estimation of rows is used.
- If you include such logic in a WHERE condition, an index may not be used.
- If you add complex logic, set-based processing is bypassed, and the query is processed row by row. This also inhibits parallelism.
So, whenever you’re overthinking and planning to do it programmatically, step back and think in terms of sets – this is the language of the engine.
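For example, assuming a hypothetical scalar function dbo.fn_Year, the first query runs the function for every row and can’t seek an index on DateCreated, while the second expresses the same filter as a plain range:

-- Hypothetical scalar function wrapped around the column: row-by-row, no index seek
SELECT OrderName
FROM WorkOrders
WHERE dbo.fn_Year(DateCreated) = 2024

-- Same filter as a sargable range: set-based and index-friendly
SELECT OrderName
FROM WorkOrders
WHERE DateCreated >= '2024-01-01'
AND DateCreated < '2025-01-01'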
Do not SELECT *
It’s the easiest query to write brainlessly.
If you specify only the columns that will actually be displayed or processed, you will:
- Reduce I/O, especially by omitting string and BLOB columns
- Save CPU and memory on columns that won’t be used
- Use the most appropriate index for the information requested; otherwise, the engine chooses to scan the whole table because that’s where all the columns you’re requesting exist
Limit results with OFFSET and FETCH
This is especially useful in paginated queries, where you’re not going to display a thousand records to the user so there’s no need to get all rows every time the user refreshes the page. See an example below:
SELECT ProductId
,ProductName
FROM Products
ORDER BY ProductName OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY
In this case, the ORDER BY is required: OFFSET and FETCH cannot be used without it, and it ensures the engine returns results consistently from page to page.
Handle ISNULL in the SELECT part
This aligns with the earlier recommendation to avoid using functions in the WHERE clause, as doing so can prevent the use of indexes. Remember: NULL represents the absence of a value (similar to “does not exist”), 0 indicates an exhausted quantity, and an empty string means there is no label.
Additionally, avoid using the same procedure to return both all rows and a filtered subset of rows (e.g., WHERE ISNULL(@Param1, 0) = 0), because the engine uses different optimized methods for each scenario.
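For example (the column is hypothetical), keep ISNULL in the SELECT list for display and use IS NULL in the WHERE clause so an index on the column can still be used:

-- ISNULL only formats the output; the filter itself stays sargable
SELECT ProductId
,ISNULL(DateShipped, '1900-01-01') DateShipped
FROM WorkOrders
WHERE DateShipped IS NULL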
Use equality, not inequality
When you use the operators =, >, and <, the index defined on that column is utilized.
The only exception is the not-equal operator <>. This makes sense because, with <>, the engine must read all rows and discard those that don’t match, rather than efficiently determining which rows to retrieve based on the index.
Use LIKE ‘string%’, not LIKE ‘%string’
This is similar to the previous point: when you place a wildcard character at the beginning of a string (e.g., %value), the index cannot be used. However, if the wildcard is not the first character (e.g., value%), the index can be used to narrow down the search area. A more efficient approach, though, is to use full-text search, which allows searching for text in any position and order, providing better performance for such queries.
Use sp_executesql, not execute
This is because sp_executesql allows passing parameters to dynamic queries, which helps create a single execution plan for the query, reducing both resource usage and execution time. If different parameters significantly alter the execution plan, you can use dynamic queries within the same stored procedure to handle them separately, optimizing performance.
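A minimal sketch of the parameterized form (the table and parameter are illustrative): the value is passed as a parameter, so the engine can reuse a single plan instead of compiling one per concatenated literal.

-- Parameterized dynamic SQL: one reusable plan
DECLARE @sql NVARCHAR(MAX) = N'SELECT ProductName FROM Products WHERE SubcategoryId = @SubcategoryId'

EXEC sp_executesql @sql, N'@SubcategoryId INT', @SubcategoryId = 1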
Avoid using DISTINCT
Using DISTINCT forces an extra sort operation; instead, try to filter out duplicate rows by using appropriate joins.
Use SET NOCOUNT ON
This is the easiest fix to use and is most used in stored procedures.
It won’t change the number of affected rows; rather, it simply suppresses the “rows affected” message, reducing extra network traffic when returning this information. When you multiply this by the number of executions per second, the optimization becomes more significant.
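A minimal sketch of where the setting goes (the procedure name is hypothetical):

-- Suppress the "rows affected" messages inside a procedure
CREATE PROCEDURE dbo.usp_GetProducts
AS
BEGIN
SET NOCOUNT ON

SELECT ProductId
,ProductName
FROM Products
END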
Avoid views
We’ve mentioned earlier one problem with views, but there are more:
- The estimated number of rows is inaccurate
- The resulting rows are calculated when the view is used, which has a cost and takes time
- The view itself may not be used; the engine may use its underlying tables and indexes instead
So, if you’re facing performance issues with views, consider the following:
- Recreate them using WITH SCHEMABINDING for an accurate estimated number of rows
- Convert the view into a materialized view by creating a clustered index on it, so the data exists all the time
- Reference the view using WITH (NOEXPAND) so it is chosen first; but if that turns out to be slower, it’s better to leave the database engine to decide what to use
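A hedged sketch of the materialized (indexed) view approach, reusing the example tables from earlier; SCHEMABINDING and two-part names are required, and the unique clustered index is what persists the data:

-- Schema-bound view over the example tables
CREATE VIEW dbo.vProductSubcategory
WITH SCHEMABINDING
AS
SELECT p.ProductId
,p.ProductName
,s.SubcategoryName
FROM dbo.Products p
INNER JOIN dbo.Subcategories s ON s.SubcategoryId = p.SubcategoryId
GO

-- The unique clustered index materializes the view's data
CREATE UNIQUE CLUSTERED INDEX IX_vProductSubcategory ON dbo.vProductSubcategory (ProductId)
GO

-- Read the stored data directly
SELECT ProductName
,SubcategoryName
FROM dbo.vProductSubcategory WITH (NOEXPAND)
WHERE ProductId = 10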
Avoid complex queries
This is the best advice you can get, as it emphasizes that debugging, reusability, and maintainability are only achievable with simpler queries.
It’s recommended to break complex queries into smaller parts, storing intermediate results in table variables, temporary tables, or common table expressions.
This way, performance can be measured and optimized in those smaller parts, allowing for easier improvements when necessary.
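For example, a sketch of breaking a query into two measurable steps using a temporary table (reusing the earlier example tables):

-- Step 1: materialize the intermediate aggregate once
SELECT ProductId
,MAX(DatePurchased) LastPurchase
INTO #LastPurchase
FROM WorkOrders
GROUP BY ProductId

-- Step 2: join against the small intermediate result
SELECT p.ProductName
,lp.LastPurchase
FROM Products p
INNER JOIN #LastPurchase lp ON lp.ProductId = p.ProductId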
Avoid large write operations
When updating large numbers of rows, the transaction log and tempdb can grow excessively until all operations are committed, causing slowness when they exceed available memory.
However, committing too frequently can make the query run even slower. It’s best to test different batch sizes to find the optimal balance for your data volume and the maximum size you want for the log and tempdb. Also, keep in mind that during these operations the tables are locked, so it’s often necessary to run them during off-peak hours or a planned outage (e.g., a release window).
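A hedged sketch of the batching pattern (the table, cutoff date, and batch size are placeholders you would tune):

-- Delete old rows in small batches so the log and locks stay small
DECLARE @BatchSize INT = 5000

WHILE 1 = 1
BEGIN
DELETE TOP (@BatchSize)
FROM WorkOrdersArchive
WHERE DateCreated < '2020-01-01'

IF @@ROWCOUNT < @BatchSize
BREAK
END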
How to find slow queries in Microsoft SQL Server
There are several free tools used to find performance issues in SQL Server, but do you know how to use them?
Usually, what you’re looking for from these tools is:
- Plan-level warnings
- Operator-level warnings: columns with no statistics, spill to tempdb, no join predicate, etc.
- Missing index notifications
- Scans in general, like table scan
- Seek predicates with a range of values
- The cardinality estimation model version must be appropriate for the SQL Server version you’re running: 160 for Azure SQL and SQL Server 2022, 150 for SQL Server 2019, 140 for SQL Server 2017, 130 for SQL Server 2016
- Appropriate trace flags: e.g., trace flag 4199 enables query optimizer hotfixes (not enabled by default).
- Wait statistics
- Number of rows read vs actual number of rows
- Estimated number of rows vs estimated number of rows to be read
Below is a list of tools provided with SQL Server, ordered from easiest to hardest to use, along with links that show you how to use them; learning them just requires you to practice their usage:
- SSMS includes the actual execution plan
- DMVs
- SSMS Live Query Statistics
- SSMS Query Plan Analyzer
- SSMS Query Plan Comparison
- SSMS Activity Monitor
- SSMS Performance Dashboard
- SSMS Performance Reports
- Database Engine Tuning Advisor
- Query Store
- Query Tuning Assistant
- Extended Events
- Profiler
- SQLDiag
- RML Utilities
- ReadTrace and SQL Nexus Analysis Queries
Conclusion
You may have noticed that performance tuning is not an easy task; it requires both practice and a deep understanding of what to look for and how to measure improvements.
When you get a call in the middle of the night or during a holiday, resolving these issues can be especially challenging, as they often require code changes.
That’s why I recommend monitoring performance during normal workloads, so you can identify potential problems early and know what can be resolved quickly.
By regularly performing a database health checklist, you can address most general performance issues and focus solely on the real critical problems when they arise.