Few technologies in SQL Server cause as much confusion to a developer than indexes.
Indexes operate “behind the scenes”: adding and removing indexes rarely result in changes to an application’s code.
The proper index can drastically increase the performance of an application. However, there are tradeoffs to almost any feature, and indexes are no exception.
This article outlines several good rules and best practices to apply when creating and modifying indexes for your database.
How it works
The SQL Server Engine uses an index, as a reader uses a book index. Without it, the reader needs to look at every page in a book to find a word.
Indexes are the method by which SQL Server sorts and organizes the rows in tables. It allows you to quickly narrow in on the rows to satisfy the query and avoid scanning every row in the table.
In database terms, a table scan (or just scan) happens when there is no index available to help a query. They have a terrific impact on performance on large tables, but sometimes are inevitable.
A table scan is not always a problem, but as a table grows to millions of rows and beyond, scans become slower and more costly.
Alternatively, a seek is an operation that uses an index to jump to desired the part of a table.
B-Tree index structure
The most common type of index structure in SQL Server is the B-Tree.
The sorted nature of the B-Tree lends itself well to range-based scanning and sequential access as well as equality-based access.
This hierarchical structure has the root node at the top and the leaf nodes at the bottom, as you can see below.
When a query is using an index, the query engine starts at the root node and goes down through the intermediate nodes.
The query engine continues down through the index nodes until it reaches the leaf node.
It can contain a pointer to a row or the entire data, a clustered or nonclustered index.
SQL Server index types
There are two main index types offered by SQL Server: clustered index and nonclustered.
Clustered index
Every table, with almost no meaningful exception, should have a clustered index. It is the method by which SQL Server sorts the actual data.
The clustered index essentially is the table. It contains all of the columns of the table rather than just a subset of columns.
Because the entire table is only stored on disk once and that single copy can only be sorted one way, there can be only one clustered index per table.
This means that queries that use the clustered index to perform their seek will be able to return any and every column they need as they are all stored on the index.
On the downside, any updates to the column(s) that make up the clustered index may cause the entire row to have to be moved to another place in the clustered index.
This behavior turns an update query into a DELETE
followed by an INSERT
, with an evident decrease in performance. On a table with many columns, that can be a very resource-intensive operation.
A clustered index can often be found on the primary key (or a foreign key) column because the values generally do not change once a record is inserted.
Creating a primary key constraint in SQL Server will automatically create a clustered index (if none exists) using the primary key column as the index key.
Sometimes it is better to use a unique nonclustered index on the primary key column and place the clustered index on a column used by more queries.
Nonclustered index
After making a selection for the clustered index, all other indexes need to be nonclustered indexes.
A nonclustered index has fewer columns than the clustered index. It contains only a subset of columns.
When a query uses it for a seek, it will quickly find the row(s) desired. However, a key lookup may be necessary against the clustered index.
A key lookup occurs when SQL Server uses a nonclustered index to satisfy the query’s predicates, but it doesn’t contain all the information needed to cover it.
This may happen when the columns in your select list are not part of the index definition, or an additional predicate is not.
Key Lookups are tied to Nested Loops Joins, executing once for each row that needs to be retrieved. It can be painful, especially in instances of parameter sniffing.
Taking advantage of indexes
One of the query optimizer jobs is to select the best index or indexes to perform the task.
The database engine can use indexes to boost performance in many different queries.
In the following sections, we will review the main types of queries that can benefit from an index.
Matching a WHERE clause
The most common use for an index is in finding a record or set of records matching a WHERE
clause.
Queries can use an index to look for a specific value, as values inside of a range.
For example, the queries below:
DELETE
FROM PersonPhone
WHERE BusinessEntityID = 15
UPDATE PersonPhone
SET Discontinued = 1
WHERE BusinessEntityID > 35
SELECT *
FROM PersonPhone
WHERE BusinessEntityID BETWEEN 15
AND 35
SQL statements like DELETE
, SELECT
, and UPDATE
can all benefit from an index on a table.
Sorting rows
We control the sorting of a dataset by specifying one or more fields in an ORDER BY
clause.
When you send a query to SQL Server, with no index, the database will scan the Products table and sort the rows to process the query.
However, the index provides the database with a presorted list of values.
This way, it can simply scan the index from the first entry to the last entry and retrieve the rows in sorted order.
The index works equally for ASC
or DESC
(by merely scanning the index in reverse).
Grouping records
We may use a GROUP BY
clause to group records and aggregate values.
To process the query with this statement, SQL Server usually sorts the results on the columns included in the GROUP BY
.
It means that Indexing a field used in a GROUP BY
clause can often speed up a query.
Maintaining a unique column
There are several methods to create a unique index.
Marking a column as a primary key will automatically create a unique index on the column.
In fact, every column that requires unique values must have a unique index applied. It is the best tool for the database to use to enforce this rule.
Each time an application adds or modifies a row in the table, the database needs to search all existing records to ensure none of the values in the new data duplicate existing values.
Indexes, as we should know by now, will improve this search time.
Building the best indexes
Now we understand enough why indexes are useful and where indexes are best applied.
Choosing the correct columns and types for an index is another important step.
Consider the following points when creating a useful index for your tables.
Short keys
Be as narrow as possible as possible. Try to limit the index to as few columns.
Having a short index is beneficial for two main reasons:
- The larger an index key is, the harder a database has to work to use the index. Larger index keys will cause the database to perform more disk reads, which limits throughput.
- Since index entries are often involved in comparisons, smaller entries are easier to compare. Choose a compact data type for an index column, based on the number of bytes required.
Avoid using character columns in an index, particularly primary key indexes. Integer columns will always have an advantage over character fields in the ability to boost the performance of a query.
High selective keys
An index with a high percentage of unique values is called a selective index. SQL Server will track statistics about each index, so they know how selective each index is.
That means the indexes with a small percentage of duplicated values are the most effective.
For example, think of a phone book for a town where 80% of the population has the last name of Trump. If we sorted in order of the last name, It would not be beneficial. You can only discount a small number of entries when you are looking for a Trump.
A useful index allows the SQL Server to disregard as many records as possible during a search.
Covering queries with an index
For your most crucial queries, you might consider creating a covering index to give the query the best performance possible.
We call them covered queries because all of the columns requested in the output are contained in the index itself.
You can use a composite index to help match the search criteria of specific queries.
Composite indexes
A composite index is an index on two or more columns.
Both clustered and nonclustered indexes can be composite indexes.
Let’s take a look at the composite index below:
CREATE UNIQUE CLUSTERED INDEX [PK_BusinessEntityID_PhoneNumber] ON [PersonPhone] (
BusinessEntityID
,PhoneNumber
)
The order in which columns appear in a CREATE INDEX
statement is significant. The primary sort order for this index is BusinessEntityID.
When the BusinessEntityID is the same for two or more records, the database will sort this subset of records on PhoneNumber.
The composite index on table PersonPhone is useful in the following two queries:
SELECT *
FROM [PersonPhone]
WHERE [BusinessEntityID] = 10
SELECT *
FROM [PersonPhone]
WHERE [BusinessEntityID] = 10
AND PhoneNumber = '330-555-2568'
Both have the same query plan result, as shown above.
However, the following query cannot take advantage of the index we created since PhoneNumber is the second part of the index key.
SELECT *
FROM [PersonPhone]
WHERE PhoneNumber = '330-555-2568'
In this case, as you can see below, there is a key lookup occurs when SQL Server satisfies the query.
Such an index would probably be a composite index, which appears to go against our first guideline of keeping index entries as short as possible.
This is another tradeoff you can only evaluate with performance testing and monitoring.
Index drawbacks
Indexes can significantly improve query performance in the SQL Server, but they may hamper the performance too.
Data modification
Any time a query modifies records in a table, the indexes must change too, degrading the performance.
You probably shouldn’t have more than 5 indexes per table. However, achieving the right number of indexes will require testing and monitoring of your database.
A database with a massive number of transactions to modify data will need fewer indexes to allow for higher throughput.
On the other hand, in decision support systems and data warehouses that are used heavily for reporting, they can afford more indexes to support the read-only queries.
Disk space
The clustered index in SQL Server does not require any additional disk space. However, each nonclustered index requires additional disk space as it is stored separately from the table.
Usually, this is not a significant concern as disk space is generally cheap enough to trade for more performance. Mainly when a database serves a large number of users.
To see the space required for a table, you can use the stored procedure sp_spaceused, as in the example below.
EXEC sp_spaceused 'PersonPhone'
According to the above output, the table data uses 1184 KB, while the table indexes use 992 KB.
The ratio of index size to table size can vary, depending on the columns, data types, and the number of indexes on a table.
Additional guidelines
- Find the missing indexes:
SQL Server will sometimes suggest an index that it believes will help that query run faster. These are known as missing indexes’ suggestions. Take care, SQL Server is very good at determining when an index is needed, but sometimes it fails. - Drop unused indexes:
Sometimes, an index is never used to improve performance. As we see before, SQL Server has to maintain it in everyINSERT
,UPDATE
, andDELETE
operations. So, why keep this index? - Remove duplicates or overlapping indexes:
Merge several indexes into a single one when possible. Why? Because Indexing is not free.
Conclusion
Indexes are crucial for good performance.
This article should be seen as a starting point, a way to familiarize yourself with the basic concepts of indexing.
You can compensate for a poorly written query with an excellent index. Still, it can be hard to make up for poor Indexing with even the best queries.
From database design to troubleshooting SQL Server problems, our SQL Server experts are on hand to assist.