Choosing the right data types for your variables, tables, and stored procedures not only improves performance by helping SQL Server build a good execution plan, it also improves data integrity by ensuring that only correct data is stored in the database.
Data types impact data consistency
Say you use the DATETIME data type for a column that stores dates. Only valid dates can be stored in that column. However, if you use a character or numeric data type instead, then eventually someone will store a character or numeric value in the column that does not represent a date at all.
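A quick sketch of the difference (table and column names here are made up for illustration): the DATETIME column rejects a value that is not a real date, while the character column happily accepts anything.

```sql
-- DATETIME enforces valid dates; VARCHAR does not.
CREATE TABLE dbo.GoodDates (EventDate DATETIME NOT NULL);
CREATE TABLE dbo.BadDates  (EventDate VARCHAR(20) NOT NULL);

INSERT INTO dbo.GoodDates VALUES ('2024-02-30');  -- fails: February 30 is not a real date
INSERT INTO dbo.BadDates  VALUES ('2024-02-30');  -- succeeds: garbage is now stored
INSERT INTO dbo.BadDates  VALUES ('not a date');  -- also succeeds
```

Every query that later reads `BadDates.EventDate` as a date now has to convert and validate it at read time, and any row can break that conversion.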
Figure 1 – SQL Server data types
Data types impact SQL performance
Data type choice alone can easily mean a 20x difference in SQL performance. The exact number is not the point; the point is that the hit is SIGNIFICANT and unnecessary.
Take a peek at the SQL Server performance tuning case studies we have here. Some of them were related to data types.
Impact of a bad data type choice inside a clustered index
Quick reminder: a clustered index (CI) is something you should have on EVERY table, especially your important ones.
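As a minimal sketch (table and column names are hypothetical), a narrow INT key makes a good clustered index, and the common way to get one is a clustered primary key:

```sql
-- A narrow, ever-increasing INT key is a solid default clustered index.
CREATE TABLE dbo.Orders
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);
```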
Data type choice for the columns that go into the CI is very important. The more bytes you can shave off the clustered index key, the better. Shaving off bytes applies to *every* column, but the CI is where it matters most.
Why is data type choice so important for clustered indexes?
When SQL Server creates a nonclustered index (IX), every IX row carries the CI key. So SQL takes the CI key size in bytes, adds the IX key size in bytes, and that total is the per-row cost of carrying the new IX going forward.
Example. Say your CI key column is NVARCHAR(20). That's (20 × 2 bytes) + 2 bytes of overhead = 42 bytes. Those 42 bytes are what it takes to carry the column.
Now you want to add an index on an INT column, which is 4 bytes. So the IX row cost will be 42 + 4 = 46 bytes. There are a few more bytes of per-row overhead on top, but that's outside this discussion.
Now compare that to using an INT column as the CI key and adding the same IX. The cost would be 4 + 4 = 8 bytes. That may not seem like much, but multiplied by 50 million rows it comes out to 2,193 MB vs. 381 MB, almost 6 times more storage needed.
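The arithmetic above can be sketched as a quick T-SQL calculation (row count and byte sizes are the assumed numbers from the example, ignoring per-row overhead):

```sql
-- Per-index storage for 50M rows under the two CI key designs above.
DECLARE @rows BIGINT = 50000000;

SELECT
    (42 + 4) * @rows / 1048576.0 AS wide_ci_key_mb,   -- NVARCHAR(20) CI key: ~2,193 MB
    (4  + 4) * @rows / 1048576.0 AS narrow_ci_key_mb; -- INT CI key:          ~381 MB
```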
Simply by choosing the correct data type, you save almost 6 times the space. And it doesn't end there.
Say you have 300 important tables with 7 indexes each, and this problem appears just once in each table. Using the numbers above, each oversized index costs an extra 2,193 − 381 = 1,812 MB, and 300 × 7 = 2,100 indexes comes to roughly 3,800,000 MB, or well over 3 TB. Those are extra terabytes that you now have to carry with you everywhere. That data needs to be maintained, re-indexed, backed up, and read into memory; RAM is wasted; AlwaysOn is slower; replication to DR is slower and takes more space; log shipping gets slower; and so on. And I could keep going!
Can it get worse?
Sure! Say we index each table more heavily by adding another 7 indexes. Every new index carries the oversized clustered key again, so the waste is multiplied across every index you add. It gets worse.
When SQL Server reads data (say it scans the table, which it does a lot), it needs to read 2,200 MB into RAM. That's 2,200 MB of reads from disk (bad already), and it also means 2,200 MB of other data gets kicked out of the buffer pool. The data we just evicted could have been data we would reuse without ever hitting disk again, but now we can't, because we just kicked it out.
It continues to get worse. SQL Server can't go faster than it can write to the transaction log (the LDF file), and now it has to write out 6x more. On every update/insert/delete, *all* of the indexes have to be updated, so that's 6x more data written, multiplied by the number of indexes.
Add sending this data across the network to clients, SQL backups, log shipping, replication, AlwaysOn, maintenance, and so on, and you can see how this snowballs into a big pile of slowness and wasted dollars.
I may have made some math mistakes above, but don't get bogged down in the details and lose the high-level picture.
In SQL Server, data type choices matter! Choose your data types wisely, or you are going to drown in SQL Server performance problems and waste a lot of dollars!