Choosing the right data types for your variables, tables, and stored procedures not only improves performance by helping SQL Server build a good execution plan, it also improves data integrity by ensuring that only correct data is stored in the database.
Data types impact data consistency
Say you use the DATETIME data type for a column that stores dates. Only valid dates can be stored in that column. However, if you use a character or numeric data type instead, then eventually someone will store a character or numeric value in the column that does not represent a date at all.
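A quick sketch of the difference (table and column names here are made up for illustration): the DATETIME column rejects a value that is not a real date, while the character column happily accepts anything.

```sql
-- DATETIME enforces valid dates; VARCHAR does not.
CREATE TABLE dbo.GoodDates (EventDate DATETIME NOT NULL);
CREATE TABLE dbo.BadDates  (EventDate VARCHAR(20) NOT NULL);

INSERT INTO dbo.GoodDates VALUES ('2024-02-30');  -- fails: February 30 is not a real date
INSERT INTO dbo.BadDates  VALUES ('2024-02-30');  -- succeeds: garbage is now stored
INSERT INTO dbo.BadDates  VALUES ('not a date');  -- also succeeds
```

Every query that later reads `BadDates.EventDate` as a date now has to convert and validate it at read time, and any row can break that conversion.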
Figure 1 – SQL Server data types
Data types impact SQL performance
Data type choice alone can easily mean a 20x difference in SQL performance. The exact number is not the point; the point is that the hit is SIGNIFICANT and unnecessary.
Take a peek at the SQL Server performance tuning case studies we have here. Some of them were related to data types.
Impact of a bad data type choice inside a clustered index
Quick reminder: a clustered index (CI) is something you should have on EVERY table, especially your important ones.
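As a minimal sketch (table and column names are hypothetical), a narrow INT key makes a good clustered index, and the common way to get one is a clustered primary key:

```sql
-- A narrow, ever-increasing INT key is a solid default clustered index.
CREATE TABLE dbo.Orders
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);
```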
Data type choice for the columns that go into the CI is very important. The more bytes you can shave off the clustered index key, the better. Shaving off bytes applies to *every* column, but the CI is where it matters most.
Why is data type choice so important for clustered indexes?
When SQL Server creates a nonclustered index (IX), every IX row carries the CI key. So SQL takes the CI key size in bytes, adds the IX key size in bytes, and that total is the per-row cost of carrying the new IX going forward.
Example. Say your CI key column is NVARCHAR(20). That's (20 × 2 bytes) + 2 bytes of overhead = 42 bytes. Those 42 bytes are what it takes to carry the column.
Now you want to add an index on an INT column, which is 4 bytes. So the IX row cost will be 42 + 4 = 46 bytes. There are a few more bytes of per-row overhead on top, but that's outside this discussion.
Now compare that to using an INT column as the CI key and adding the same IX. The cost would be 4 + 4 = 8 bytes. That may not seem like much, but multiplied by 50 million rows it comes out to 2,193 MB vs. 381 MB, almost 6 times more storage needed.
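The arithmetic above can be sketched as a quick T-SQL calculation (row count and byte sizes are the assumed numbers from the example, ignoring per-row overhead):

```sql
-- Per-index storage for 50M rows under the two CI key designs above.
DECLARE @rows BIGINT = 50000000;

SELECT
    (42 + 4) * @rows / 1048576.0 AS wide_ci_key_mb,   -- NVARCHAR(20) CI key: ~2,193 MB
    (4  + 4) * @rows / 1048576.0 AS narrow_ci_key_mb; -- INT CI key:          ~381 MB
```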
Simply by choosing the correct data type, you save almost 6 times the space. And it doesn't end there.
Say you have 300 important tables with 7 indexes each, and this problem appears just once in each table. Using the numbers above, each oversized index costs an extra 2,193 − 381 = 1,812 MB, and 300 × 7 = 2,100 indexes comes to roughly 3,800,000 MB, or well over 3 TB. Those are extra terabytes that you now have to carry with you everywhere. That data needs to be maintained, re-indexed, backed up, and read into memory; RAM is wasted; AlwaysOn is slower; replication to DR is slower and takes more space; log shipping gets slower; and so on. And I could keep going!
Can it get worse?
Sure! Say we index each table more heavily by adding another 7 indexes. Every new index carries the oversized clustered key again, so the waste is multiplied across every index you add. It gets worse.
When SQL Server reads data (say it scans the table, which it does a lot), it needs to read 2,200 MB into RAM. That's 2,200 MB of reads from disk (bad already), and it also means 2,200 MB of other data gets kicked out of the buffer pool. The data we just evicted could have been data we would reuse without ever hitting disk again, but now we can't, because we just kicked it out.
It continues to get worse. SQL Server can't go faster than it can write to the transaction log (the LDF file), and now it has to write out 6x more. On every update/insert/delete, *all* of the indexes have to be updated, so that's 6x more data written, multiplied by the number of indexes.
Add sending this data across the network to clients, SQL backups, log shipping, replication, AlwaysOn, maintenance, and so on, and you can see how this snowballs into a big pile of slowness and wasted dollars.
I may have made some math mistakes above, but don't get bogged down in the details and lose the high-level picture.
In SQL Server, data type choices matter! Choose your data types wisely, or you are going to drown in SQL Server performance problems and waste a lot of dollars!