Why data type choices matter?

I’ll use this opportunity to remind you of something.

Choosing the right data types for variables, your tables, and stored procedures not only improves performance by ensuring a correct execution plan, but it also improves data integrity by ensuring that the correct data is stored within a database.

If you use a DateTime data type for a column of dates, for example, then only valid dates will be stored in this column. However, if you use a character or numeric data type for the column, then eventually, someone will be able to store any type of character or numeric data value in the column that does not represent a date.

Figure 1 – SQL Server Data types

Should this column be GUID or int?

That alone can easily mean a 20x  difference in performance. Whatever the number is not the point, the point is it is SIGNIFICANT.

The more bytes you can shave off from clustering index (CI) – the better. Shaving off bytes applies to *every* column. Just CI is where this is most important.

Why is that important?

To create index (IX), SQL takes CI byte cost, adds IXs cost and that’s the cost to carry new IX in bytes.

For example. Say your column is NCHAR(20). That’s (20*2bytes)+2bytes. So 42bytes.

Now you want to add an index for a column that has INT data type, so that’s 4 bytes. So, IX cost will be 42+4=46 bytes. There are maybe a few bytes added on top, but that’s outside of this discussion.

Now compare that to if you took an int column, turned that into CI, and added that IX you wanted. The cost would be 4+4=8. That may not seem like much, but times 50MM rows, that comes out as 2193MB vs 381MB. Or almost 6 times more data storage needed.

Say you have 300 important tables with 7 indexes each. How much does that add if this problem was only once in each table? That’s 800,100MB, or close to 1TB. An extra terabyte that you have to maintain, re-index, backup, read into memory, always on it, replicate, DR it, log ship, etc!

Can it get worse?

Sure!  Say we index the table by adding another 7 IXes. So the problem is not 7x bigger.

And gets worse.

When SQL is reading data (say we scanning a table, which SQL does a do a lot of), it needs to read in 2200MB into memory. So that’s 2200MB worth of reads from disk (that’s bad). That means 2200MB worth of data will get kicked out from memory. That 2200MB we just kicked out, could have been data that we could re-use and never hit disk for it again, but now we can’t since we just kicked it out.

It continues to get worse. SQL doesn’t go faster than it can write to LDF (log file). So now SQL has to write out 6x more. If you do an update/insert/delete, then *all* IX’es have to be updated, so that’s 6x more updates times number of indexes.

Add sending this data around across network to the client, SQL backups, log shipping, replication, AlwaysOn, maintenance, etc on top of all this and you see how this snowballs into a big pile of slowness.


I probably made some math mistakes here, but don’t get bogged down in details.

The point is still the same – data types matter!

So chose wisely when you design a new database, table or modify a column!

Agree? Disagree? Comment below!

Mark Varnas

Mark Varnas

I love making performance tuning SQL Servers fast and making them more stable. And I channel that obsession into our SQL Managed Services and new content here. When I'm not writing about SQL, I spend time outside hiking, skiing, mountain biking, or trying a new recipe.

Leave a Reply

Your email address will not be published. Required fields are marked *