How to use SQL Server FILESTREAM feature for large databases

Do’s and don’ts

 

What is SQL Server FileStream feature?

This feature allows adding unstructured documents into the database. You can store images, PDFs, videos, or whatever files you want into SQL Server database. This allows to have transactional consistency of images.

 

We have worked with very large FILESTREAM implementation. And I will show how this feature can cause lots of problems. And you could do about them.

 

Let’s start with Environment description, so you have a bit of context:

  • Data size – 50TB+
  • Brick and mortar business.
  • Operating hours M-SAT 8AM-6PM, across all US time zones. Closed on SUN.
  • What are we storing in FILESTREAM db: documents (PDF, MS Word files), pictures (jpg, jpgs, gif, bmp, etc.)
  • What did SQL Server installation look like?
    • 4 node SQL Server fail over geo-distributed cluster
    • Storage: SAN with three levels of disk speeds
      • Fast – SSD
      • Medium – 15,000 RPM drives
      • Slow – 7,500 RPM drives
    • Another server for Disaster Recovery (DR)
    • About 15 user databases. Largest one under 2TB
    • All File Stream data resided in separate database, outside of user databases

 

When FileStream is a good solution?

  1. If data size in 3-5y will grow to less than 1TB – FileStream probably can work.
  2. If you have need to have POINT IN TIME restore of the documents or images stored.
  3. If you store small files. Under 1MB each. Check this whitepaper to confirm, there is a file size in there mentioned.

If all true – go for it.

 

Not all things you will read below can be blamed on SQL Server alone.

Some was bad database administration.

Some a business problem – business kept doubling every 9months and no one planned for this type of data explosion.

Some coding could have been better.

SAN and some networking left few things too tangled up.

Some tried the best, and screwed it up more.

The point is – this solution has sharp edges and if not careful, you will cut yourself.

 

Problems and some solutions with SQL Server FILESTREAM

  1. Backups
    1. DB backup duration. As filestream database grew, we could no longer take full database backup fast enough. Backup duration started to run longer than 24h, then soon 48h were not enough. At the end FULL backup would start at FRI 6PM and on Mon morning it would be still going.
      1. Fix. Speeding up SQL backup. Splitting the backup into 32 files helped. Also changing BUFFERCOUNT, BLOCKSIZE and MAXTRANSFERSIZE – helped Check more info here.
      2. How do you tune your backups for speed and how do you find the best numbers for your situation for backup file count, BUFFERCOUNT, BLOCKSIZE and MAXTRANSFERSIZE? Testing. We ran many. We tested very low number for one parameter (BUFFERCOUNT), and went high. Measured and recorded the data. Found an optimal parameter. Then moved on to the next parameter (BLOCKSIZE). Found optimal one. Moved on the next. You get the point. When all parameters had optimal values, we just combined them all. I can’t remember exact numbers, but it cut backup duration significantly. By 30-40% or so.
    2. Backup size. Backup size became hard to manage. Because of the way things were designed, we had to use FULL backup.
    3. Dumb backups. That’s my “technical term”. Which means each full backup included data that went 10 years back. Its dumb to keep backing up data that does not change? But to get to the place where backing up only recent data would be possible that needed some work – little more about that later.
    4. SAN storage. Since this was a cluster with shared storage, and limited SAN space, we started running into SAN capacity problems. You not only have to store the large DB. Then you need at least two FULL backups. And data doesn’t compress well. Because data being stored is all PDFs/jpgs, etc – which are already compressed. So, now we need db size times three. At least. Add differential and transactional backups. And you can see that 50TB db easily requires 150-200TB of SAN storage. Not a luxury that everyone has floating around.
    5. Lower environments. Restoring filestream database to DEV, QA, UAT environments – became impossible. Not only because time it takes is close to “forever”. But also, because lower environments often do not have that many terabytes just waiting around for you.
      1. Fix – instead of backup and restore to refresh DEV, QA and UAT – we would deploy a db built from T-SQL scripts. And then run another script that would populate 100,000 records into each table with documents. That way instead of 50TB, we needed only 1GB.
    6. SAN speed. This is not a SQL Server issue. But SAN was grossly misconfigured. The throughput numbers were terrible. At few points transfer speed was slower than USB2 speed (less than 45MB/s.) Yeah…
    7. If you still here, you may still be considering FileStream for your project. Keep reading, as by the end you may not like this feature much.
  2. Partitioning
    1. Backing up data 10y old. At some point, we realized that only data that gets modified are documents stored in last 90 days. And if we could partition the data into separate file groups (FG) we could potentially ONLY backup last 90d. And not the 10 years of data. Now we talking!
    2. Hello SQL Partitioning!
        1. We partitioned data and split File Group (FG) into each year. Set those file groups to READ_ONLY. Backed up READ_ONLY FGs once. This contained majority of the storage.
        2. Left a live partition of last 90 days.
        3. And could now move this FG onto faster SAN layer, because we no longer need insane amount of space.
          • Few partitioning minuses and pluses:
              • Minus: More complex.
              • Minus: Now need to pay double attention that READ_ONLY backups do not get corrupt. Because if they do, we may lose FULL backup.
              • Minus: Have to use SQL Enterprise edition. But that’s the version we used already. No problem there.
              • Plus: Can use SQL Server database partial availability (ignore that doc is about SQL 2005 – most of it relevant and shows how things still work today). Partial Availability is awesome. If we ever have to restore PROD from backup, we can restore PRIMARY file group (FG), plus FG with last 90 days of data and DB is now available for taking transactions. Restoring older data, can continue in the background into those READ_ONLY FGs, while PROD is humming along
            • But before we could implement partitioning, we had to UNDO someone else failed partitioning attempt.
    1. Antivirus and db corruption
      1. When we started working on this filestream db, one of the first things was to run DBCC CHECKDB. Which found about 40 corruption errors. Which could not be fixed with SQL commands.
      2. We ended up calling Microsoft Support and their suggestion was to export all data out into a new db. And throw old one out after. Umhhh…. Yeah… that would take about umhh….. like forever….. Thanks, but let’s leave as final option.
      3. I restored this db with corruption onto another server, and tested couple things. And was able to fix corruption!! HA!
      4. Basically, the issue was that FS stores images on disk. In weird directory structures. And when records were deleted in DB, files on disk did not. Which made SQL Server think – db is corrupt. All I had to do is to figure out how to convert corrupted record into a directory path (no, I dont remember how to do it now), and go on disk and delete that path. Re-ran DBCC CHECKDB, and bam! I got one less error. After doing it another 39 times, all gone!
      5. There was a strong suspicion antivirus was involved, therefore reconfigured antivirus exceptions.
    2. RAM usage
      1. Another problem with filestream was how application retrieves the data. If app runs a “SELECT” statement to get a document and file size is 5MB, all 5MB go into SQL buffer, which means RAM memory. Do that for a ton of queries, (5MB times 500 request) and you talking about a ton of RAM wasted. Not only that, each 5MB document kicks out really useful data from the RAM.
      2. Recommendation: DO NOT USE SELECT to get your image. Use Win32 APIs instead from the app. That bypasses this problem and it works faster!
    3. Some of what the data in filestream db was pure junk
      1. This is not a SQL problem. It’s a poor app dev. Turns out app did not restrict users from uploading only allowed data type, like images or PDFs. And the app did not have size limit. So, we found EXE files uploaded, 300MB files, mp3, other junk files.
    4. AlwaysOn
      1. Mid way through all changes, we had to migrate to a new data center. Few things were done not to waste the opportunity.
        1. We moved away from pure Win Failover Cluster with shared storage to using AlwaysOn.
        2. Upgraded SQL version.
        3. Used all best practices for Hardware, OS and SQL setup.
      2. For AlwaysOn, we could not put this file streamed db into the same availability group as user databases. It had to be its own AG Group. Which means that user dbs will not always be on the same server. This creates a problem. Because we had queries reaching out from user db into file stream db. Not going to get into how we solved this, but we did. Point is that – it required a workaround. Because of damn File Stream monster.
      3. AlwaysOn can use local storage. That’s cool. What’s not cool, I now have to have disk space for this large sucker on each server. 4 replicas, plus a DR site. That’s 5X the file stream db space. Plus, the backups. Plus disks are RAID’ed. Which all times, a TON of space is required.
    1. Popularity of FileStream
      1. This is not very well adopted SQL Server feature. Finding answers online – was challenging.
      2. There is only a single FileStream whitepaper.
      3. There is only a single book about FileStream.
        • Look how many reviews book has. One! And it’s a good book. But that should tell you only a very few companies ever adopted this feature!
    1. DB growth projections
      1. At some point, we decided to look at growth projections. DB was doubling in size every 9-11mo. Umhhh…. Wait…what? You saying that next year, this problem will be twice as big? Damn… I better leave this ticking bomb for someone else. Yes, that thought, crossed my mind! 😊
      2. The only way to deal with this is to re-architect.

 

Conclusion

There are very little uses cases for SQL Server FileStream feature usage. There are a lot of ways how to storing files inside SQL Server can hurt. And the only way to have unlimited growth is to NOT to store files inside the SQL Server. In our case – the only way was to re-architect.

More about file streaming re-architecture in another blog post.

 

Agree? Disagree? Leave a comment.

 

Share this post

Share on facebook
Share on google
Share on twitter
Share on linkedin
Share on pinterest
Share on print
Share on email