Introduction
Suppose you need to copy an 18.2 GB table (60 million rows) from one SQL Server instance to multiple others.
While reviewing the data with T-SQL for machine learning exploration, you discover that one of the fields is a VARBINARY(MAX). This prevents backup/restore compression from being effective and slows down data transmission. In testing, it took over 11 minutes just to create the table.
Another option is to use SQL Server Integration Services (SSIS) with a Balanced Data Distributor. However, this approach transmits the full, uncompressed 18.2 GB across the network to each target server, which can place a significant load on those networks.
The preferred solution consists of the following steps:
- Use BCP (Bulk Copy Program) in parallel to extract the data to individual files.
- Compress the files in parallel.
- Transmit the compressed files to the destination servers.
- On each destination server, use BCP in parallel to load the data into individual tables.
- If needed, use partition functions/schemes or views to consolidate the data.
The process
The PowerShell commands used to extract and compress the data look like the following (wrapped here for readability; originally a single line):
Start-Process PowerShell -ArgumentList {
    # Create the output folder for this chunk
    New-Item -ItemType Directory -Path D:\Backup\Backup1\;
    # Export one Id range in native format (-N) with the maximum packet size (-a)
    bcp "SELECT * FROM Database.dbo.Table WHERE Id BETWEEN 1 AND 3750000" `
        queryout D:\Backup\Backup1\Backup1.out -S. -T -N -a 65535;
    # Compress the exported file
    Add-Type -AssemblyName System.IO.Compression.FileSystem;
    [IO.Compression.ZipFile]::CreateFromDirectory('D:\Backup\Backup1', 'D:\Backup\Backup1.zip');
}
Start-Process launches each script block in its own PowerShell process and returns immediately, so you can set up multiple of these tasks to run at the same time; the whole extract-and-compress phase finished in around 4 minutes.
The SQL Server instance has 20 cores, with MAXDOP set to 8.
I didn’t notice any difference in performance between running 8 threads or 20 threads; the total time was nearly the same.
However, be cautious about running just one thread: while SQL throughput jumps to 265K rows per second, the zipping process slows down to 6 minutes, bringing the total time to 11 minutes.
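As an illustration, a launcher loop along these lines can generate the tasks (a sketch, not the exact script used; the 3,750,000-row chunk size and the 16 ranges are assumptions derived from the 60 million rows above):

# Hypothetical launcher: split the table into 16 equal Id ranges and
# start one extract-and-compress process per range.
$chunk = 3750000
1..16 | ForEach-Object {
    $lo  = ($_ - 1) * $chunk + 1
    $hi  = $_ * $chunk
    $dir = "D:\Backup\Backup$_"
    $cmd = "New-Item -ItemType Directory -Path $dir; " +
           "bcp `"SELECT * FROM Database.dbo.Table WHERE Id BETWEEN $lo AND $hi`" " +
           "queryout $dir\Backup$_.out -S. -T -N -a 65535; " +
           "Add-Type -AssemblyName System.IO.Compression.FileSystem; " +
           "[IO.Compression.ZipFile]::CreateFromDirectory('$dir', '$dir.zip')"
    Start-Process PowerShell -ArgumentList $cmd
}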
After compression, the zip files added up to only 750 MB, so transmitting them to each of the target servers takes nearly no time on a 100 GbE network without saturating it, and this step can also be done offline.
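The copy itself can be as simple as a loop over administrative shares (a sketch; the server names are placeholders, and it assumes the same D:\Backup layout on each destination):

$servers = 'Target1', 'Target2', 'Target3'   # hypothetical destination servers
foreach ($server in $servers) {
    # Push every archive to the destination's D:\Backup folder
    Copy-Item -Path 'D:\Backup\*.zip' -Destination "\\$server\D`$\Backup\"
}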
Once the data is in place, it can be unzipped in parallel in just a few seconds, so the time is negligible; even done serially, it took only 99 seconds.
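The same Start-Process pattern used for the extract works for the decompression (a sketch, assuming the archives landed in D:\Backup on the destination):

Get-ChildItem 'D:\Backup\*.zip' | ForEach-Object {
    # Expand each archive in its own process; every zip holds one uniquely named .out file
    $cmd = "Add-Type -AssemblyName System.IO.Compression.FileSystem; " +
           "[IO.Compression.ZipFile]::ExtractToDirectory('$($_.FullName)', 'D:\Backup')"
    Start-Process PowerShell -ArgumentList $cmd
}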
Now, use BCP to load the data into individual tables, with PowerShell commands like the following (again wrapped for readability; originally a single line):
Start-Process PowerShell -ArgumentList {
    # Load one file into its own staging table in native format,
    # taking a table lock (-h "TABLOCK") and committing every 10,000 rows (-b)
    bcp Database.dbo.Backup1 in D:\Backup\Backup1.out `
        -S. -T -N -a 65535 -h "TABLOCK" -b 10000
}
They all finished in 61 seconds. This can't be done against a single shared table, though: the contention pushes the load to about 30 minutes. With one staging table per file, the total time is 5.3 minutes from end to end.
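A loader loop mirroring the extract side might look like this (a sketch; it assumes staging tables dbo.Backup1 through dbo.Backup16 already exist with the source schema):

# Hypothetical loader: one process per file, each into its own staging table
1..16 | ForEach-Object {
    $cmd = "bcp Database.dbo.Backup$_ in D:\Backup\Backup$_.out " +
           "-S. -T -N -a 65535 -h `"TABLOCK`" -b 10000"
    Start-Process PowerShell -ArgumentList $cmd
}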
One last thing to consider is that you now have multiple tables. If you need the data consolidated, you can create a view or copy the rows, but the fastest option is to use partition functions and schemes to switch the staging tables in as partitions of a single table. Switching is a metadata-only operation, so it adds essentially no time.
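The switch can be scripted from PowerShell with sqlcmd (a sketch; it assumes a target table dbo.Target partitioned on Id with boundaries matching the extract ranges, and a CHECK constraint on each staging table covering its range, both of which SWITCH requires):

# Hypothetical consolidation: switch each staging table in as one partition
1..16 | ForEach-Object {
    sqlcmd -S . -E -d Database -Q "ALTER TABLE dbo.Backup$_ SWITCH TO dbo.Target PARTITION $_;"
}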
Final thoughts
There are several ways in which this process can be improved further:
- Consider partitioning the source table if this operation needs to be performed multiple times. With a good partitioning scheme, you won't have to transmit all the data every time, only the new or changed data. The partitions can also be switched out to individual tables before extracting the data, which further improves performance.
- Consider spreading the output files across multiple drives. In my case, I had only one apart from the DB drive; locally mounted SSDs as destinations increase the throughput further.
- Jumbo frames increase network throughput between the storage and the DB server.
- If the table is in-memory, the data won’t need to be retrieved from storage through the network, reducing the time.
- Pre-growing the destination database before loading the data reduces the time.
- You can find additional tips on improving performance in this article from Microsoft.