Wednesday, September 15, 2010

Compress large files on Windows Server 2008

This started out as something else. We had a large database of about 100 GB that we wanted to shift to our DR location. Due to bandwidth restrictions, we explored options for reducing the overhead of transferring such a large file.

Now compressing a 100 GB file is not a joke. After some looking up I came across an interesting article by Chris over at solo-technology talking about Compression v/s Speed. It talked about a related issue but the same capability could be used for handling large files.

The best tool I found for doing this is 7-Zip. It is an awesome piece of freeware by Igor Pavlov that handles a large number of compression formats. Check it out over here.

That aside, by using the command line version of this you can zip your file. When we started, the compression took 5 hours to get through just 30% of the file. On checking Task Manager, I found that the whole process was running at 100% on a single thread, using just one of the 24 cores available on the server.

By using the options shown further down, I was able to compress the 100 GB SQL Server 2008 database backup file to just 16 GB, in just 1 hour!
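To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this post; the linear extrapolation of the single-threaded run and the 10 Mbps link speed are my own assumptions for illustration, not measurements.

```python
# Figures quoted in the post; nothing here is measured by this script.
ORIGINAL_GB = 100.0
COMPRESSED_GB = 16.0
MULTI_THREAD_HOURS = 1.0
SINGLE_THREAD_HOURS_FOR_30_PERCENT = 5.0


def summarize():
    """Return (ratio, percent saved, estimated single-thread hours, speedup)."""
    ratio = ORIGINAL_GB / COMPRESSED_GB
    saved_percent = (1 - COMPRESSED_GB / ORIGINAL_GB) * 100
    # Assumption: the single-threaded run would have kept the same pace.
    single_thread_hours = SINGLE_THREAD_HOURS_FOR_30_PERCENT / 0.30
    speedup = single_thread_hours / MULTI_THREAD_HOURS
    return ratio, saved_percent, single_thread_hours, speedup


def transfer_hours(size_gb, link_mbps):
    """Hours to move size_gb (decimal GB) over a link of link_mbps megabits/s."""
    bits = size_gb * 8e9
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    ratio, saved, single_hours, speedup = summarize()
    print(f"Compression ratio: {ratio:.2f}:1, space saved: {saved:.0f}%")
    print(f"Estimated single-thread time: {single_hours:.1f} h, speedup: {speedup:.1f}x")
    # Hypothetical 10 Mbps link to the DR site, ignoring protocol overhead.
    for label, size in (("original", ORIGINAL_GB), ("compressed", COMPRESSED_GB)):
        print(f"{label}: {transfer_hours(size, 10):.1f} h over 10 Mbps")
```

Swap in your own link speed to see what the smaller file buys you on the wire.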

This was way beyond my expectations :) Here is the syntax to use:

7z a -tbzip2 -mx=9 -mmt=on backup.bak

"7z" is the program to run
"a" is the add command, which adds the file to an archive
"-tbzip2" is to use the bzip2 format; this is important, as this is the format that allows multithreading for both compression and decompression
"-mx=9" is the compression level (9 is the highest), not a number of passes; this is optional, I didn't use it, and the system defaults this to 5
"-mmt=on" is to turn on the multithread capability; again, bzip2 is the format where this pays off
"" is the file that will get created
"backup.bak" is the file that needs to get compressed.
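If you want to script this, the switches above can be assembled in a small Python wrapper. This is only a sketch: 7z expects the archive name after the switches and before the input file, and the archive name "backup.bak.bz2" used here is a hypothetical placeholder of mine, not the name from the original run. The helper names are made up for illustration as well.

```python
import shutil
import subprocess

# Levels the 7-Zip documentation lists for the bzip2 format.
BZIP2_LEVELS = {1, 3, 5, 7, 9}


def build_7z_command(archive, source, level=5, threads="on"):
    """Return the argument list: command, switches, archive name, input file."""
    if level not in BZIP2_LEVELS:
        raise ValueError(f"level must be one of {sorted(BZIP2_LEVELS)}")
    return [
        "7z", "a",            # run 7z with the add command
        "-tbzip2",            # bzip2 archive type
        f"-mx={level}",       # compression level
        f"-mmt={threads}",    # multithreading on
        archive,              # file that will get created
        source,               # file to compress
    ]


def compress(archive, source, level=5):
    """Run 7z if it is on PATH; raises if 7z is missing or exits with an error."""
    cmd = build_7z_command(archive, source, level)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("7z was not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical archive name; prints the command instead of running it.
    print(" ".join(build_7z_command("backup.bak.bz2", "backup.bak", level=9)))
```

Building the command as a list rather than one string avoids quoting trouble if your paths contain spaces.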

You could read through the documentation that ships with 7-Zip to get a better hang of the application. These are just the specific options required for this particular job.

One other note: I did read that the bzip2 format has a restriction that it can include only 1 file at a time. I didn't try it with multiple files.
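The usual way around that restriction is to bundle the files into a single tar file first and then bzip2 that one file. I haven't tried this with 7-Zip on the server (it does have a tar archive type, -ttar), so take the following as a sketch of the idea using Python's standard tarfile module instead. Note that Python's bzip2 is single-threaded, so this shows the bundling, not the speedup, and the function names are my own.

```python
import tarfile
from pathlib import Path


def bundle_and_compress(files, archive_path):
    """Pack several files into one tar stream and bzip2-compress that stream."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for f in files:
            # Store each file under its base name, without the directory part.
            tar.add(f, arcname=Path(f).name)
    return archive_path


def list_members(archive_path):
    """Return the names of the files stored inside the .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return tar.getnames()
```

Calling bundle_and_compress with a list of backup files and a target name ending in .tar.bz2 gives you one compressed file to ship across.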

Hope this helps in saving your precious bandwidth!