Saturday, January 23, 2010

Compressing Backup Catalogs

Backup solutions such as NetBackup keep a catalog so that you can find a file that you need to get back. Clearly, with a fair amount of data being backed up, the catalog can become very large. I'm just sizing a replacement backup system, and catalog size (and the storage for it) is a significant part of the problem.

One way that NetBackup deals with this is to compress old catalogs. On my Solaris server it seems to use the old compress command, which gives you about 3-fold compression by the looks of it: 3.7G goes to 1.2G, for example.

However, there's a problem: to read an old catalog (to find something from an old backup) it has to be uncompressed first. There's quite a delay while this happens, and even worse, you need disk space to hold the uncompressed catalog.

Playing about the other day, I wondered about using filesystem compression rather than application compression, with ZFS in mind. So, for that 3.7G sample:

ZFS default   1.4G
ZFS gzip-1    920M
ZFS gzip-9    939M

Even with the ZFS default, we're doing almost as well. With gzip, we do much better. (And it's odd that gzip-9 does worse than gzip-1.)
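To put numbers on "almost as well" and "much better", here is a minimal sketch that turns the sizes quoted above into compression ratios. It assumes 1G = 1024M when mixing the two units, which the post does not state; the names are illustrative only.

```python
# Sizes quoted in the post, converted to MB (assumption: 1G = 1024M).
ORIGINAL_MB = 3.7 * 1024

COMPRESSED_MB = {
    "compress (NetBackup)": 1.2 * 1024,
    "ZFS default": 1.4 * 1024,
    "ZFS gzip-1": 920,
    "ZFS gzip-9": 939,
}


def compression_ratio(original_mb, compressed_mb):
    """Return how many times smaller the compressed copy is."""
    return original_mb / compressed_mb


ratios = {
    name: compression_ratio(ORIGINAL_MB, size)
    for name, size in COMPRESSED_MB.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures compress comes out at roughly 3.1x, the ZFS default at roughly 2.6x, and both gzip levels at a little over 4x.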

However, even though the default level of ZFS compression doesn't compress the data quite as well as the application does, it's still much better to use ZFS to do the compression, as then you can compress all the data. If you leave it to the application then you always leave the recent data uncompressed for easy access, and only compress the old stuff. So assume a catalog twice the size above, and that we used NetBackup to compress half of it. The disk used in the application case would then be 3.7G uncompressed plus 1.2G compressed, or 4.9G in total. With ZFS compressing everything, the total disk usage comes out as:

ZFS default   2.8G
ZFS gzip-1    1.8G
ZFS gzip-9    1.8G
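The arithmetic behind that comparison can be sketched as follows. This is only an illustration using the sizes quoted in the post, and it assumes 1G = 1024M and that the second half of the catalog compresses just like the 3.7G sample; the names are made up for the example.

```python
# Sizes for one 3.7G half of the catalog, in GB (assumption: 1G = 1024M).
HALF_UNCOMPRESSED_GB = 3.7
HALF_APP_COMPRESSED_GB = 1.2

ZFS_HALF_GB = {
    "ZFS default": 1.4,
    "ZFS gzip-1": 920 / 1024,
    "ZFS gzip-9": 939 / 1024,
}


def total_disk_usage():
    """Disk used by a catalog of two 3.7G halves under each scheme."""
    totals = {
        # The application compresses only the old half; the recent half
        # stays uncompressed for easy access.
        "NetBackup compress": HALF_UNCOMPRESSED_GB + HALF_APP_COMPRESSED_GB,
    }
    # The filesystem compresses both halves.
    for name, size_gb in ZFS_HALF_GB.items():
        totals[name] = 2 * size_gb
    return totals


totals = total_disk_usage()
for name, size_gb in totals.items():
    print(f"{name}: {size_gb:.1f}G")
```

Under those assumptions the application case needs 4.9G, against 2.8G for the ZFS default and about 1.8G for either gzip level.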

The conclusion is pretty clear: forget about getting NetBackup to compress its catalog, and get ZFS (or any other compressing filesystem) to do the job instead.
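As a rough sketch of what "get ZFS to do the job" might look like, the snippet below builds the two commands involved: setting the compression property on the filesystem that holds the catalog, and reading back the achieved ratio. The dataset name backup/nbu-catalog is hypothetical, and the helper functions are illustrative rather than anything NetBackup or ZFS provides.

```python
import re
import subprocess


def zfs_compression_commands(dataset, level="gzip-1"):
    """Return the argv lists that enable compression and report the ratio."""
    # Only accept "on" (the default algorithm) or gzip, gzip-1 .. gzip-9.
    if not re.fullmatch(r"on|gzip(-[1-9])?", level):
        raise ValueError(f"unsupported compression setting: {level}")
    return [
        ["zfs", "set", f"compression={level}", dataset],
        ["zfs", "get", "compressratio", dataset],
    ]


def apply_compression(dataset, level="gzip-1"):
    """Actually run the commands; needs a host with ZFS and enough privilege."""
    for argv in zfs_compression_commands(dataset, level):
        subprocess.run(argv, check=True)


# Hypothetical dataset name; this only prints the commands, it runs nothing.
for argv in zfs_compression_commands("backup/nbu-catalog"):
    print(" ".join(argv))
```

Note that ZFS only compresses data written after the property is set, so an existing catalog would need to be copied onto the compressed filesystem to benefit.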


delewis said...

Very cool idea. My backup background is with Tivoli Storage Manager, which uses a catalog/database for the same reasons. Instead of compressing, when TSM reaches the 100-150GB range, IBM tells you to set up a new TSM instance. Just curious -- how large can NetBackup catalogs get?

Peter Tribble said...

Hey Derek - my current catalog, mostly compressed, is well over 100G on disk. On just one server. The replacement will consolidate some other backup systems, and add much data that's not backed up. I'm looking at 1TB uncompressed as a starting point, allowing for a little growth; and I regard our environment as small.

tonisoler said...

I did the same with NetBackup on Solaris when we upgraded from version 5.x to the latest 6.5.x about a year ago. We have a catalog of about 60GB, and the compression ratio with ZFS default compression is 1.98. This works really well, and I don't think NetBackup has any limit on catalog size if you run the system in a 64-bit environment. By the way, Solaris and NetBackup work really well together.