Compressing Backup Catalogs
Backup solutions such as NetBackup keep a catalog so that you can find a file that you need to get back. Clearly, with a fair amount of data being backed up, the catalog can become very large. I'm just sizing a replacement backup system, and catalog size (and the storage for it) is a significant part of the problem.
One way that NetBackup deals with this is to compress old catalogs. On my Solaris server it seems to use the old compress command, which gives you about 3-fold compression by the looks of it: 3.7G goes to 1.2G, for example.
However, there's a problem: to read an old catalog (to find something from an old backup) it first has to be uncompressed. There's quite a delay while that happens and, even worse, you need enough free disk space to hold the uncompressed catalog.
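In practice that round trip looks something like the following sketch (the catalog path is a made-up example, and compress(1) is the classic LZW tool that NetBackup appears to be calling):

```shell
# Expanding an old, application-compressed catalog before a search.
# uncompress(1) replaces the .Z file with the full-size original, so
# this needs ~3.7G of free space and takes a while on a big catalog.
uncompress /netbackup/db/images/old-catalog.Z

# ... search the restored catalog for the file you need ...

# Re-compress it afterwards to get the space back.
compress /netbackup/db/images/old-catalog
```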
Playing about the other day, I wondered about using filesystem compression rather than application compression, with ZFS in mind. So, for that 3.7G sample:
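Switching the compression work to ZFS is a one-liner per dataset; a sketch, with `tank/nbdb` standing in for wherever the catalog lives:

```shell
# Create a dataset with compression enabled (lzjb, the ZFS default).
zfs create -o compression=on tank/nbdb

# Or pick gzip, optionally at a specific level (gzip-1 .. gzip-9).
zfs set compression=gzip tank/nbdb
zfs set compression=gzip-1 tank/nbdb

# After copying the catalog in, see how well it actually compressed.
zfs get compressratio tank/nbdb
```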
Even with the ZFS default, we're doing almost as well. With gzip, we do much better. (And it's odd that gzip-9 does worse than gzip-1.)
However, even though the default compression level doesn't squeeze the data quite as well as the application does, it's still much better to let ZFS do the compression, because then all of the data gets compressed. If you leave it to the application, you always keep the recent data uncompressed for easy access and only compress the old stuff. So suppose the catalog were twice the size above, and that we used NetBackup to compress half of it: the application case would then use 3.7G uncompressed plus 1.2G compressed, 4.9G in all. The total disk usage comes out as:
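The back-of-envelope arithmetic behind that comparison can be sketched as follows, assuming (purely for illustration) that the roughly 3x ratio seen earlier (3.7G down to 1.2G) holds across the whole catalog:

```shell
# Hypothetical 7.4G catalog, sizes in gigabytes.

# Application compression: half stays uncompressed for fast access
# (3.7G), the other half is compressed to 1.2G.
app_total=$(awk 'BEGIN { printf "%.1f", 3.7 + 1.2 }')

# Filesystem compression: the whole 7.4G compresses at the same
# ~3x ratio, i.e. 7.4 * (1.2 / 3.7).
zfs_total=$(awk 'BEGIN { printf "%.1f", 7.4 * 1.2 / 3.7 }')

echo "application: ${app_total}G   zfs: ${zfs_total}G"
```

So even at a compression ratio no better than the application's own, compressing everything wins by a factor of two.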
The conclusion is pretty clear: forget about getting NetBackup to compress its catalog, and get ZFS (or any other compressing filesystem) to do the job instead.