Sunday, August 07, 2011

Thoughts on ZFS compression

Apart from the sort of features that I now take for granted in a filesystem (data integrity, easy management, extreme scalability, unlimited snapshots), ZFS also has built-in compression.

I've already noted how this can be used to compress backup catalogs. One important thing here is that it's completely transparent, which isn't true of any scheme that goes around compressing the files themselves.
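For example, turning it on is a one-line operation, and it only applies to data written from that point onwards (a minimal sketch; tank/backup is a hypothetical dataset name):

    # zfs set compression=on tank/backup

Applications never see the compressed blocks; the compression and decompression happen entirely inside ZFS.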

Recently, I've (finally) started to enable compression more widely, as a matter of course. Certainly on new systems there's no excuse not to, at the default level of compression at any rate.

There was a caveat there: at the default compression level. The point being that the default level (lzjb, which is what compression=on gives you) gets you decent gains essentially for free: you gain space and reduce I/O for a negligible CPU cost. The more aggressive schemes (the gzip levels) can compress your data further, but having tried them it's clear that there's a significant performance hit: in some cases when I tried it the machine could freeze completely for a few seconds, which is clearly noticeable to users. Newer, more powerful machines shouldn't have that problem, and there have been improvements in Solaris as well that keep the rest of the system more responsive. I still feel, though, that enabling anything more aggressive than the default is something that should only be done selectively, when you've actually compared the costs and benefits.
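To make that concrete, here's a sketch of the two settings side by side, along with the property that tells you what you're actually getting back (the dataset names are hypothetical):

    # zfs set compression=on tank/data
    # zfs set compression=gzip-9 tank/archive
    # zfs get compressratio tank/data tank/archive

The first gives you the cheap default; gzip-9 is the most aggressive (and most CPU-hungry) level. The compressratio property is read-only and reports the achieved ratio, so you can measure the benefit on your own data before committing either way.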

So, I'm enabling compression on every filesystem containing regular data from now on.

The exception, still, is large image filesystems. Images in TIFF and JPEG format are already compressed, so the benefit is pretty negligible. And the old thumpers we still use extensively have relatively little CPU power (both compared to more modern systems, and for the amount of data and I/O these systems handle). Compression there is enabled more selectively.
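It's easy enough to check whether a filesystem like this is worth compressing: enable compression, write some representative data, and look at the achieved ratio (again, a hypothetical dataset name):

    # zfs get compressratio tank/images

A value of 1.00x means nothing is compressing and you're paying the (small) CPU cost for no return.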

Given the continuing growth in CPU power - even our entry-level systems are 24-way now - I expect it won't be long before enabling more aggressive compression all the time is a no-brainer.

6 comments:

milek said...

I have a very similar experience in this regard.

Christer Solskogen said...

But what about dedup? Do you enable both?

Peter Tribble said...

Dedup is another issue entirely. I regard ZFS dedup as essentially unusable except for highly specialized niche tasks. For normal use it has practically no benefit for us, and comes at a horrific cost.

Christer Solskogen said...

Okay, but what cost? It seems to me (and I'm a long way from being a ZFS expert) that it does almost the same thing. Do you know where I can read more about the pros and cons of dedup vs. compression?

triplettravel said...

Dedup has considerable memory implications, so it makes sense on a dedicated storage system with plenty of RAM and L2ARC. Where there are big gains it can be worthwhile, but it is not free.
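(For what it's worth, you can gauge that cost before committing: zdb can simulate dedup against an existing pool and print the dedup table it would build, which gives you an idea of the RAM it would consume. Here tank is a placeholder pool name.)

    # zdb -S tank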

Craig Morgan said...

FYI: we had also debated the merits of compression and its widespread usage, especially LZJB as you call out, hence the decision was made to enable LZJB by default in NexentaStor v3.1 for all new deployments.

For existing systems (upgrades) we leave the setting as-is, but in new deployments we consider it a positive worth exploiting for most of our customer base.