Thursday, June 23, 2005

What's your disk MTBF?

A typical disk has a quoted MTBF of a million hours or so. (For SCSI; IDE may be somewhat less.) Let's call that 100 years - allowing for a mix of IDE and SCSI drives.

Now I have about 1,000 drives. Of a wide range of types and ages - IDE, SCSI, FC-AL. Some dating from the late 1990s. Based on the MTBF and the number of drives, I would expect to see 10 disk failures a year - or almost one each month.

I'm not actually seeing anything like that failure rate. It's much lower. Had a disk fail earlier in the week, but that's rare. I'm guessing that I'm seeing only a third of the expected rate of failure.

That's on my main systems anyway. Those are in a controlled environment - stable power, A/C provides constant temperature (not as cool as I would like, but stable). They spin for years without being provoked. I have a 12-disk D1000 array that sat unused in a cupboard for the best part of a year before being rescued and thrown together for beta testing ZFS, and so far 4 of the 12 disks in that have failed in the last year. Which is a rate much larger than you would expect based on the rated MTBF.

My conclusion would be - and this should be well known to everyone anyway - that if you take care of your kit and give it a nice stable environment in which to operate, it will reward you with excellent reliability. Beat it up and treat it like rubbish, and it will refuse to play nice.

No comments: