Sunday, 15 September 2013

A tale of survival - FreeNAS and ZFS - and four disks with failed sectors

Recently, one of the FreeNAS storage devices we have at the office started to generate failed sectors on two of the disks. While an eyebrow raising event in and of itself, I wasn't particularly concerned. Living in the virtual outback as we do, I ordered some more disks. About 8 days later, the third of the four disks in our NAS started to throw out errors! Uh oh, it appears that we were on a slippery slope towards Doom!

I called the supplier demanding my disks only to find out they'd ordered WD Green drives! Noooo. I amended the order to get WD Red drives (which are designed for a NAS) and was informed it would take a day or two. The next morning the final disk was generating errors. We were getting close to some serious error thresholds on two of the disks and the third and fourth were well on the way...

Impatiently waiting for the new disks to arrive, I kept a close eye on the NAS. FreeNAS emailed me with alarming frequency about disk failure, imminent apocalypse and the like. The next day the WD Red drives arrived and three of the four disks were now generating large numbers of errors. I shut the NAS down (not taking the pool offline like I should have!) and replaced the most error prone disk. Restarting the NAS I added it back to the pool, replacing one of the dead disks and let it rebuild. Gradually I replaced all the disks until the pool was degraded with a corrupted file. 

On this filesystem are all my virtual machines, so I was a bit concerned about which file was corrupt. Thankfully it was an old backup of my current Windows 7 workstation so I deleted it. Oddly, I was unable to remove two of the old disks - every time I tried it would add them back.

After a bit of head scratching I realised I needed to delete the snapshots and once I did that, I was able to remove the disks from the pool and it changed from Degraded to Online and services were all restored. I've checked over the disks and every single one has failed since. One file lost out of almost 3TB of data - thank you ZFS and FreeNAS! Note to self - sort the backups out!

No comments:

Post a comment