I feel obliged to point out that this blog post is roughly 13 years old. People change, opinions evolve. In just a few years, vast technological landscapes can shift. And don't get me started on config files. Please consider this text in the context of its time.

No, ZFS Really Doesn’t Need a fsck — Four Years Later

Friday was what I call a 10k day: more than 10,000 visitors to my blog in a single day. Saturday was similar. The surge was created by a link on news.ycombinator.com to an article about ZFS I wrote roughly four years ago: No, ZFS really doesn’t need a fsck.

I just wanted to say that, four years and a lot more ZFS experience later (12 years after ZFS first saw the light of day), I’m more convinced than ever that ZFS doesn’t need a fsck.

I think the key to understanding ZFS is to understand that one of the principles guiding its development was not trusting the rotating rust: checksumming everything, doing copy-on-write, keeping redundant metadata even on a single disk, and a lot of other small and large ideas. My experience is that ZFS survives an unbelievable amount of abuse before you even get into the state where you need the recovery import feature. Thus many problems that lead to the need for a fsck tool in other filesystems simply don’t exist on ZFS. Take, for example, the idea of always being consistent: the filesystem is either in the old state or the new state, but never something in between. That alone counters a lot of problems.
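To illustrate the "old state or new state, never in between" property, here is a minimal toy sketch in Python. This is emphatically not the real ZFS on-disk format; the class and field names are my own illustration. The point is that data blocks are never overwritten in place, and the only in-place update is a single atomic flip of a checksummed root pointer:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ToyCowStore:
    """Toy copy-on-write store (illustrative, not the ZFS layout)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes, append-only
        self.next_id = 0
        self.root = None     # (block id, checksum), the "uberblock"

    def _alloc(self, data: bytes) -> int:
        # New data always goes to a fresh location; old blocks stay intact.
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        return bid

    def commit(self, data: bytes):
        bid = self._alloc(data)
        # The only in-place update: one atomic root-pointer flip.
        self.root = (bid, checksum(data))

    def read(self) -> bytes:
        bid, csum = self.root
        data = self.blocks[bid]
        assert checksum(data) == csum, "on-disk corruption detected"
        return data

store = ToyCowStore()
store.commit(b"old state")
# Simulate a crash mid-update: new blocks were written, but the root
# pointer was never flipped. The old state is still fully consistent.
store._alloc(b"half-written new state")
print(store.read())  # -> b'old state'
```

Because the root flip is the commit point, a crash at any moment leaves a tree that is entirely old or entirely new — which is exactly why there is no torn in-between state for a fsck to clean up.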

When I discuss the missing fsck with people, they are usually quick to accept that we protect the data against outside effects like power failures, the unreliability of rotating rust, and so on. But the next argument is often: “There could be bugs in ZFS. ZFS needs fsck to repair such problems.” And, yup — of course there are bugs in it. There must be, as there is no such thing as a bug-free piece of code, at least when it’s significantly longer than print "hello world" (that said, from programming classes I gave in the past, I know that people can put bugs even in such a short piece of code). And no, ZFS still doesn’t need a fsck tool. For solving a bug, I would consider a fsck even counterproductive. My reasoning against fsck is pretty simple:

Why should fsck address a bug that is, by definition, unknown beforehand? If it were known, it would already have been fixed.

When there is a bug in the code that writes or reads the on-disk state, why should that bug be addressed by the fsck code in order to do a repair that is more than just forcing the filesystem into a mountable structure? This would assume that you knew of the bug beforehand — and then it would be better to fix it in the code that writes or reads the data.

Why should bugs or problems only be addressed at fsck time?

When a bug has left its mark in the on-disk state, it should be addressed by the code that reads the data and repaired by correcting it on the fly — not by a fsck tool that you only start after something has gone bad and you have rebooted the system. Or after every thirtieth reboot.
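The "correct it on the fly" idea is what ZFS calls self-healing: with redundant copies and a checksum stored alongside the pointer, a bad copy is detected at read time and rewritten from a good copy immediately. Here is a hedged toy sketch of that idea; the function and variable names (`self_healing_read`, `mirror`, `expected`) are my own illustration, not ZFS internals:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def self_healing_read(mirror: list, expected: str) -> bytes:
    """Read from a toy two-way mirror, repairing bad copies in passing."""
    good = None
    bad = []
    for side, data in enumerate(mirror):
        if checksum(data) == expected:
            good = data          # a copy that matches the stored checksum
        else:
            bad.append(side)     # silent corruption, caught at read time
    if good is None:
        raise IOError("all copies fail the checksum")
    for side in bad:
        mirror[side] = good      # repair on the fly, no fsck run needed
    return good

payload = b"important data"
mirror = [payload, b"bit-rotted garbage"]   # side 1 is silently corrupt
data = self_healing_read(mirror, checksum(payload))
print(data == payload, mirror[0] == mirror[1])  # -> True True
```

The repair happens as a side effect of normal reads (and of a scrub, which is just a read of everything), so there is no separate offline tool and no window where known-bad data lingers until the next check run.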

Why should I repair bugs in a generic manner?

The correction of a bug in the on-disk state should be based on exact knowledge of that bug and done by a piece of code tailored to fix it — not by a generic check tool that forces the on-disk structure into the shape the filesystem requires.

How do I know without analysis if the bug is in the read or write code?

The next interesting question in this case: is the bug in the read part or in the write part of the code? If it’s in the read part, a repair might “fix” data that was perfectly correct to begin with. The question then is: is the data still correct after the repair? Is it even still available?

How do I know if the attempt to repair is correct?

Repair is always based on assumptions. Those assumptions can be correct or incorrect, and thus a repair can be correct or incorrect. The more you know about the problem that led to the repair-worthy state, the more probable it is that the assumptions are correct. That matters, especially when you have to trust the repair to deliver a correct result.

Can I trust the repaired filesystem?

When a ZFS filesystem is so defunct that neither the integrated check mechanisms and redundancies nor the transaction group rollback can revive it, I would question the integrity of the data altogether and go to my tapes instead of trying to force it back into function. Especially as the problem has obviously broken through a number of protection layers that would have sent many other filesystems to the tapes long before that moment. Because in the end, the mountability of a filesystem doesn’t matter; all that matters is the correctness of my data. The problem: how can you guarantee the correctness of the data in a filesystem after a repair? Especially with filesystems that can’t even guarantee the correctness of data when everything is working fine.

Doing Things Differently

The recovery of an unmountable filesystem works differently in ZFS: it doesn’t need to force the filesystem into a readable state. There are up to 127 readable states on disk, so if you want to put it that way, there are up to 127 mountable filesystems on your disks. It is more sensible to fall back to the last known correct and consistent state of metadata and data: the on-disk state represented by the pointer structure of the uberblock with the highest transaction group commit number that still has a correct checksum. I don’t have to repair after a crash; I just take the last intact state. The transaction group rollback at mount does exactly this.
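The selection logic described above can be sketched in a few lines of Python. Again, this is a toy model under stated assumptions — the dict fields (`txg`, `root`, `cksum`) and helper names are illustrative, not the real uberblock layout — but it captures the idea: walk the candidate states from newest to oldest and take the first one that verifies:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_ub(txg: int, root: bytes) -> dict:
    """Build a toy 'uberblock': a txg number plus a checksummed root."""
    return {"txg": txg, "root": root, "cksum": checksum(root)}

def best_uberblock(ring: list) -> dict:
    """Pick the highest-txg entry whose checksum verifies.

    A recovery import is conceptually the same walk, just willing to
    fall further down the list past recent, damaged commits.
    """
    for ub in sorted(ring, key=lambda u: u["txg"], reverse=True):
        if checksum(ub["root"]) == ub["cksum"]:
            return ub            # last known consistent state
    raise IOError("no importable state found")

ring = [make_ub(100, b"state-100"), make_ub(101, b"state-101")]
# The most recent commit was damaged, e.g. a half-written uberblock:
ring.append({"txg": 102, "root": b"state-102", "cksum": "bogus"})
print(best_uberblock(ring)["txg"])  # -> 101
```

Because every commit leaves a complete, checksummed tree behind, "recovery" is just choosing which intact tree to mount — there is nothing to reconstruct, only something to select.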

I think that is one of the basic points of the discussion: many people who want a fsck don’t think about the implications of the COW-ness of ZFS. This COW-ness is the key to many mechanisms that allow ZFS to do things differently, and it makes tricks like PSARC 2009/479 or the ZFS forensics rollback script possible.

Conclusion

That said, when you really, really think you have to scrape the data off the disk yourself, you can always use zdb. To end this article: in this discussion there is just one argument in favour of fsck I would accept, namely that ZFS is a blatant layering violation, and what people call fsck is simply part of the normal read and write work the filesystem is doing.

Written by

Joerg Moellenkamp

Grey-haired, sometimes grey-bearded, Windows-dismissing Unix guy.