linux-ext4 - Re: Filesystem corruption on Fedora 17

Message-ID: <20121127164745.GB7107@thunk.org>
Date:	Tue, 27 Nov 2012 11:47:45 -0500
From:	Theodore Ts'o <tytso@....edu>
To:	Adam Huffman <adam.huffman@...il.com>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: Filesystem corruption on Fedora 17

On Tue, Nov 27, 2012 at 01:31:18PM +0000, Adam Huffman wrote:
> 
> On two machines now I've had severe filesystem corruption.  They are
> both Fedora 17 machines, and they both have, at some point, run the
> kernels that have been mentioned recently as possibly suffering from
> ext4 corruption problems.

I don't know if you followed the story that closely, but the hysteria
over the "ext4 corruption problems" was caused by users who were
using non-standard mount options or other ext4 features....

> In the worst case, fsck is unable to fix the problems:
> 
> fsck from util-linux 2.20.1
> e2fsck 1.42.4 (12-June-2012)
> ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
> fsck.ext4: Group descriptors look bad... trying backup blocks...
> /dev/mapper/heppc128-lv_home: recovering journal
> fsck.ext4: unable to set superblock flags on /dev/mapper/heppc128-lv_home

Furthermore, this doesn't look like any of the problems that people
have reported.  The corruption pattern looks most like what you would
see if the blocks in the beginning (low numbered blocks) part of the
file system have been overwritten with garbage.

So first of all, if there is critical data that you want to preserve,
the first thing I'd suggest doing is to make an image copy of the
partition; it's only 56 GB, so hopefully you have space to make a copy
before you do any further experimentation to try to recover things.
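
For illustration, the image copy suggested above can be sketched as a
plain byte-for-byte read of the device into a file.  This is only a
minimal sketch: the function name image_copy and the destination path
are hypothetical, the device path is the one from the fsck output
quoted above, and in practice a tool such as dd does the same job.  It
assumes the device is unmounted, is readable without I/O errors, and
that the script runs with enough privileges to open it.

```python
def image_copy(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path byte for byte; return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of device/file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


# Example call (destination path is a placeholder, not from the thread):
# image_copy("/dev/mapper/heppc128-lv_home", "/mnt/spare/lv_home.img")
```

Any recovery experiments can then be run against the copy (or a copy
of the copy) rather than against the original partition.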

As far as the "unable to set superblock flags" error, I think I can
see how that can happen (and in fact I've created a short test case
which demonstrates the problem --- see attached), but that appears to
be a one shot failure.  That is, the second time you run e2fsck, it
should be able to make progress.  Is that the case for you?

(It's also possible that there are hardware bugs which are triggering
this problem, however, and if in fact you're seeing this happen
repeatably, I'd seriously suspect some kind of hardware failure.)
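
As a rough sketch of the "run e2fsck a second time" check described
above, the snippet below runs e2fsck twice and decodes each exit
status using the bit values documented in the e2fsck man page.  The
helper names describe_exit and fsck_twice are hypothetical, and the
run parameter exists only so that the logic can be exercised without
touching a real device.  It is best pointed at an image copy, not at
the original partition.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_exit(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items())
            if status & bit]


def fsck_twice(device, run=subprocess.call):
    """Run 'e2fsck -f' on device twice; return both exit statuses."""
    first = run(["e2fsck", "-f", device])
    second = run(["e2fsck", "-f", device])
    return first, second
```

If the second status differs from the first (for example, the first
run reports an operational error and the second reports errors
corrected), that would be consistent with the one-shot failure; the
same failure on every run points more towards hardware.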

	    	     	       	       	    - Ted

P.S.  In order to get this failure I had to basically use a block
editor, since there are software safeguards which prevent e2fsprogs or
ext4 from setting the needs_recovery bit on backup superblocks, and
this is what was necessary to trigger the bug.  I'll fix this for the
next release of e2fsprogs.  The reason why we hadn't noticed was
because (a) it basically requires a very specific hardware-induced
bit-flip to trigger, and (b) even when it does, the second run of
e2fsck makes the problem go away, so typically it gets noticed when the
system fails to boot due to e2fsck blowing out, and then when the
system administrator runs fsck a second time on the file system,
forward progress gets made.

Download attachment "testcase.img.gz" of type "application/octet-stream" (37512 bytes)