linux-ext4 - Re: EXT4-fs: group descriptors corrupted!

Message-ID: <20090225231853.GG1363@mit.edu>
Date:	Wed, 25 Feb 2009 18:18:53 -0500
From:	Theodore Tso <tytso@....edu>
To:	Ron Johnson <ron.l.johnson@....net>
Cc:	Linux-Ext4 <linux-ext4@...r.kernel.org>
Subject: Re: EXT4-fs: group descriptors corrupted!

Huh.  OK, there's something really strange going on here.

The kernel never updates the backup superblock; that's by design, to
avoid corruption problems.  So for example, on my laptop, if I run
dumpe2fs on my root partition, I see this:

Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Tue Feb 24 14:34:19 2009
Last write time:          Tue Feb 24 14:34:19 2009
Mount count:              3
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009

However, if I run dumpe2fs -o superblock=32768 on my root partition,
I'll see this:

Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Fri Feb 13 11:22:06 2009
Last write time:          Sat Feb 14 10:47:11 2009
Mount count:              0
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009

Note the difference in the "last write time" and the "last mount
time".  That's because normally we avoid touching the backup
superblocks.
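
The comparison above can be automated. What follows is only a minimal
sketch, not part of e2fsprogs: the helper names parse_dumpe2fs_header
and differing_fields are hypothetical, and the sample text is the two
dumpe2fs excerpts quoted above from the laptop's root partition.

```python
def parse_dumpe2fs_header(text):
    """Parse 'Key:   value' lines of dumpe2fs header output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; timestamps contain more colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(primary, backup):
    """Return {field: (primary_value, backup_value)} for fields that differ."""
    keys = sorted(set(primary) | set(backup))
    return {
        k: (primary.get(k), backup.get(k))
        for k in keys
        if primary.get(k) != backup.get(k)
    }


# Sample data copied from the two excerpts quoted in this message.
PRIMARY = """\
Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Tue Feb 24 14:34:19 2009
Last write time:          Tue Feb 24 14:34:19 2009
Mount count:              3
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009
"""

BACKUP = """\
Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Fri Feb 13 11:22:06 2009
Last write time:          Sat Feb 14 10:47:11 2009
Mount count:              0
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009
"""

diff = differing_fields(parse_dumpe2fs_header(PRIMARY),
                        parse_dumpe2fs_header(BACKUP))
for name, (primary_value, backup_value) in diff.items():
    print(f"{name}: primary={primary_value!r} backup={backup_value!r}")
```

On a healthy filesystem the mount and write times would be expected to
show up as differing; on a real system the two inputs would come from
"dumpe2fs -h" and "dumpe2fs -h -o superblock=32768" on the same device.
An empty result would match the odd situation described below.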

Now let's take a look at your dumpe2fs output.  In your case, we see
the following:

Filesystem created:       Thu Jan 22 19:33:20 2009
Last mount time:          Fri Jan 23 16:23:58 2009
Last write time:          Sun Feb 22 02:31:02 2009
Mount count:              1
Maximum mount count:      24
Last checked:             Fri Jan 23 16:19:49 2009
Check interval:           15552000 (6 months)
Next check after:         Wed Jul 22 17:19:49 2009

and it's the same on both the primary and backup (dumpe2fs -o
superblock=32768).  The question is how the heck did *that* happen?
As I mentioned, the kernel doesn't even have code to touch the backup
superblock.  That would tend to implicate one of the e2fsprogs tools,
or something using the e2fsprogs libraries --- but the recent
libraries (and you're using e2fsprogs 1.41.x) also avoid touching the
backup superblocks.  The only tools that could have done it from
e2fsprogs userland are e2fsck, tune2fs, and resize2fs, and that
doesn't explain how the values turned out to be pure garbage.

Does the "last write" timestamp suggest anything to you?  What
was happening on the system at or around Sun Feb 22 02:31:02 2009?
Maybe if we can localize this down to what userspace program caused
the problem, it'll be a hint.

(This is why I didn't want you to run e2fsck just yet; if you had, it
would have overwritten the last write time, which could be a valuable
clue as to what is causing this problem.)

As far as how to recover your data, what I would recommend doing is
creating a writeable LVM snapshot, with a pretty good amount of space.
Then try running the command "mke2fs -S " on the snapshot, with
*precisely* the same mke2fs arguments and /etc/mke2fs.conf that you
used to create the filesystem in the first place.  Then cross your
fingers, run e2fsck on the snapshot, and see how much of the data you
can recover; some of it may end up in lost+found, but hopefully you'll
get most of the data back.  If it works on the snapshot, only then try
it on the real logical volume.  If it doesn't work out on the snapshot,
you can
always discard it and try again without further corrupting any of your
original filesystem.
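
The steps above could be laid out roughly as follows.  This is a
dry-run sketch only: it builds and prints the commands and does not
execute anything.  The volume group, logical volume names, snapshot
size, and mke2fs arguments are all hypothetical placeholders; the
mke2fs arguments in particular must be replaced with exactly the ones
used when the filesystem was created.

```python
import shlex


def recovery_commands(vg, origin_lv, snap_lv, snap_size, mke2fs_args):
    """Build (but do not run) the snapshot / mke2fs -S / e2fsck sequence.

    All arguments are placeholders supplied by the caller; nothing here
    is taken from the affected system.
    """
    origin_dev = f"/dev/{vg}/{origin_lv}"
    snap_dev = f"/dev/{vg}/{snap_lv}"
    return [
        # 1. Snapshot of the damaged volume, so the original is untouched.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_lv, origin_dev],
        # 2. Rewrite only the superblock and group descriptors on the snapshot.
        ["mke2fs", "-S", *mke2fs_args, snap_dev],
        # 3. Check the snapshot and see how much data is recoverable.
        ["e2fsck", snap_dev],
    ]


# Hypothetical names and arguments, for illustration only.
commands = recovery_commands(
    vg="vg0",
    origin_lv="data",
    snap_lv="data_rescue",
    snap_size="20G",
    mke2fs_args=["-b", "4096"],
)
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to double-check that every
destructive step points at the snapshot device and not at the original
volume before anything is run by hand.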

Good luck, and thanks in advance for any information you can give
us to help track down this problem.  At this point I'm going to guess
that it's a nasty e2fsprogs bug, where somehow the internal in-memory
version of the block group descriptors got corrupted, and then got
written out to disk.  But this is just a guess at this point --- and
I'm still left wondering why I haven't seen it on my systems and on my
regression testing.

                                        - Ted
