Message-ID: <87f94c370902251541h35aa3ccj69a62c7c1e81f7e6@mail.gmail.com>
Date: Wed, 25 Feb 2009 18:41:42 -0500
From: Greg Freemyer <greg.freemyer@...il.com>
To: Theodore Tso <tytso@....edu>
Cc: Ron Johnson <ron.l.johnson@....net>,
Linux-Ext4 <linux-ext4@...r.kernel.org>,
Ric Wheeler <rwheeler@...hat.com>
Subject: Re: EXT4-fs: group descriptors corrupted!
Smart ass comment about the new ATA spec intentionally top-posted.
Question: How do you know those sectors did not somehow get
discarded, then modified behind the scenes by an SSD, then fixated to
new deterministic values by a read?
Answer: Because devices that do that aren't shipping yet.
Damn, the future looks good from here.
On Wed, Feb 25, 2009 at 6:18 PM, Theodore Tso <tytso@....edu> wrote:
> Huh. OK, there's something really strange going on here.
>
> The kernel never updates the backup superblock; that's by design, to
> avoid corruption problems. So for example, on my laptop, if I run
> dumpe2fs on my root partition, I see this:
>
> Filesystem created: Fri Feb 13 09:00:02 2009
> Last mount time: Tue Feb 24 14:34:19 2009
> Last write time: Tue Feb 24 14:34:19 2009
> Mount count: 3
> Maximum mount count: 30
> Last checked: Sat Feb 14 10:46:41 2009
> Check interval: 15552000 (6 months)
> Next check after: Thu Aug 13 11:46:41 2009
>
> However, if I run dumpe2fs -o superblock=32768 on my root partition,
> I'll see this:
>
> Filesystem created: Fri Feb 13 09:00:02 2009
> Last mount time: Fri Feb 13 11:22:06 2009
> Last write time: Sat Feb 14 10:47:11 2009
> Mount count: 0
> Maximum mount count: 30
> Last checked: Sat Feb 14 10:46:41 2009
> Check interval: 15552000 (6 months)
> Next check after: Thu Aug 13 11:46:41 2009
>
> Note the difference in the "last write time" and the "last mount
> time". That's because normally we avoid touching the backup
> superblocks.
>
> Now let's take a look at your dumpe2fs output. In your case, we see
> the following:
>
> Filesystem created: Thu Jan 22 19:33:20 2009
> Last mount time: Fri Jan 23 16:23:58 2009
> Last write time: Sun Feb 22 02:31:02 2009
> Mount count: 1
> Maximum mount count: 24
> Last checked: Fri Jan 23 16:19:49 2009
> Check interval: 15552000 (6 months)
> Next check after: Wed Jul 22 17:19:49 2009
>
> and it's the same on both the primary and backup (dumpe2fs -o
> superblock=32768). The question is how the heck did *that* happen?
> As I mentioned, the kernel doesn't even have code to touch the backup
> superblock. That would tend to implicate one of the e2fsprogs tools,
> or something using the e2fsprogs libraries --- but the recent
> libraries (and you're using e2fsprogs 1.41.x) also avoid touching the
> backup superblocks. The only tools that could have done it from
> e2fsprogs userland are e2fsck, tune2fs, and resize2fs, and that
> doesn't explain how the values turned out to be pure garbage.
>
> Does the "last write" timestamp suggest anything to you? What
> was happening on the system at or around Sun Feb 22 02:31:02 2009?
> Maybe if we can localize this down to what userspace program caused
> the problem, it'll be a hint.
>
> (This is why I didn't want you to run e2fsck just yet; if you had, it
> would have overwritten the last write time, which could be a valuable
> clue as to what is causing this problem.)
>
> As far as how to recover your data, what I would recommend doing is
> creating a writeable LVM snapshot, with a pretty good amount of space.
> Then try running the command "mke2fs -S " on the snapshot, with
> *precisely* the same mke2fs arguments and /etc/mke2fs.conf that you
> used to create the filesystem in the first place. Then cross your
> fingers, and e2fsck on the snapshot, and see how much of the data you
> can recover; some of it may end up in lost+found, but hopefully you'll
> get most of the data back. If it works on the snapshot, only then try it
> on the real LVM. If it doesn't work out on the snapshot, you can
> always discard it and try again without further corrupting any of your
> original filesystem.
>
> Good luck, and thanks in advance for any information you can give
> us to help track down this problem. At this point I'm going to guess
> that it's a nasty e2fsprogs bug, where somehow the internal in-memory
> version of the block group descriptors got corrupted, and then got
> written out to disk. But this is just a guess at this point --- and
> I'm still left wondering why I haven't seen it on my systems and on my
> regression testing.
>
> - Ted
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
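The primary-versus-backup comparison described in the quoted message can be
sketched in a few lines of Python. This is only an illustration, not part of
e2fsprogs: the function names and the VOLATILE_FIELDS list are assumptions
made up for the example, and the input is assumed to be text already captured
from "dumpe2fs" and "dumpe2fs -o superblock=32768". An empty result means the
backup carries the same mount/write values as the primary, which is the odd
case discussed above.

```python
# Hypothetical helper: compare the fields that normally drift apart between
# the primary superblock and a backup superblock in dumpe2fs output.

# Fields the quoted message points at as normally differing (assumed list).
VOLATILE_FIELDS = ("Last mount time", "Last write time", "Mount count")


def parse_dumpe2fs_header(text):
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(primary_text, backup_text):
    """Return the volatile fields whose values differ between the two dumps."""
    primary = parse_dumpe2fs_header(primary_text)
    backup = parse_dumpe2fs_header(backup_text)
    return [
        name
        for name in VOLATILE_FIELDS
        if name in primary and name in backup and primary[name] != backup[name]
    ]


# Values copied from the laptop example in the quoted message.
laptop_primary = """Last mount time: Tue Feb 24 14:34:19 2009
Last write time: Tue Feb 24 14:34:19 2009
Mount count: 3"""
laptop_backup = """Last mount time: Fri Feb 13 11:22:06 2009
Last write time: Sat Feb 14 10:47:11 2009
Mount count: 0"""

print(differing_fields(laptop_primary, laptop_backup))
```

For the laptop example all three fields differ, as expected for a backup that
has been left alone. Feeding the same text in for both arguments, as in the
corrupted filesystem's case, yields an empty list.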
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com