[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49A5D74D.9030309@cox.net>
Date: Wed, 25 Feb 2009 17:42:05 -0600
From: Ron Johnson <ron.l.johnson@....net>
To: Linux-Ext4 <linux-ext4@...r.kernel.org>
Subject: Re: EXT4-fs: group descriptors corrupted!
On 02/25/2009 05:18 PM, Theodore Tso wrote:
> Huh. OK, there's something really strange going on here.
>
> The kernel never updates the backup superblock; that's by design, to
> avoid corruption problems. So for example, on my laptop, if I run
> dumpe2fs on my root partition, I see this:
>
> Filesystem created: Fri Feb 13 09:00:02 2009
> Last mount time: Tue Feb 24 14:34:19 2009
> Last write time: Tue Feb 24 14:34:19 2009
> Mount count: 3
> Maximum mount count: 30
> Last checked: Sat Feb 14 10:46:41 2009
> Check interval: 15552000 (6 months)
> Next check after: Thu Aug 13 11:46:41 2009
>
> However, if I run dumpe2fs -o superblock=32768 on my root partition,
> I'll see this:
>
> Filesystem created: Fri Feb 13 09:00:02 2009
> Last mount time: Fri Feb 13 11:22:06 2009
> Last write time: Sat Feb 14 10:47:11 2009
> Mount count: 0
> Maximum mount count: 30
> Last checked: Sat Feb 14 10:46:41 2009
> Check interval: 15552000 (6 months)
> Next check after: Thu Aug 13 11:46:41 2009
>
> Note the difference in the "last write time" and the "last mount
> time". That's because normally we avoid touching the backup
> superblocks.
>
> Now let's take a look at your dumpe2fs output. In your case, we see
> the following:
>
> Filesystem created: Thu Jan 22 19:33:20 2009
> Last mount time: Fri Jan 23 16:23:58 2009
> Last write time: Sun Feb 22 02:31:02 2009
> Mount count: 1
> Maximum mount count: 24
> Last checked: Fri Jan 23 16:19:49 2009
> Check interval: 15552000 (6 months)
> Next check after: Wed Jul 22 17:19:49 2009
>
> and it's the same on both the primary and backup (dumpe2fs -o
> superblock=32768). The question is how the heck did *that* happen?
> As I mentioned, the kernel doesn't even have code to touch the backup
> superblock. That would tend to implicate one of the e2fsprogs tools,
> or sometihng using the e2fsprogs libraries --- but the recent
> libraries (and you're using e2fsprogs 1.41.x) also avoid touching the
> backup superblocks. The only tools that could have done it from
> e2fsprogs userland are e2fsck, tune2fs, and resize2fs, and that
> doesn't explain how the values turned out to be pure garbage.
>
> Does that the "last write" timestamp suggest anything to you? What
> was happening on the system at or around Sun Feb 22 02:31:02 2009?
> Maybe if we can localize this down to what userspace program caused
> the problem, it'll be a hint.
That's about 10 hours before I rebooted the machine, middle of a
Saturday night...
I performed a rather large apt-get upgrade at around 01:30, but that
would have only touched /, not my "big data" directory.
~/Documents is symlinked into /data/big/Documents, so I might have
been editing an OOo document, or copying a YouTube file to it, but
nothing pops into mind.
> (This is why I didn't want you to run e2fsck just yet; if you had, it
> would have overwritten the last write time, which could be a value
> clue as to what is causing this problem.)
>
> As far as how to recover your data, what I would recommend doing is
> creating a writeable LVM snapshot, with a pretty good amount of space.
Sorry, but I don't have *any* unallocated space left.
> Then try running the command "mke2fs -S " on the snapshot, with
> *precisely* the same mke2fs arguments and /etc/mke2fs.conf that you
> used to create the filesystem in the first place. Then cross your
> fingers, and e2fsck on the snapshot, and see how much of the data you
> can recover; some of it may end up in lost+found, but hopefully you'll
> get most of the data back. If it works on snapshot, only then try it
> on the real LVM. If it doesn't work out on the snapshot, you can
> always discard it and try again without further corrupting any of your
> original filesystem.
>
> Good luck, and thanks in advance for anything information you can give
> us to help track down this problem. And this point I'm going to guess
> that it's a nasty e2fsprogs bug, where somehow the internal in-memory
I'm sure that I didn't run any "e2" app on a mounted device!
> version of the block group descriptors got corrupted, and then gotten
> writen out to disk. But this is just a guess at this point --- and
> I'm still left wondering why I haven't seen it on my systems and on my
> regression testing.
Note that this only happened on a reboot. I had mounted & unmounted
this device many times while learning about lvm2, adding files,
resizing-expanding the fs, adding more files, etc. But that only
took two days, and then it "sat" there for almost 4 weeks with no
problems.
--
Ron Johnson, Jr.
Jefferson LA USA
The feeling of disgust at seeing a human female in a Relationship
with a chimp male is Homininphobia, and you should be ashamed of
yourself.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists