[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090928212838.GS12922@hexapodia.org>
Date: Mon, 28 Sep 2009 14:28:38 -0700
From: Andy Isaacson <adi@...apodia.org>
To: Theodore Tso <tytso@....edu>, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: hard lockup, followed by ext4_lookup: deleted inode referenced: 524788
On Mon, Sep 28, 2009 at 04:25:07PM -0400, Theodore Tso wrote:
> On Mon, Sep 28, 2009 at 12:16:44PM -0700, Andy Isaacson wrote:
> > After a hard lockup and reboot, my test box (running recent Linus git
> > 851b147) came up with:
> >
> > [ 5.016854] EXT4-fs (sda1): mounted filesystem with ordered data mode
> > [ 8.809125] EXT4-fs (sda1): internal journal on sda1:8
> > [ 10.165239] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 524788
> > [ 10.165286] Aborting journal on device sda1:8.
> > [ 10.168111] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
> > [ 10.168169] EXT4-fs (sda1): Remounting filesystem read-only
> > [ 10.171614] EXT4-fs (sda1): Remounting filesystem read-only
>
> It would be useful to see what pathname is associated with inode 524788.
debugfs: ncheck 524788
Inode Pathname
524788 /etc/rcS.d/S90mountdebugfs
debugfs: cd /etc/rcS.d
debugfs: ls
532482 (12) . 16386 (12) .. 526699 (16) README
524349 (28) S06keyboard-setup 524350 (24) S13pcmciautils
524351 (20) S25brltty 524788 (52) S90mountdebugfs
524354 (36) S37apparmor 524356 (24) S49console-setup
524357 (20) S55urandom 524358 (28) S70screen-cleanup
524359 (24) S70x11-common 524360 (3800) S75policykit
debugfs: stat S90mountdebugfs
Inode: 524788 Type: regular Mode: 0644 Flags: 0x80000
Generation: 1027808588 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 0
File ACL: 0 Directory ACL: 0
Links: 0 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4ab94c5d:3dd78a78 -- Tue Sep 22 15:14:53 2009
atime: 0x4ab949a2:58503f24 -- Tue Sep 22 15:03:14 2009
mtime: 0x4ab94c5d:3dd78a78 -- Tue Sep 22 15:14:53 2009
crtime: 0x4ab7bf7b:2589fbd4 -- Mon Sep 21 11:01:31 2009
dtime: 0x4ab94c5d -- Tue Sep 22 15:14:53 2009
Size of extra inode fields: 28
EXTENTS:
(END)
> > 2. after a lockup the journal recovery should not fail.
>
> I'm not sure it was a matter of the journal recovery failing. All we
> know for certain is that filesystem was corrupted after the lockup and
> remounting the filesystem. What caused the file system corruption is
> open to question at the moment; it could have been caused by the
> lockup; or it could have been a file that was deleted right about the
> time of the lockup; or it could have been some completely random
> filesystem corruption that.
>
> It would be useful to know whether the inode was in question was
> supposed to have been deleted. If it was, it would be useful to know
> if the dtime reported by debugfs's stat was around the time of the
> original lockup.
/etc/init.d/mountdebugfs is also present in the directory but has dtime
set. mountdebugfs is part of blktrace, which I installed ("apt-get
install blktrace") at least 20 minutes prior to the lockup; I'd
run blktrace, gotten incomprehensible errors, figured out I needed
CONFIG_FTRACE and the relevant block IO tracing options to use blktrace,
reconfigured the kernel, and was rebuilding when it hung.
Hmmm, /var/log/dpkg.log has mtime Sep 23 18:45:48 and is missing today's
activity (blktrace and ncurses5-dev).
/var/cache/apt/archive/blktrace_1.0.1-2_amd64.deb also has dtime set.
fsck.ext4 has a lot to say -- looks like something bad happened Sep 22
and there's been corruption lurking since. I used the machine a fair
bit on 9/23 though (including installing some packages), and didn't see
any problems.
I've attached the complete output from "fsck -n /dev/sda1" and "stat
<%d>" on each inode reported to be deleted.
-andy
View attachment "fsck.out" of type "text/plain" (75660 bytes)
View attachment "debugfs-deleted-inodes.out" of type "text/plain" (63999 bytes)
Powered by blists - more mailing lists