linux-kernel - Re: bug in xfs: can't recovery metadata log

Message-ID: <20110607103252.GA15140@infradead.org>
Date:	Tue, 7 Jun 2011 06:32:52 -0400
From:	Christoph Hellwig <hch@...radead.org>
To:	Drunkard Zhang <gongfan193@...il.com>
Cc:	Alex Elder <aelder@....com>, xfs-masters@....sgi.com,
	xfs@....sgi.com, linux-kernel@...r.kernel.org
Subject: Re: bug in xfs: can't recovery metadata log

On Tue, Jun 07, 2011 at 01:20:23PM +0800, Drunkard Zhang wrote:
> The log recovery failure happened after a hard reboot. I ran "mount
> /dev/lg/log /mnt/temp/" twice, but got a similar dmesg error each time.
> 
> The xfs lives on LVM, with 4x2TB SATA II disks.
> 
> The first time:
> [ 1479.130446] XFS mounting filesystem dm-0
> [ 1479.226525] Starting XFS recovery on filesystem: dm-0 (logdev: internal)
> [ 1506.217842] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8

[...]

> [ 1506.220989] RIP: 0010:[<ffffffff81235f9c>]  [<ffffffff81235f9c>] xfs_cmn_err+0x6b/0x92

[...]

> [ 1506.226301]  [<ffffffff8122922b>] ? kmem_zone_zalloc+0x1f/0x30
> [ 1506.226549]  [<ffffffff812098b5>] xfs_error_report+0x39/0x40
> [ 1506.226805]  [<ffffffff811e8340>] ? xfs_free_extent+0x8e/0xae
> [ 1506.227056]  [<ffffffff811e75cf>] xfs_free_ag_extent+0x3e7/0x70b
> [ 1506.227306]  [<ffffffff811e8340>] xfs_free_extent+0x8e/0xae

It looks like you hit one of the XFS_WANT_CORRUPTED_GOTO checks in
xfs_free_ag_extent, which calls xfs_error_report, and we hit something
in there that isn't initialized that early during the mount process.
My guess is that it's actually the mp->m_fsname dereference in
xfs_fs_vcmn_err.  It's fixed by the message rework in 2.6.39+, but that
will only prevent the crash; you'll still get an error and the log
recovery will be aborted.  If you can get a more recent kernel on the
box I'd be curious what the output from it is.
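
For anyone following along, the call chain behind that reading can be
pulled out of the quoted trace mechanically.  The sketch below is only
an illustration: the trace lines are copied from the oops above, while
the regular expression and the helper names (parse_frames,
reliable_chain) are hypothetical.  It relies on the kernel convention
that a "?" in front of a symbol marks an address found on the stack
that the unwinder could not confirm as part of the call chain.

```python
import re

# Call-trace lines copied from the quoted oops above.
TRACE = """\
[ 1506.226301]  [<ffffffff8122922b>] ? kmem_zone_zalloc+0x1f/0x30
[ 1506.226549]  [<ffffffff812098b5>] xfs_error_report+0x39/0x40
[ 1506.226805]  [<ffffffff811e8340>] ? xfs_free_extent+0x8e/0xae
[ 1506.227056]  [<ffffffff811e75cf>] xfs_free_ag_extent+0x3e7/0x70b
[ 1506.227306]  [<ffffffff811e8340>] xfs_free_extent+0x8e/0xae
"""

# Hypothetical pattern for one frame: [<address>] [?] symbol+0xoffset/0xsize
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per trace line, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("symbol"),
                "offset": int(m.group("offset"), 16),
                "size": int(m.group("size"), 16),
                # Frames prefixed with "?" are unverified stack leftovers.
                "reliable": m.group("guess") is None,
            })
    return frames


def reliable_chain(frames):
    """Keep only the symbols of frames not marked with "?"."""
    return [f["symbol"] for f in frames if f["reliable"]]


if __name__ == "__main__":
    print(" <- ".join(reliable_chain(parse_frames(TRACE))))
```

Dropping the "?" entries leaves xfs_error_report called from
xfs_free_ag_extent, which was called from xfs_free_extent; the RIP line
then places the fault itself in xfs_cmn_err.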

Did you run older kernels on this machine before?  Before 2.6.33 device
mapper support for barriers (aka cache flushes) was incomplete and
frequently led to free space corruption if people left the volatile
write caches on.  For MD underneath it took even a bit longer.
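
As a rough aid for checking a machine's history against the two version
thresholds mentioned in this mail, here is a minimal sketch.  The
function and key names are hypothetical, "2.6.39+" is assumed to mean
2.6.39 or any later release, and the later (unspecified) cut-off for MD
is not modelled.

```python
import platform
import re

# Thresholds taken from the mail above; the constant names are made up.
DM_BARRIERS_COMPLETE = (2, 6, 33)   # device mapper barrier support complete
MESSAGE_REWORK = (2, 6, 39)         # XFS message rework (assumed: this or later)


def kernel_version(release):
    """Parse the leading 'X.Y[.Z]' of a release string like '2.6.32-5-amd64'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in m.groups())


def assess(release):
    """Compare a kernel release against the two thresholds."""
    version = kernel_version(release)
    return {
        "dm_barriers_incomplete": version < DM_BARRIERS_COMPLETE,
        "has_message_rework": version >= MESSAGE_REWORK,
    }


if __name__ == "__main__":
    # platform.release() reports the currently running kernel only;
    # kernels used in the past have to be checked by hand.
    print(platform.release(), assess(platform.release()))
```

This only looks at the running kernel, so a filesystem that was damaged
under an older kernel can still report clean thresholds today.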

If you just want to continue using the filesystem you can nuke the
log using xfs_repair -L.
