Message-ID: <20140410050428.GV10985@gradx.cs.jhu.edu>
Date: Thu, 10 Apr 2014 01:04:28 -0400
From: Nathaniel W Filardo <nwf@...jhu.edu>
To: Theodore Tso <tytso@...gle.com>
Cc: Mike Rubin <mrubin@...gle.com>, Frank Mayhar <fmayhar@...gle.com>,
admins@....jhu.edu, linux-ext4@...r.kernel.org
Subject: Re: ext4 metadata corruption bug?
On Wed, Apr 09, 2014 at 10:55:48PM -0400, Theodore Tso wrote:
> Hi Nathaniel,
>
> In general, it's best if you send these sorts of requests for help to the
> linux-ext4@...r.kernel.org mailing list.
Added to CC.
> The fact that we see the "error count" line early in the boot message
> suggests to me that your VM is not running fsck to fix up the errors before
> mounting the file system. (Well, either that or you're using a really
> ancient version of e2fsck, but given that you're using a bleeding edge
> kernel, I'm guessing you're using a reasonably recent version of
> e2fsck. But that would be good for you to check.)
e2fsck is version 1.42.9, and it reports the same library version.
> The ext4 error message is due to the file system getting corrupted. How
> the file system got corrupted isn't 100% clear, but one potential cause is
> how the disk is configured with qemu.
>[snip]
We use QEMU directives like
-drive format=raw,file=rbd:rbdafs-mirror/mirror-0,id=drive5,if=none,cache=writeback \
-device driver=ide-hd,drive=drive5,discard_granularity=512,bus=ahci0.3
We've never had, so far as I know, an unexpected shutdown of the QEMU
process, so I don't think that unexpected loss of cache contents is to
blame.
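For anyone comparing configurations, here is a minimal sketch of pulling the
cache mode out of a -drive option string like the one above. The helper name
parse_drive_option is made up for illustration, and it assumes no value
contains a comma (QEMU escapes those as ",,"), which holds for our options.

```python
def parse_drive_option(option):
    """Split a QEMU -drive option string into a dict of key/value pairs.

    Simplification: assumes no value contains a comma.
    """
    pairs = {}
    for field in option.split(","):
        # Split on the first '=' only, so values such as
        # 'rbd:rbdafs-mirror/mirror-0' are kept whole.
        key, _, value = field.partition("=")
        pairs[key] = value
    return pairs


drive = parse_drive_option(
    "format=raw,file=rbd:rbdafs-mirror/mirror-0,id=drive5,if=none,cache=writeback"
)
print(drive["id"], drive["cache"])
```

With cache=writeback, completed writes may still sit in the host-side cache
until the guest issues a flush, which is why an unexpected exit of the QEMU
process would be the case to worry about.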
Perhaps the dmesg I sent was not representative; some days ago, we saw, only
(comparatively!) late in the machine's uptime:
[309894.428685] EXT4-fs (sdd): pa ffff88000d9f9440: logic 832, phys. 957458972, len 192
[309894.430023] EXT4-fs error (device sdd): ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191
[309894.431822] Aborting journal on device sdd-8.
[309894.442913] EXT4-fs (sdd): Remounting filesystem read-only
with Debian kernel 3.13.5-1; sdd here is the same filesystem as in the
earlier dmesg.
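In case it helps with triage, here is a rough sketch of how lines like the
above could be pulled out of a saved dmesg. The regular expression is modelled
only on the one error line quoted here, and the function name
find_pa_mismatches is hypothetical. My reading, which may be wrong, is that
"free" is the number of blocks the kernel counted while releasing the
preallocation and "pa_free" is the preallocation's own count, so the script
just reports the difference.

```python
import re

# Modelled on the single line quoted above, e.g.:
# EXT4-fs error (device sdd): ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191
PA_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>\w+)\): ext4_mb_release_inode_pa:\d+: "
    r"group (?P<group>\d+), free (?P<free>\d+), pa_free (?P<pa_free>\d+)"
)


def find_pa_mismatches(dmesg_text):
    """Return (device, group, free, pa_free) for each matching line."""
    results = []
    for line in dmesg_text.splitlines():
        m = PA_ERROR.search(line)
        if m:
            results.append(
                (m["dev"], int(m["group"]), int(m["free"]), int(m["pa_free"]))
            )
    return results


sample = (
    "[309894.430023] EXT4-fs error (device sdd): "
    "ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191"
)
for dev, group, free, pa_free in find_pa_mismatches(sample):
    print(f"{dev}: group {group}: free={free} pa_free={pa_free} (diff {free - pa_free})")
```

Run against a full log, this would give one line per affected block group,
which should make it easier to see whether the same groups keep recurring.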
I'll capture any subsequent crashes and follow up.
Thanks much!
--nwf;