[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121023232813.GF28626@thunk.org>
Date: Tue, 23 Oct 2012 19:28:13 -0400
From: Theodore Ts'o <tytso@....edu>
To: Nix <nix@...eri.org.uk>
Cc: linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
"J. Bruce Fields" <bfields@...ldses.org>,
Bryan Schumaker <bjschuma@...app.com>,
Peng Tao <bergwolf@...il.com>, Trond.Myklebust@...app.com,
gregkh@...uxfoundation.org,
Toralf Förster <toralf.foerster@....de>,
Eric Sandeen <sandeen@...hat.com>, stable@...r.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3
(and other stable branches?)
On Wed, Oct 24, 2012 at 12:06:21AM +0100, Nix wrote:
> I note that the patch is in the latest stable releases of 3.4.x and
> 3.5.x, which are IIRC end-of-lifed. I'd say that if your patch does
> indeed fix it, this justifies pushing out new releases of both these
> stable kernels with just this patch in, just to make sure people taking
> the latest stable kernel from those releases don't eat their
> filesystems.
Eric is in the process of reviewing the bug, and creating a repro case
so we can definitely show that my theory is sound, and that the bug
has been fixed by my proposed fix. We know that my patch definitely
restores the behaviour previous to commit eeecef0af5, so it can't
hurt, but we do want to make 100% sure that it really fixes the
problem. (I found the potential bug by desk checking the all of the
commits between v3.6.1 and v3.6.3, and none of the other commits
triggered my WTF alarm, but we want to have a easy repro case so we
can be 100% sure it's been fixed. It's always nice when theory is
backed up with empircal evidence. :-)
Until then, it should also be fine to just revert that commit on the
other stable kernels.
> The bug did really quite a lot of damage to my /home fs in only a few
> minutes of uptime, given how few files I wrote to it. What it could have
> done to a more conventional distro install with everything including
> /home on one filesystem, I shudder to think.
Well, the problem won't show up if the journal has wrapped. So it
will only show up if the system has been rebooted twice in fairly
quick succession. A full conventional distro install probably
wouldn't have triggered a bug... although someone who habitually
reboots their laptop instead of using suspend/resume or hiberbate, or
someone who is trying to bisect the kernel looking for some other bug
could easily trip over this --- which I guess is how you got hit by
it.
Regards,
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists