Message-ID: <20171017144117.ispjgvwrespix5z3@thunk.org>
Date:   Tue, 17 Oct 2017 10:41:17 -0400
From:   Theodore Ts'o <tytso@....edu>
To:     Vijay Chidambaram <vvijay03@...il.com>
Cc:     Amir Goldstein <amir73il@...il.com>,
        Ashlie Martinez <ashmrtn@...xas.edu>,
        Eryu Guan <eguan@...hat.com>,
        Ext4 <linux-ext4@...r.kernel.org>, Josef Bacik <jbacik@...com>,
        Xiao Yang <yangx.jy@...fujitsu.com>,
        fstests <fstests@...r.kernel.org>
Subject: Re: [PATCH] ext4: fix interaction between i_size, fallocate, and
 delalloc after a crash

On Tue, Oct 17, 2017 at 12:43:20AM +0000, Vijay Chidambaram wrote:
> It does expand our already-large search space, but our first order of
> business is making sure CrashMonkey can reproduce every crash-consistency
> bug reported in recent times (mostly by Amir :) ). So for now we were just
> analyzing the bug and trying to understand it, but it looks like the
> post-recovery image is not very useful for this.

Right, the post-recovery image (after the journal has been replayed)
is not very useful.  Unfortunately, I suspect the pre-recovery image
(after the power cut, but before the journal replay) won't be terribly
interesting
either.  It will show that the corruption is baked into the journal
--- which is to say, the problem wasn't in whether the calls to the
jbd2 layer were correct --- but rather, that one of the file system
mutations in a specific jbd2 handle's "micro-transaction" left the
file system in an inconsistent state.

It's not a terrible inconsistency, and it would be quickly papered
over by a follow-up handle.  But if the handle which left the file
system in an inconsistent state and the handle which cleaned it up
were in different transactions, and the power cut happened after the
first transaction, the file system would be left in a state where
e2fsck would complain.
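
To make that concrete, here is a toy model in Python.  It is only a
sketch and not ext4 or jbd2 code: the state fields, the two handles,
and the invariant standing in for e2fsck are all invented for
illustration, and are not meant to describe the actual bug.

```python
# Toy model only: none of these names come from ext4 or jbd2.

BLOCK_SIZE = 4096  # assumed block size for the illustration


def replay(transactions, committed):
    """Apply every handle of the first `committed` transactions, in order."""
    state = {"i_size": 0, "blocks": 0}
    for txn in transactions[:committed]:
        for handle in txn:
            handle(state)
    return state


def is_consistent(state):
    # Stand-in for an e2fsck check (an invented invariant):
    # the file size must not exceed the allocated space.
    return state["i_size"] <= state["blocks"] * BLOCK_SIZE


def handle_a(state):
    # Leaves the file system briefly inconsistent: the size grows first.
    state["i_size"] = 2 * BLOCK_SIZE


def handle_b(state):
    # Follow-up handle that papers over the inconsistency.
    state["blocks"] = 2


same_txn = [[handle_a, handle_b]]     # both handles in one transaction
split_txn = [[handle_a], [handle_b]]  # handles in different transactions

# Power cut after the first transaction has committed:
print(is_consistent(replay(same_txn, 1)))   # True
print(is_consistent(replay(split_txn, 1)))  # False
# No power cut; both transactions committed:
print(is_consistent(replay(split_txn, 2)))  # True
```

The same two handles are harmless when they share a transaction and
only show up as a problem when the commit boundary falls between them.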

So if you have the I/O trace where the handles in question were
assigned to the right (wrong) set of transactions, then yes, you'll
see the problem, just as the xfstest will see the problem.
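
As a rough illustration of why the transaction assignment matters to
the search, the sketch below enumerates every way an ordered list of
handles could be cut into consecutive transactions and reports which
crash points fail a check.  Everything here is hypothetical (the
function names, the state dictionary, the check); it is not how
CrashMonkey or jbd2 works, just a model of the search space.

```python
from itertools import combinations


def partitions(handles):
    """Yield every way to cut an ordered handle list into consecutive transactions."""
    n = len(handles)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            yield [handles[a:b] for a, b in zip(bounds, bounds[1:])]


def failing_crash_points(handles, check, initial):
    """Return (transactions, committed) pairs whose replayed state fails `check`."""
    failures = []
    for txns in partitions(handles):
        # committed = number of transactions on disk when the power is cut
        for committed in range(len(txns) + 1):
            state = dict(initial)
            for txn in txns[:committed]:
                for handle in txn:
                    handle(state)
            if not check(state):
                failures.append((txns, committed))
    return failures


# Hypothetical handles and invariant, invented for the example.
def grow_size(state):
    state["i_size"] = 8192


def add_blocks(state):
    state["blocks"] = 2


def size_fits(state):
    return state["i_size"] <= state["blocks"] * 4096


bad = failing_crash_points([grow_size, add_blocks], size_fits,
                           {"i_size": 0, "blocks": 0})
for txns, committed in bad:
    print([[h.__name__ for h in t] for t in txns], "crash after", committed)
```

With these two handles, only the assignment that puts them in separate
transactions, with the crash after the first, fails the check.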

But if you want to improve CrashMonkey's search of the problem
space, it will require higher-level logging, because this is really a
different sort of bug.  CrashMonkey will find (a) bugs in jbd2, and
(b) bugs in how the jbd2 layer is called.  This bug is really a bug in
the ext4 implementation, because it is in *how* the file system was
mutated that temporarily left it in an inconsistent state, and that's
a different thing from (a) or (b).  Which is great --- it's arguably
additional research work that can be segregated into a different
"Minimum Publishable Unit".  :-)

					- Ted
