Date:   Wed, 29 Nov 2017 01:13:07 -0500
From:   Theodore Ts'o <tytso@....edu>
To:     Ashlie Martinez <ashmrtn@...xas.edu>
Cc:     Vijay Chidambaram <vvijay03@...il.com>,
        Ext4 <linux-ext4@...r.kernel.org>
Subject: Re: ext4 fix for interaction between i_size, fallocate, and delalloc
 after a crash

On Tue, Nov 28, 2017 at 03:27:47PM -0600, Ashlie Martinez wrote:
> 
> Unfortunately this timing bug only reproduces on some machines. Xiao
> and I have been unable to reproduce this bug (I've tried kvm-xfstests,
> my own kvm VMs, VMs without kvm, VMs with/without virtio drivers, and
> another bare metal system). generic/456 basically sets up a race
> condition between a kernel flusher thread and triggering dm-flakey, so
> I think things like system load, core count, etc. might cause
> different test results.

Hmm, now I remember the details.  It reproduced reliably on
gce-xfstests, but I was able to use kvm-xfstests to debug the problem
(by invocations of debugfs to dump the file system state as I had
described).  That's because debugfs operates on the buffer cache, and
before the jbd2 commit, the changes to the inode structure are in the
buffer cache, but they aren't allowed to be persisted on disk until
after the journal commit.  And I was using debugfs to dump the inode's
extent tree (as it exists in the buffer cache) before triggering
dm-flakey.

Now that we understand what is happening, it should be simple to
adjust the test so it reliably reproduces, by adding a "sleep 6"
before _flakey_drop_and_remount.  Since the delayed allocation write
won't get resolved until 30 seconds after the inode was first dirtied,
and the default jbd2 timer value is 5 seconds, this should guarantee
that the jbd2 commit has taken place so that the inode changes made by
fallocate are persisted onto the journal, while still allowing the
delayed allocation write to remain unresolved.
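
The timing argument above can be sketched as a small shell helper. This
is only an illustration: pick_sleep is a hypothetical function, not part
of xfstests, and the 5 and 30 second values are the defaults quoted in
this message rather than values read from the running system.

```shell
#!/bin/bash
# Hypothetical helper (not part of xfstests): choose a delay that falls
# after the jbd2 commit but before the delayed allocation write would be
# resolved by writeback.
pick_sleep() {
    local commit_secs=$1   # jbd2 commit interval, 5 by default per the text
    local expire_secs=$2   # age at which dirty data gets written back, 30
    local delay=$((commit_secs + 1))

    # The delay must stay below the writeback age, or the delalloc write
    # could be resolved before the simulated crash.
    if [ "$delay" -ge "$expire_secs" ]; then
        echo "no safe window" >&2
        return 1
    fi
    echo "$delay"
}

pick_sleep 5 30
```

In the test itself this would amount to a fixed "sleep 6" placed just
before the dm-flakey drop-and-remount step; the helper only makes the
"longer than the commit interval, shorter than the writeback age"
constraint explicit.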

Cheers,

					- Ted
