linux-ext4 - Re: [PATCH] ext4: fix interaction between i_size, fallocate, and delalloc after a crash

Message-ID: <59DDFC47.3050300@cn.fujitsu.com>
Date:   Wed, 11 Oct 2017 19:11:03 +0800
From:   Xiao Yang <yangx.jy@...fujitsu.com>
To:     Theodore Ts'o <tytso@....edu>
CC:     Ashlie Martinez <ashmrtn@...xas.edu>,
        Amir Goldstein <amir73il@...il.com>,
        Eryu Guan <eguan@...hat.com>, Josef Bacik <jbacik@...com>,
        fstests <fstests@...r.kernel.org>,
        Ext4 <linux-ext4@...r.kernel.org>,
        Vijay Chidambaram <vvijay03@...il.com>
Subject: Re: [PATCH] ext4: fix interaction between i_size, fallocate, and
 delalloc after a crash

On 2017/10/07 11:29, Theodore Ts'o wrote:
> On Thu, Oct 05, 2017 at 07:34:10PM -0500, Ashlie Martinez wrote:
>>>> It almost seems that way, though to be honest, I don't think I know
>>>> enough about 1. the setup used by Amir and 2. all the internal working
>>>> of KVM+virtio to say for sure.
>>> I believe you misread my email.
>>> The problem was NOT reproduced on KVM+virtio.
>>> The problem *is* reproduced on a 10G LVM volume over SSD on
>>> Ubuntu 16.04 with latest kernel and latest e2fsprogs.
> I was able to reproduce it using both kvm-xfstests[1] and gce-xfstests[2].
>
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
> [2] https://thunk.org/gce-xfstests
>
> It did turn out to be timing related, and it requires that a journal
> commit take place after fsstress runs, but it can *not* be triggered
> by a sync/fsync (as this would cause the delayed allocation writes to
> be forced out to disk, and that makes the problem go away).
>
> I initially tried using xfs_io as a replacement for fsstress (since it
> is more flexible and would allow me to more easily run experiments),
> but it turns out xfs_io was too fast/efficient, and so using xfs_io to
> execute the same system calls (verified by strace) would not replicate
> the problem.
>
>>> Now you have a broken file system image and the exact set of operations
>>> that led to it. That should be a pretty big lead for investigation.
> It was indeed a big lead for investigation (thanks, Amir!), but it
> still took me several hours before I was finally able to figure out
> the problem.  The patch and the commit description should explain what
> was going on.
>
> I'll leave it to Ashlie and Vijay to investigate how to improve Crash
> Monkey so it can better find problems like this automatically.  Since
> you now have a clear reproducer (you can use generic/456 and run it on
> gce-xfstests, which is a standard cloud VM configuration) and an
> explanation of the bug and the four-line fix, I suspect this might be
> good grist for follow-on research after your Hot Storage '17 workshop
> paper.  :-)
>
> Best regards,
>
> 					- Ted
>
>
> commit 3912e7b44cf77e9452d4d0cb6c1da9c7043bb7f1
> Author: Theodore Ts'o <tytso@....edu>
> Date:   Fri Oct 6 23:09:55 2017 -0400
>
>      ext4: fix interaction between i_size, fallocate, and delalloc after a crash

Hi Theodore,

After applying your patch, generic/456 always passes on my system, which
previously triggered only output [2].
So I believe the two different outputs [1] and [2] are triggered by
different environments but are caused by the same bug, which your patch
fixes.  Is that right?

[1] Inode 12, end of extent exceeds allowed value
        (logical block 33, physical block 33441, len 7)
    Clear? no
    Inode 12, i_blocks is 184, should be 128. Fix? no
[2] Inode 12, i_size is 147456, should be 163840. Fix? no
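
As a rough cross-check, the numbers in [1] and [2] line up with each other
if one assumes a 4096-byte block size (an assumption; the block size is not
stated in this thread) and that i_blocks counts 512-byte sectors.  The
extent in [1] ends at logical block 33 + 7 = 40, and 40 * 4096 = 163840,
which is the "should be" size in [2]; 147456 is 36 blocks.  The i_blocks
difference in [1], 184 - 128 = 56 sectors, is 7 blocks, the length of that
extent.  A minimal sketch of the arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not stated in the thread: 4096-byte filesystem blocks,
 * and i_blocks counted in 512-byte sectors. */
#define ASSUMED_BLOCK_SIZE 4096u
#define SECTOR_SIZE 512u

/* Byte offset just past an extent that starts at logical block `lblk`
 * and covers `len` blocks. */
static inline uint64_t extent_end_bytes(uint64_t lblk, uint64_t len)
{
	return (lblk + len) * ASSUMED_BLOCK_SIZE;
}

/* Number of whole blocks needed to hold `size` bytes. */
static inline uint64_t size_in_blocks(uint64_t size)
{
	return (size + ASSUMED_BLOCK_SIZE - 1) / ASSUMED_BLOCK_SIZE;
}

/* Convert a count of 512-byte sectors to filesystem blocks. */
static inline uint64_t sectors_to_blocks(uint64_t sectors)
{
	return sectors * SECTOR_SIZE / ASSUMED_BLOCK_SIZE;
}

/* Checks the figures quoted in outputs [1] and [2] against each other. */
static inline void check_fsck_numbers(void)
{
	assert(extent_end_bytes(33, 7) == 163840); /* "should be" size in [2] */
	assert(size_in_blocks(147456) == 36);      /* on-disk i_size in [2] */
	assert(sectors_to_blocks(184 - 128) == 7); /* extent length in [1] */
}
```

Under those assumptions both outputs describe the same 7-block extent lying
beyond the on-disk i_size; they differ only in how e2fsck reports it.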

Sorry, I am not very familiar with ext4.

Thanks,
Xiao Yang
>
>      If there are pending writes subject to delayed allocation, then i_size
>      will show size after the writes have completed, while i_disksize
>      contains the value of i_size on the disk (since the writes have not
>      been persisted to disk).
>
>      If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
>      with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
>      after the fallocate(2) is between i_size and i_disksize, then after a
>      crash, if a journal commit has resulted in the changes made by the
>      fallocate() call to be persisted after a crash, but the delayed
>      allocation write has not resolved itself, i_size would not be updated,
>      and this would cause the following e2fsck complaint:
>
>      Inode 12, end of extent exceeds allowed value
>              (logical block 33, physical block 33441, len 7)
>
>      This can only take place on a sparse file, where the fallocate(2) call
>      is allocating blocks in a range which is before a pending delayed
>      allocation write which is extending i_size.  Since this situation is
>      quite rare, and the window in which the crash must take place is
>      typically < 30 seconds, in practice this condition will rarely happen.
>
>      Nevertheless, it can be triggered in testing, and in particular by
>      xfstests generic/456.
>
>      Signed-off-by: Theodore Ts'o <tytso@....edu>
>      Reported-by: Amir Goldstein <amir73il@...il.com>
>      Cc: stable@...r.kernel.org
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 97f0fd06728d..07bca11749d4 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4794,7 +4794,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>   	}
>
>   	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
> -	     offset + len > i_size_read(inode)) {
> +	    (offset + len > i_size_read(inode) ||
> +	     offset + len > EXT4_I(inode)->i_disksize)) {
>   		new_size = offset + len;
>   		ret = inode_newsize_ok(inode, new_size);
>   		if (ret)
> @@ -4965,7 +4966,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   	}
>
>   	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
> -	     offset + len > i_size_read(inode)) {
> +	    (offset + len > i_size_read(inode) ||
> +	     offset + len > EXT4_I(inode)->i_disksize)) {
>   		new_size = offset + len;
>   		ret = inode_newsize_ok(inode, new_size);
>   		if (ret)
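
The effect of the two hunks can be modelled in userspace.  The sketch below
is not kernel code: size_update_old() and size_update_new() are hypothetical
names that mirror only the condition changed by the patch.  The values are
illustrative: i_disksize is taken from output [2], the fallocate range is
assumed to match the extent in output [1] with a 4096-byte block size, and
the in-memory i_size is made up to stand for a pending delalloc write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as FALLOC_FL_KEEP_SIZE in <linux/falloc.h>; redefined under a
 * hypothetical name so the sketch builds outside Linux. */
#define MODEL_KEEP_SIZE 0x01

/* Pre-patch condition: only the in-memory i_size is consulted. */
static inline bool size_update_old(int mode, int64_t offset, int64_t len,
				   int64_t i_size, int64_t i_disksize)
{
	(void)i_disksize; /* ignored before the patch */
	return !(mode & MODEL_KEEP_SIZE) && offset + len > i_size;
}

/* Post-patch condition: also true when the range ends past i_disksize. */
static inline bool size_update_new(int mode, int64_t offset, int64_t len,
				   int64_t i_size, int64_t i_disksize)
{
	return !(mode & MODEL_KEEP_SIZE) &&
	       (offset + len > i_size || offset + len > i_disksize);
}

static inline void check_model(void)
{
	const int64_t i_disksize = 147456; /* on-disk size, from output [2] */
	const int64_t i_size = 200000;     /* made-up in-memory size */
	const int64_t offset = 135168;     /* block 33 * 4096 (assumed) */
	const int64_t len = 28672;         /* 7 blocks * 4096 (assumed) */

	/* Range ends at 163840: past i_disksize but before i_size. */
	assert(!size_update_old(0, offset, len, i_size, i_disksize));
	assert(size_update_new(0, offset, len, i_size, i_disksize));

	/* With the keep-size flag set, neither version asks for an update. */
	assert(!size_update_old(MODEL_KEEP_SIZE, offset, len, i_size, i_disksize));
	assert(!size_update_new(MODEL_KEEP_SIZE, offset, len, i_size, i_disksize));
}
```

In this model the old condition leaves new_size unset for a range ending
between i_disksize and i_size, which matches the window the commit
description talks about; the new condition covers it.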