[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1A8D0C2B-EB45-4807-9802-C7CB6DB544E3@dilger.ca>
Date: Mon, 10 Dec 2018 13:14:02 -0700
From: Andreas Dilger <adilger@...ger.ca>
To: "zhangyi (F)" <yi.zhang@...wei.com>
Cc: "Theodore Y. Ts'o" <tytso@....edu>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Miao Xie <miaoxie@...wei.com>
Subject: Re: [PATCH] ext4: fix unsafe extent initialization
On Dec 10, 2018, at 2:26 AM, zhangyi (F) <yi.zhang@...wei.com> wrote:
>
> On 2018/12/10 13:10, Theodore Y. Ts'o Wrote:
>> On Sat, Dec 08, 2018 at 10:25:58PM +0800, zhangyi (F) wrote:
>>> Current ext4 will call ext4_ext_convert_to_initialized() to split and
>>> initialize an unwritten extent if someone write something to it. It may
>>> also zeroout the nearby blocks and expand the split extent if the
>>> allocated extent is fully inside i_size or new_size. But it may lead to
>>> inode inconsistency when system crash or the power fails.
>>>
>>> Consider the following case:
>>> - Create an empty file and buffer write from block A to D (with delay
>>> allocate). It will update the i_size to D.
>>> - Zero range from part of block B to D. It will allocate an unwritten
>>> extent from B to D.
>>> - The write back worker write block B and initialize the unwritten
>>> extent from B to D, and then update the i_disksize to B.
>>> - System crash.
>>> - Remount and fsck complain about the extent size exceeds the inode
>>> size.
>>>
>>> This patch add checking i_disksize and chose the small one between
>>> i_size to make sure it's safe to convert extent to initialized.
>>>
>>> ---------------------
>>>
>>> This problem can reproduce by xfstests generic/482 with fsstress seed
>>> 1544025012.
>>
>> Hmm, your explanation is great and the patch makes sense. I haven't
>> been able to reproduce the problem by adding -s 1544025012 to the
>> fsstress arguments. This may be because fsstress being run with two
>> processes (-p 2) and the failure may be timing dependent?
>>
> Yes, it is timing dependent and not quite easy to make a simple fast reproducer.
> The default parameter of fsstress (tested by generic/482) on my machine is:
>
> ./ltp/fsstress -w -d /mnt/scratch -n 512 -p 8 -s 1544025012 -v
>
>> How easily can you replicate the problem?
>
> It is about 2~5% probability to replicate this problem under generic/482 on my
> machine, and it can also appear in generic/019 and generic/455.
>
> After reproducing and checking this problem again, I think the above explanation
> is not accurate. I simplify the running process in generic/482 and the real case
> could be:
>
> - Create an empty file and buffer write from block A to D (with delay allocate).
> - Buffer write from X to Z, now the i_size of this inode is updated to Z.
> - Zero range from part of block B to D, it will allocate an unwritten extent
> from B to D. Note that it also will skip disksize update in
> ext4_zero_range() -> ext4_update_disksize_before_punch() because the i_size
> is large than the end of this zero range.
> - The write back kworker write block B and initialize the whole unwritten
> extent from B to D, and then update the i_disksize to the end of B.
On a related note, should this just refuse to allocate uninitialized extents
for regions that are smaller than the threshold that they would immediately
be expanded into initialized extents on when they are modified? This would
avoid churn in the extent tree at the expense of (potentially) a few extra
zero blocks written to disk.
Cheers, Andreas
Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)
Powered by blists - more mailing lists