[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <38cb83c5-600b-c523-8006-5aab0a3eb9b0@huawei.com>
Date: Mon, 10 Dec 2018 17:26:07 +0800
From: "zhangyi (F)" <yi.zhang@...wei.com>
To: "Theodore Y. Ts'o" <tytso@....edu>
CC: <linux-ext4@...r.kernel.org>, <miaoxie@...wei.com>
Subject: Re: [PATCH] ext4: fix unsafe extent initialization
On 2018/12/10 13:10, Theodore Y. Ts'o Wrote:
> On Sat, Dec 08, 2018 at 10:25:58PM +0800, zhangyi (F) wrote:
>> Current ext4 will call ext4_ext_convert_to_initialized() to split and
>> initialize an unwritten extent if someone write something to it. It may
>> also zeroout the nearby blocks and expand the split extent if the
>> allocated extent is fully inside i_size or new_size. But it may lead to
>> inode inconsistency when system crash or the power fails.
>>
>> Consider the following case:
>> - Create an empty file and buffer write from block A to D (with delay
>> allocate). It will update the i_size to D.
>> - Zero range from part of block B to D. It will allocate an unwritten
>> extent from B to D.
>> - The write back worker write block B and initialize the unwritten
>> extent from B to D, and then update the i_disksize to B.
>> - System crash.
>> - Remount and fsck complain about the extent size exceeds the inode
>> size.
>>
>> This patch add checking i_disksize and chose the small one between
>> i_size to make sure it's safe to convert extent to initialized.
>>
>> ---------------------
>>
>> This problem can reproduce by xfstests generic/482 with fsstress seed
>> 1544025012.
>
> Hmm, your explanation is great and the patch makes sense. I haven't
> been able to reproduce the problem by adding -s 1544025012 to the
> fsstress arguments. This may be because fsstress being run with two
> processes (-p 2) and the failure may be timing dependent?
>
Yes, it is timing dependent and not quite easy to make a simple fast reproducer.
The default parameter of fsstress (tested by generic/482) on my machine is:
./ltp/fsstress -w -d /mnt/scratch -n 512 -p 8 -s 1544025012 -v
> How easily can you replicate the problem?
It is about 2~5% probability to replicate this problem under generic/482 on my
machine, and it can also appear in generic/019 and generic/455.
After reproducing and checking this problem again, I think the above explanation
is not accurate. I simplify the running process in generic/482 and the real case
could be:
- Create an empty file and buffer write from block A to D (with delay allocate).
- Buffer write from X to Z, now the i_size of this inode is updated to Z.
- Zero range from part of block B to D, it will allocate an unwritten extent
from B to D. Note that it also will skip disksize update in
ext4_zero_range() -> ext4_update_disksize_before_punch() because the i_size
is large than the end of this zero range.
- The write back kworker write block B and initialize the whole unwritten
extent from B to D, and then update the i_disksize to the end of B.
- ext4_journal_stop()
- kjournald2 process weakup and call jbd2_journal_commit_transaction() to commit
journal and send FUA.
- System crash.
- System reboot and fsck complain about the extent size exceeds the inode size.
Thanks,
Yi.
Powered by blists - more mailing lists