Message-ID: <b4f354f7-7885-8f25-90dd-bec54daba405@huaweicloud.com>
Date: Mon, 3 Jun 2024 22:18:20 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, djwong@...nel.org, hch@...radead.org,
brauner@...nel.org, chandanbabu@...nel.org, jack@...e.cz,
willy@...radead.org, yi.zhang@...wei.com, chengzhihao1@...wei.com,
yukuai3@...wei.com
Subject: Re: [RFC PATCH v4 5/8] xfs: refactor the truncating order
On 2024/6/3 6:46, Dave Chinner wrote:
> On Wed, May 29, 2024 at 05:52:03PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> When truncating down an inode, we call xfs_truncate_page() to zero out
>> the partial tail block beyond the new EOF, which prevents exposing
>> stale data. But xfs_truncate_page() always assumes the block size is
>> i_blocksize(inode), which is not true if the file has a larger
>> allocation unit that we should align to, e.g. a realtime inode should
>> be aligned to the rtextsize.
>>
>> Current xfs_setattr_size() can't support zeroing out a large alignment
>> size on truncate down since the process order is wrong. We first do zero
>> out through xfs_truncate_page(), and then update inode size through
>> truncate_setsize() immediately. If the zeroed range is larger than a
>> folio, the write back path would not write back zeroed pagecache beyond
>> the EOF folio, so it doesn't write zeroes to the entire tail extent and
>> could expose stale data after an appending write into the next aligned
>> extent.
>>
>> We need to adjust the order to zero out tail aligned blocks, write back
>> zeroed or cached data, update i_size and drop cache beyond aligned EOF
>> block, preparing for the fix of realtime inode and supporting the
>> upcoming forced alignment feature.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>> ---
> .....
>> @@ -853,30 +854,7 @@ xfs_setattr_size(
>> * the transaction because the inode cannot be unlocked once it is a
>> * part of the transaction.
>> *
>> - * Start with zeroing any data beyond EOF that we may expose on file
>> - * extension, or zeroing out the rest of the block on a downward
>> - * truncate.
>> - */
>> - if (newsize > oldsize) {
>> - trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
>> - error = xfs_zero_range(ip, oldsize, newsize - oldsize,
>> - &did_zeroing);
>> - } else if (newsize != oldsize) {
>> - error = xfs_truncate_page(ip, newsize, &did_zeroing);
>> - }
>> -
>> - if (error)
>> - return error;
>> -
>> - /*
>> - * We've already locked out new page faults, so now we can safely remove
>> - * pages from the page cache knowing they won't get refaulted until we
>> - * drop the XFS_MMAP_EXCL lock after the extent manipulations are
>> - * complete. The truncate_setsize() call also cleans partial EOF page
>> - * PTEs on extending truncates and hence ensures sub-page block size
>> - * filesystems are correctly handled, too.
>> - *
>> - * We have to do all the page cache truncate work outside the
>> + * And we have to do all the page cache truncate work outside the
>> * transaction context as the "lock" order is page lock->log space
>> * reservation as defined by extent allocation in the writeback path.
>> * Hence a truncate can fail with ENOMEM from xfs_trans_alloc(), but
> ......
>
> Lots of new logic for zeroing here. That makes xfs_setattr_size()
> even longer than it already is. Can you lift this EOF zeroing logic
> into its own helper function so that it is clear that it is a
> completely independent operation to the actual transaction that
> changes the inode size? That would also allow the operations to be
> broken up into:
>
> 	if (newsize >= oldsize) {
> 		/* do the simple stuff */
> 		....
> 		return error;
> 	}
> 	/* do the complex size reduction stuff without additional indenting */
>
Sure, I will try to factor them out.
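
A rough userspace sketch of the split suggested above, not the actual
patch. Every name here (sketch_inode, sketch_round_up, sketch_zero_eof,
sketch_setattr_size) is hypothetical, and the struct only records which
range would be zeroed. It assumes the zeroing on truncate down runs from
the new EOF to the end of the allocation unit, and it leaves out the
writeback and page cache steps entirely:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few inode fields this sketch needs. */
struct sketch_inode {
	uint64_t size;        /* current file size in bytes */
	uint64_t alloc_unit;  /* allocation unit in bytes (assumed: block or rt extent size) */
	uint64_t zeroed_from; /* start of the last range the helper chose to zero */
	uint64_t zeroed_len;  /* length of that range */
};

/* Round pos up to the next multiple of unit. */
static uint64_t sketch_round_up(uint64_t pos, uint64_t unit)
{
	assert(unit > 0);
	return (pos + unit - 1) / unit * unit;
}

/*
 * Hypothetical helper: all EOF zeroing lives here, separate from the
 * code that changes the inode size.
 */
static int sketch_zero_eof(struct sketch_inode *ip, uint64_t oldsize,
			   uint64_t newsize)
{
	if (newsize >= oldsize) {
		/* Simple case: zero the range exposed by extending the file. */
		ip->zeroed_from = oldsize;
		ip->zeroed_len = newsize - oldsize;
		return 0;
	}

	/*
	 * Size reduction, with no extra indenting: zero from the new EOF
	 * to the end of the allocation unit that contains it. Real code
	 * would also have to write back this range before updating i_size.
	 */
	ip->zeroed_from = newsize;
	ip->zeroed_len = sketch_round_up(newsize, ip->alloc_unit) - newsize;
	return 0;
}

/* Hypothetical caller: zero first, then change the size. */
static int sketch_setattr_size(struct sketch_inode *ip, uint64_t newsize)
{
	int error = sketch_zero_eof(ip, ip->size, newsize);

	if (error)
		return error;
	ip->size = newsize;
	return 0;
}
```

With a 4096 byte unit, truncating from 10000 to 5000 bytes in this model
selects the 3192 bytes up to offset 8192 for zeroing.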
Thanks,
Yi.