[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <90e5f9b4-eb1d-4d63-ba22-4a1f564b2ccf@163.com>
Date: Thu, 9 Jan 2025 20:10:27 +0800
From: Chi Zhiling <chizhiling@....com>
To: Amir Goldstein <amir73il@...il.com>
Cc: John Garry <john.g.garry@...cle.com>, Dave Chinner <david@...morbit.com>,
djwong@...nel.org, cem@...nel.org, linux-xfs@...r.kernel.org,
linux-kernel@...r.kernel.org, Chi Zhiling <chizhiling@...inos.cn>
Subject: Re: [PATCH] xfs: Remove i_rwsem lock in buffered read
On 2025/1/9 18:25, Amir Goldstein wrote:
>>> One more thing I should mention.
>>> You do not need to wait for atomic large writes patches to land.
>>> There is nothing stopping you from implementing the suggested
>>> solution based on the xfs code already in master (v6.13-rc1),
>>> which has support for the RWF_ATOMIC flag for writes.
>
> Only I missed the fact that there is not yet a plan to support
> atomic buffered writes :-/
I think it's necessary to support atomic buffered writes.
>
>>>
>>> It just means that the API will not be usable for applications that
>>> want to do IO larger than block size, but concurrent read/write
>> ^
>> To be precise, this is the page size, not the block size, right?
>>
>
> fs block size:
>
> if (iocb->ki_flags & IOCB_ATOMIC) {
> /*
> * Currently only atomic writing of a single FS block is
> * supported. It would be possible to atomic write smaller than
> * a FS block, but there is no requirement to support this.
> * Note that iomap also does not support this yet.
> */
> if (ocount != ip->i_mount->m_sb.sb_blocksize)
> return -EINVAL;
> ret = generic_atomic_write_valid(iocb, from);
> if (ret)
> return ret;
> }
Uh, okay, maybe I didn't understand it very accurately. I thought
you were talking about buffered IO. I was quite curious as to why
you mentioned block size in the context of buffered IO.
>
>>> performance of 4K IO could be improved already.
>>
>> Great, which means that IO operations aligned within a single page
>> can be executed concurrently, because the folio lock already
>> provides atomicity guarantees.
>>
>> If the write does not exceed the boundary of a page, we can
>> downgrade the iolock to XFS_IOLOCK_SHARED. It seems to be safe
>> and will not change the current behavior.
>>
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -454,6 +454,11 @@ xfs_file_write_checks(
>> if (error)
>> return error;
>>
>> + if ( iocb->ki_pos >> PAGE_SHIFT == (iocb->ki_pos + count) >>
>> PAGE_SHIFT) {
>> + *iolock = XFS_IOLOCK_SHARED;
>> + }
>> +
>> /*
>> * For changing security info in file_remove_privs() we need
>> i_rwsem
>> * exclusively.
>>
>
> I think that may be possible, but you should do it in the buffered write
> code as the patch below.
> xfs_file_write_checks() is called from code paths like
> xfs_file_dio_write_unaligned() where you should not demote to shared lock.
Wow, thank you for the reminder. This is the prototype of the patch.
I might need to consider more scenarios and conduct testing before
sending the patch.
>
>
>>>
>>> It's possible that all you need to do is:
>>>
>>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>>> index c488ae26b23d0..2542f15496488 100644
>>> --- a/fs/xfs/xfs_file.c
>>> +++ b/fs/xfs/xfs_file.c
>>> @@ -777,9 +777,10 @@ xfs_file_buffered_write(
>>> ssize_t ret;
>>> bool cleared_space = false;
>>> unsigned int iolock;
>>> + bool atomic_write = iocb->ki_flags & IOCB_ATOMIC;
>>>
>>> write_retry:
>>> - iolock = XFS_IOLOCK_EXCL;
>>> + iolock = atomic_write ? XFS_IOLOCK_SHARED : XFS_IOLOCK_EXCL;
>>> ret = xfs_ilock_iocb(iocb, iolock);
>>> --
>>>
>>> xfs_file_write_checks() afterwards already takes care of promoting
>>> XFS_IOLOCK_SHARED to XFS_IOLOCK_EXCL for extending writes.
>>
>> Yeah, for writes that exceed the PAGE boundary, we can also promote
>> the lock to XFS_IOLOCK_EXCL. Otherwise, I am concerned that it may
>> lead to old data being retained in the file.
>>
>> For example, two processes writing four pages of data to the same area.
>>
>> process A process B
>> --------------------------------
>> write AA--
>> <sleep>
>> new write BBBB
>> write --AA
>>
>> The final data is BBAA.
>>
>
> What is the use case for which you are trying to fix performance?
> Is it a use case with single block IO? if not then it does not help to implement
> a partial solution for single block size IO.
I want to improve the UnixBench score for XFS. UnixBench uses buffered
IO for testing, including both single-threaded and multi-threaded tests.
Additionally, the IO size tested in UnixBench is below 4K, so this will
be very helpful.
Thanks,
Chi Zhiling
Powered by blists - more mailing lists