[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8263025e-15e6-fbc7-2826-f6f9ac3d9043@huawei.com>
Date: Mon, 11 Mar 2024 15:55:30 +0800
From: Zhihao Cheng <chengzhihao1@...wei.com>
To: Dave Chinner <david@...morbit.com>
CC: <brauner@...nel.org>, <djwong@...nel.org>, <jack@...e.cz>,
<tytso@....edu>, <linux-xfs@...r.kernel.org>,
<linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-ext4@...r.kernel.org>, <yi.zhang@...wei.com>
Subject: Re: [PATCH RFC 1/2] iomap: Add a IOMAP_DIO_MAY_INLINE_COMP flag
在 2024/3/1 18:02, Zhihao Cheng 写道:
> 在 2024/3/1 8:40, Dave Chinner 写道:
>
Hi Dave. Friendly ping.
> Hi Dave, thanks for your detailed and nice suggestions, I have a few
> questions below.
>> On Thu, Feb 29, 2024 at 07:38:48PM +0800, Zhihao Cheng wrote:
>>> It will be more efficient to execute quick endio process(eg. non-sync
>>> overwriting case) under irq process rather than starting a worker to
>>> do it.
>>> Add a flag to control DIO to be finished inline(under irq context),
>>> which
>>> can be used for non-sync overwriting case.
>>> Besides, skip invalidating pages if DIO is finished inline, which will
>>> keep the same logic with dio_bio_end_aio in non-sync overwriting case.
>>>
>>> Signed-off-by: Zhihao Cheng <chengzhihao1@...wei.com>
>>
>> A nice idea, but I don't think an ext4 specific API flag is the
>> right way to go about enabling this. The iomap dio code knows if
>> the write is pure overwrite already - we have the IOMAP_F_DIRTY flag
>> for that, and we combine this with IOMAP_DIO_WRITE_THROUGH to do the
>> pure overwrite FUA optimisations.
>>
>> That is:
>>
>> /*
>> * Use a FUA write if we need datasync semantics,
>> this is a pure
>> * data IO that doesn't require any metadata updates
>> (including
>> * after IO completion such as unwritten extent
>> conversion) and
>> * the underlying device either supports FUA or
>> doesn't have
>> * a volatile write cache. This allows us to avoid
>> cache flushes
>> * on IO completion. If we can't use writethrough and
>> need to
>> * sync, disable in-task completions as dio
>> completion will
>> * need to call generic_write_sync() which will do a
>> blocking
>> * fsync / cache flush call.
>> */
>> if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
>> (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
>> (bdev_fua(iomap->bdev) ||
>> !bdev_write_cache(iomap->bdev)))
>> use_fua = true;
>>
>> Hence if we want to optimise pure overwrites that have no data sync
>> requirements, we already have the detection and triggers in place to
>> do this. We just need to change the way we set up the IO flags to
>> allow write-through (i.e. non-blocking IO completions) to use inline
>> completions.
>>
>> In __iomap_dio_rw():
>>
>> + /* Always try to complete inline. */
>> + dio->flags |= IOMAP_DIO_INLINE_COMP;
>> if (iov_iter_rw(iter) == READ) {
>> - /* reads can always complete inline */
>> - dio->flags |= IOMAP_DIO_INLINE_COMP;
>> ....
>>
>> } else {
>> + /* Always try write-through semantics. If we can't
>> + * use writethough, it will be disabled along with
>> + * IOMAP_DIO_INLINE_COMP before dio completion is run
>> + * so it can be deferred to a task completion context
>> + * appropriately.
>> + */
>> + dio->flags |= IOMAP_DIO_WRITE | IOMAP_DIO_WRITE_THROUGH;
>
> There is a behavior change here, if we set IOMAP_DIO_WRITE_THROUGH
> unconditionally, non-datasync IO will be set with REQ_FUA, which means
> that device will flush writecache for each IO, will it affect the
> performance in non-sync dio case?
>> iomi.flags |= IOMAP_WRITE;
>> - dio->flags |= IOMAP_DIO_WRITE;
>> .....
>> /* for data sync or sync, we need sync completion processing */
>> if (iocb_is_dsync(iocb)) {
>> dio->flags |= IOMAP_DIO_NEED_SYNC;
>>
>> - /*
>> - * For datasync only writes, we optimistically
>> try using
>> - * WRITE_THROUGH for this IO. This flag
>> requires either
>> - * FUA writes through the device's write cache,
>> or a
>> - * normal write to a device without a volatile
>> write
>> - * cache. For the former, Any non-FUA write
>> that occurs
>> - * will clear this flag, hence we know before
>> completion
>> - * whether a cache flush is necessary.
>> - */
>> - if (!(iocb->ki_flags & IOCB_SYNC))
>> - dio->flags |= IOMAP_DIO_WRITE_THROUGH;
>> + * For sync writes we know we are going to need
>> + * blocking completion processing, so turn off
>> + * writethrough now.
>> + */
>> if (iocb->ki_flags & IOCB_SYNC) {
>> dio->flags &= ~(IOMAP_DIO_WRITE_THROUGH |
>> IOMAP_DIO_INLINE_COMP);
>> }
>> }
>>
>
> [...]
>>
>> However, this does mean that any spinlock taken in the ->end_io()
>> callbacks now needs to be irq safe. e.g. in xfs_dio_write_end_io()
>> the spinlock protection around inode size updates will need to use
>> an irq safe locking, as will the locking in the DIO submission path
>> that it serialises against in xfs_file_write_checks(). That probably
>> is best implemented as a separate spinlock.
>>
>> There will also be other filesystems that need to set IOMAP_F_DIRTY
>> unconditionally (e.g. zonefs) because they always take blocking
>> locks in their ->end_io callbacks and so must always run in task
>> context...
> Should we add a new flag(eg. IOMAP_F_ENDIO_IRQ ?) to indicate that the
> endio cannot be done under irq? Because I think IOMAP_F_DIRTY means that
> the metadata needs to be written, if we add a new semantics(endio must
> be done in defered work) for this flag, the code will looks a little
> complicated.
>
>
> .
Powered by blists - more mailing lists