[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7btwyxrkixgmv45jeh3bf4uf4fqmauypb2ss67uqsiicklz6gq@rmll5ujssmwx>
Date: Mon, 22 Dec 2025 11:15:26 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu,
adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com, ritesh.list@...il.com,
yi.zhang@...wei.com, yizhang089@...il.com, libaokun1@...wei.com, yangerkun@...wei.com,
yukuai@...as.com
Subject: Re: [PATCH -next 3/7] ext4: avoid starting handle when dio writing
an unwritten extent
On Sat 20-12-25 15:16:41, Zhang Yi wrote:
> On 12/19/2025 11:25 PM, Jan Kara wrote:
> > On Sat 13-12-25 10:20:04, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@...wei.com>
> >>
> >> Since we have deferred the split of the unwritten extent until after I/O
> >> completion, it is not necessary to initiate the journal handle when
> >> submitting the I/O.
> >>
> >> This can improve the write performance of concurrent DIO for multiple
> >> files. The fio tests below show a ~25% performance improvement when
> >> wirting to unwritten files on my VM with a mem disk.
> >>
> >> [unwritten]
> >> direct=1
> >> ioengine=psync
> >> numjobs=16
> >> rw=write # write/randwrite
> >> bs=4K
> >> iodepth=1
> >> directory=/mnt
> >> size=5G
> >> runtime=30s
> >> overwrite=0
> >> norandommap=1
> >> fallocate=native
> >> ramp_time=5s
> >> group_reporting=1
> >>
> >> [w/o]
> >> w: IOPS=62.5k, BW=244MiB/s
> >> rw: IOPS=56.7k, BW=221MiB/s
> >>
> >> [w]
> >> w: IOPS=79.6k, BW=311MiB/s
> >> rw: IOPS=70.2k, BW=274MiB/s
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> >> ---
> >> fs/ext4/file.c | 4 +---
> >> fs/ext4/inode.c | 4 +++-
> >> 2 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> >> index 7a8b30932189..9f571acc7782 100644
> >> --- a/fs/ext4/file.c
> >> +++ b/fs/ext4/file.c
> >> @@ -418,9 +418,7 @@ static const struct iomap_dio_ops ext4_dio_write_ops = {
> >> * updating inode i_disksize and/or orphan handling with exclusive lock.
> >> *
> >> * - shared locking will only be true mostly with overwrites, including
> >> - * initialized blocks and unwritten blocks. For overwrite unwritten blocks
> >> - * we protect splitting extents by i_data_sem in ext4_inode_info, so we can
> >> - * also release exclusive i_rwsem lock.
> >> + * initialized blocks and unwritten blocks.
> >> *
> >> * - Otherwise we will switch to exclusive i_rwsem lock.
> >> */
> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >> index ffde24ff7347..08a296122fe0 100644
> >> --- a/fs/ext4/inode.c
> >> +++ b/fs/ext4/inode.c
> >> @@ -3819,7 +3819,9 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >> * For atomic writes the entire requested length should
> >> * be mapped.
> >> */
> >> - if (map.m_flags & EXT4_MAP_MAPPED) {
> >> + if ((map.m_flags & EXT4_MAP_MAPPED) ||
> >> + (!(flags & IOMAP_DAX) &&
> >
> > Why is here an exception for DAX writes? DAX is fine writing to unwritten
> > extents AFAIK. It only needs to pre-zero newly allocated blocks... Or am I
> > missing some corner case?
> >
> > Honza
>
> Hi, Jan!
>
> Thank you for reviewing this series.
>
> Yes, that is precisely why this exception is necessary here. Without this
> exception, a DAX write to an unwritten extent would return immediately
> without invoking ext4_iomap_alloc() to perform pre-zeroing.
Ah, you're right. I already forgot how writing to unwritten extents works
with DAX and it seems we convert the extents to initialized (and zero them
out) before copying the data. Can you please expand the comment above by
"For DAX we convert extents to initialized ones before copying the data,
otherwise we do it after IO so there's no need to call into
ext4_iomap_alloc()." Otherwise feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists