lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7btwyxrkixgmv45jeh3bf4uf4fqmauypb2ss67uqsiicklz6gq@rmll5ujssmwx>
Date: Mon, 22 Dec 2025 11:15:26 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu, 
	adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com, ritesh.list@...il.com, 
	yi.zhang@...wei.com, yizhang089@...il.com, libaokun1@...wei.com, yangerkun@...wei.com, 
	yukuai@...as.com
Subject: Re: [PATCH -next 3/7] ext4: avoid starting handle when dio writing
 an unwritten extent

On Sat 20-12-25 15:16:41, Zhang Yi wrote:
> On 12/19/2025 11:25 PM, Jan Kara wrote:
> > On Sat 13-12-25 10:20:04, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@...wei.com>
> >>
> >> Since we have deferred the split of the unwritten extent until after I/O
> >> completion, it is not necessary to initiate the journal handle when
> >> submitting the I/O.
> >>
> >> This can improve the write performance of concurrent DIO for multiple
> >> files. The fio tests below show a ~25% performance improvement when
> >> wirting to unwritten files on my VM with a mem disk.
> >>
> >>   [unwritten]
> >>   direct=1
> >>   ioengine=psync
> >>   numjobs=16
> >>   rw=write     # write/randwrite
> >>   bs=4K
> >>   iodepth=1
> >>   directory=/mnt
> >>   size=5G
> >>   runtime=30s
> >>   overwrite=0
> >>   norandommap=1
> >>   fallocate=native
> >>   ramp_time=5s
> >>   group_reporting=1
> >>
> >>  [w/o]
> >>   w:  IOPS=62.5k, BW=244MiB/s
> >>   rw: IOPS=56.7k, BW=221MiB/s
> >>
> >>  [w]
> >>   w:  IOPS=79.6k, BW=311MiB/s
> >>   rw: IOPS=70.2k, BW=274MiB/s
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> >> ---
> >>  fs/ext4/file.c  | 4 +---
> >>  fs/ext4/inode.c | 4 +++-
> >>  2 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> >> index 7a8b30932189..9f571acc7782 100644
> >> --- a/fs/ext4/file.c
> >> +++ b/fs/ext4/file.c
> >> @@ -418,9 +418,7 @@ static const struct iomap_dio_ops ext4_dio_write_ops = {
> >>   *   updating inode i_disksize and/or orphan handling with exclusive lock.
> >>   *
> >>   * - shared locking will only be true mostly with overwrites, including
> >> - *   initialized blocks and unwritten blocks. For overwrite unwritten blocks
> >> - *   we protect splitting extents by i_data_sem in ext4_inode_info, so we can
> >> - *   also release exclusive i_rwsem lock.
> >> + *   initialized blocks and unwritten blocks.
> >>   *
> >>   * - Otherwise we will switch to exclusive i_rwsem lock.
> >>   */
> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >> index ffde24ff7347..08a296122fe0 100644
> >> --- a/fs/ext4/inode.c
> >> +++ b/fs/ext4/inode.c
> >> @@ -3819,7 +3819,9 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >>  			 * For atomic writes the entire requested length should
> >>  			 * be mapped.
> >>  			 */
> >> -			if (map.m_flags & EXT4_MAP_MAPPED) {
> >> +			if ((map.m_flags & EXT4_MAP_MAPPED) ||
> >> +			    (!(flags & IOMAP_DAX) &&
> > 
> > Why is here an exception for DAX writes? DAX is fine writing to unwritten
> > extents AFAIK. It only needs to pre-zero newly allocated blocks... Or am I
> > missing some corner case?
> > 
> > 								Honza
> 
> Hi, Jan!
> 
> Thank you for reviewing this series.
> 
> Yes, that is precisely why this exception is necessary here. Without this
> exception, a DAX write to an unwritten extent would return immediately
> without invoking ext4_iomap_alloc() to perform pre-zeroing.

Ah, you're right. I already forgot how writing to unwritten extents works
with DAX and it seems we convert the extents to initialized (and zero them
out) before copying the data. Can you please expand the comment above by
"For DAX we convert extents to initialized ones before copying the data,
otherwise we do it after IO so there's no need to call into
ext4_iomap_alloc()." Otherwise feel free to add:

Reviewed-by: Jan Kara <jack@...e.cz>

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ