lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ympvfypw3222g2k4xzd5pba4zhkz5jihw4td67iixvrqhuu43y@wse63ntv4s6u>
Date: Wed, 8 Oct 2025 13:44:22 +0200
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz, 
	yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 03/13] ext4: introduce seq counter for the extent
 status entry

On Thu 25-09-25 17:25:59, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
> 
> In the iomap_write_iter(), the iomap buffered write frame does not hold
> any locks between querying the inode extent mapping info and performing
> page cache writes. As a result, the extent mapping can be changed due to
> concurrent I/O in flight. Similarly, in the iomap_writepage_map(), the
> write-back process faces a similar problem: concurrent changes can
> invalidate the extent mapping before the I/O is submitted.
> 
> Therefore, both of these processes must recheck the mapping info after
> acquiring the folio lock. To address this, similar to XFS, we propose
> introducing an extent sequence number to serve as a validity cookie for
> the extent. After commit 24b7a2331fcd ("ext4: clairfy the rules for
> modifying extents"), we can ensure the extent information should always
> be processed through the extent status tree, and the extent status tree
> is always uptodate under i_rwsem or invalidate_lock or folio lock, so
> it's safe to introduce this sequence number. The sequence number will be
> increased whenever the extent status tree changes, preparing for the
> buffered write iomap conversion.
> 
> Besides, this mechanism is also applicable for the moving extents case.
> In move_extent_per_page(), it also needs to reacquire data_sem and check
> the mapping info again under the folio lock.
> 
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>

One idea for future optimization as I'm reading the series:

> @@ -955,6 +961,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>  		}
>  		pending = err3;
>  	}
> +	ext4_es_inc_seq(inode);
>  error:
>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>  	/*

ext4_es_insert_extent() doesn't always need to increment the sequence
counter. It is used in two situations:

1) When we found the extent in the on-disk extent tree and want to cache it
in memory. No increment needed is in this case.

2) When we allocated new blocks or changed their status. Increment needed
in this case.

Case 1) can be actually pretty frequent on large files and we would be
unnecessarily invalidating mapping information for operations happening in
other parts of the file although no allocation information changes are
actually happening.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ