linux-kernel - Re: [PATCH v2 03/13] ext4: introduce seq counter for the extent status entry

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1daf2836-f497-4de7-ac8c-32d4d5e68f83@huaweicloud.com>
Date: Thu, 9 Oct 2025 14:52:36 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
 yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com,
 yangerkun@...wei.com
Subject: Re: [PATCH v2 03/13] ext4: introduce seq counter for the extent
 status entry

On 10/8/2025 7:44 PM, Jan Kara wrote:
> On Thu 25-09-25 17:25:59, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> In the iomap_write_iter(), the iomap buffered write frame does not hold
>> any locks between querying the inode extent mapping info and performing
>> page cache writes. As a result, the extent mapping can be changed due to
>> concurrent I/O in flight. Similarly, in the iomap_writepage_map(), the
>> write-back process faces a similar problem: concurrent changes can
>> invalidate the extent mapping before the I/O is submitted.
>>
>> Therefore, both of these processes must recheck the mapping info after
>> acquiring the folio lock. To address this, similar to XFS, we propose
>> introducing an extent sequence number to serve as a validity cookie for
>> the extent. After commit 24b7a2331fcd ("ext4: clairfy the rules for
>> modifying extents"), we can ensure the extent information should always
>> be processed through the extent status tree, and the extent status tree
>> is always uptodate under i_rwsem or invalidate_lock or folio lock, so
>> it's safe to introduce this sequence number. The sequence number will be
>> increased whenever the extent status tree changes, preparing for the
>> buffered write iomap conversion.
>>
>> Besides, this mechanism is also applicable for the moving extents case.
>> In move_extent_per_page(), it also needs to reacquire data_sem and check
>> the mapping info again under the folio lock.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> 
> One idea for future optimization as I'm reading the series:

Hi, Jan!

Thank you very much for reviewing this series!

> 
>> @@ -955,6 +961,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>  		}
>>  		pending = err3;
>>  	}
>> +	ext4_es_inc_seq(inode);
>>  error:
>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>>  	/*
> 
> ext4_es_insert_extent() doesn't always need to increment the sequence
> counter. It is used in two situations:
> 
> 1) When we found the extent in the on-disk extent tree and want to cache it
> in memory. No increment needed is in this case.
> 
> 2) When we allocated new blocks or changed their status. Increment needed
> in this case.
> 
> Case 1) can be actually pretty frequent on large files and we would be
> unnecessarily invalidating mapping information for operations happening in
> other parts of the file although no allocation information changes are
> actually happening.
> 

Indeed, the sequence count increment in Case 1 can be omitted because it
does not change any real extent. This increment can cause unnecessary
invalidation, potentially incurring additional overhead in some concurrency
scenarios.

Distinguishing between these two scenarios does not seem complicated. Since
the iomap conversion has not yet been completed, currently only the
defragmentation use this mechanism, I can add a TODO comment here now and
then initiate a new series to optimize it.

Thanks,
Yi.