lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ee200d75-6f3e-4514-8fd4-8cdcbd3754d4@huaweicloud.com>
Date: Thu, 6 Nov 2025 21:02:35 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
 yi.zhang@...wei.com, libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite
 existing extents

Hi! Thank you for the review and suggestions!

On 11/6/2025 5:15 PM, Jan Kara wrote:
> On Fri 31-10-25 14:29:02, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> Currently, ext4_es_cache_extent() is used to load extents into the
>> extent status tree when reading on-disk extent blocks. Since it may be
>> called while moving or modifying the extent tree, so it does not
>> overwrite existing extents in the extent status tree and is only used
>> for the initial loading.
>>
>> There are many other places in ext4 where on-disk extents are inserted
>> into the extent status tree, such as in ext4_map_query_blocks().
>> Currently, they call ext4_es_insert_extent() to perform the insertion,
>> but they don't modify the extents, so ext4_es_cache_extent() would be a
>> more appropriate choice. However, when ext4_map_query_blocks() inserts
>> an extent, it may overwrite a short existing extent of the same type.
>> Therefore, to prepare for the replacements, we need to extend
>> ext4_es_cache_extent() to allow it to overwrite existing extents with
>> the same type.
>>
>> In addition, since cached extents can be more lenient than the extents
>> they modify and do not involve modifying reserved blocks, it is not
>> necessary to ensure that the insertion operation succeeds as strictly as
>> in the ext4_es_insert_extent() function.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> 
> Thanks for writing this series! I think we can actually simplify things
> event further. Extent status tree operations can be divided into three
> groups:
> 1) Lookups in es tree - protected only by i_es_lock.
> 2) Caching of on-disk state into es tree - protected by i_es_lock and
>    i_data_sem (at least in read mode).
> 3) Modification of existing state - protected by i_es_lock and i_data_sem
>    in write mode.

Yeah.

> 
> Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
> that 2) should never see a real conflict - i.e., all intersecting entries
> in es tree have the same status, otherwise this is a bug.

While I was debugging, I observed two exceptions here.

A. The first exceptions is about the delay extent. Since there is no actual
   extent present in the extent tree on the disk, if a delayed extent
   already exists in the extent status tree and someone calls
   ext4_find_extent()->ext4_cache_extents() to cache an extent at the same
   location, then a status mismatch will occur (attempting to replace
   the delayed extent with a hole). This is not a bug.
B. I also observed that ext4_find_extent()->ext4_cache_extents() is called
   during splitting and conversion between unwritten and written states (in
   most scenarios, EXT4_EX_NOCACHE is not added). However, because the
   process is in an intermediate state of handling extents, there can be
   cases where the status do not match. I did not analyze this scenario in
   detail, but since ext4_es_insert_extent() is called at the end of the
   processing to ensure the final state is correct, I don't think this is a
   practical issue either.

Therefore, I believe that retaining non-overwriting is necessary for this
scenario involving ext4_cache_extents() because it will be called during
case 3). Except for ext4_cache_extents(), other scenarios theoretically
should not be involved.

> So I think that 
> ext4_es_cache_extent() should always walk the whole inserted range, verify
> the statuses match and merge all these entries into a single one. This
> isn't going to be slower than what we have today because your
> __es_remove_extent(), __es_insert_extent() pair is effectively doing the
> same thing, just without checking the statuses.

Yes, I agree that we can delegate the verification work in
ext4_es_cache_extent() to __es_remove_extent(). During the process of
overwriting extents, the first step is to remove the existing extents. If
the extent status does not match, an alarm will be triggered.

> That way we always get the
> checking and also ext4_es_cache_extent() doesn't have to have the
> overwriting and non-overwriting variant. What do you think?
> 
> 								Honza

For case A, we can add an exception during verification and skip the
warnings. For case B, We need to ensure that ext4_cache_extents() is not
allowed to be called during the intermediate processing of the extent
tree. This seems feasible in theory, but I guess it is somewhat fragile.
So, keep the non-overwriting mode?

Best Regards,
Yi.

> 
>> ---
>>  fs/ext4/extents.c        |  4 ++--
>>  fs/ext4/extents_status.c | 28 +++++++++++++++++++++-------
>>  fs/ext4/extents_status.h |  2 +-
>>  3 files changed, 24 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index ca5499e9412b..c42ceb5aae37 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -537,12 +537,12 @@ static void ext4_cache_extents(struct inode *inode,
>>  
>>  		if (prev && (prev != lblk))
>>  			ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
>> -					     EXTENT_STATUS_HOLE);
>> +					     EXTENT_STATUS_HOLE, false);
>>  
>>  		if (ext4_ext_is_unwritten(ex))
>>  			status = EXTENT_STATUS_UNWRITTEN;
>>  		ext4_es_cache_extent(inode, lblk, len,
>> -				     ext4_ext_pblock(ex), status);
>> +				     ext4_ext_pblock(ex), status, false);
>>  		prev = lblk + len;
>>  	}
>>  }
>> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
>> index 31dc0496f8d0..f9546ecf7340 100644
>> --- a/fs/ext4/extents_status.c
>> +++ b/fs/ext4/extents_status.c
>> @@ -986,13 +986,19 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>  }
>>  
>>  /*
>> - * ext4_es_cache_extent() inserts information into the extent status
>> - * tree if and only if there isn't information about the range in
>> - * question already.
>> + * ext4_es_cache_extent() inserts extent information into the extent status
>> + * tree. If 'overwrite' is not set, it inserts extent only if there isn't
>> + * information about the specified range. Otherwise, it overwrites the
>> + * current information.
>> + *
>> + * Note that this interface is only used for caching on-disk extent
>> + * information and cannot be used to convert existing extents in the extent
>> + * status tree. To convert existing extents, use ext4_es_insert_extent()
>> + * instead.
>>   */
>>  void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>>  			  ext4_lblk_t len, ext4_fsblk_t pblk,
>> -			  unsigned int status)
>> +			  unsigned int status, bool overwrite)
>>  {
>>  	struct extent_status *es;
>>  	struct extent_status newes;
>> @@ -1012,10 +1018,18 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>>  	BUG_ON(end < lblk);
>>  
>>  	write_lock(&EXT4_I(inode)->i_es_lock);
>> -
>>  	es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
>> -	if (!es || es->es_lblk > end)
>> -		__es_insert_extent(inode, &newes, NULL);
>> +	if (es && es->es_lblk <= end) {
>> +		if (!overwrite)
>> +			goto unlock;
>> +
>> +		/* Only extents of the same type can be overwritten. */
>> +		WARN_ON_ONCE(ext4_es_type(es) != status);
>> +		if (__es_remove_extent(inode, lblk, end, NULL, NULL))
>> +			goto unlock;
>> +	}
>> +	__es_insert_extent(inode, &newes, NULL);
>> +unlock:
>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>>  }
>>  
>> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
>> index 8f9c008d11e8..415f7c223a46 100644
>> --- a/fs/ext4/extents_status.h
>> +++ b/fs/ext4/extents_status.h
>> @@ -139,7 +139,7 @@ extern void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>  				  bool delalloc_reserve_used);
>>  extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>>  				 ext4_lblk_t len, ext4_fsblk_t pblk,
>> -				 unsigned int status);
>> +				 unsigned int status, bool overwrite);
>>  extern void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>>  				  ext4_lblk_t len);
>>  extern void ext4_es_find_extent_range(struct inode *inode,
>> -- 
>> 2.46.1
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ