lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <l7tb75bsk52ybeok737b7o4ag4zeleowtddf3v6wcbnhbom4tx@xv643wp5wp6a>
Date: Thu, 6 Nov 2025 10:15:38 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz, 
	yi.zhang@...wei.com, libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite
 existing extents

On Fri 31-10-25 14:29:02, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
> 
> Currently, ext4_es_cache_extent() is used to load extents into the
> extent status tree when reading on-disk extent blocks. Since it may be
> called while moving or modifying the extent tree, so it does not
> overwrite existing extents in the extent status tree and is only used
> for the initial loading.
> 
> There are many other places in ext4 where on-disk extents are inserted
> into the extent status tree, such as in ext4_map_query_blocks().
> Currently, they call ext4_es_insert_extent() to perform the insertion,
> but they don't modify the extents, so ext4_es_cache_extent() would be a
> more appropriate choice. However, when ext4_map_query_blocks() inserts
> an extent, it may overwrite a short existing extent of the same type.
> Therefore, to prepare for the replacements, we need to extend
> ext4_es_cache_extent() to allow it to overwrite existing extents with
> the same type.
> 
> In addition, since cached extents can be more lenient than the extents
> they modify and do not involve modifying reserved blocks, it is not
> necessary to ensure that the insertion operation succeeds as strictly as
> in the ext4_es_insert_extent() function.
> 
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>

Thanks for writing this series! I think we can actually simplify things
event further. Extent status tree operations can be divided into three
groups:
1) Lookups in es tree - protected only by i_es_lock.
2) Caching of on-disk state into es tree - protected by i_es_lock and
   i_data_sem (at least in read mode).
3) Modification of existing state - protected by i_es_lock and i_data_sem
   in write mode.

Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
that 2) should never see a real conflict - i.e., all intersecting entries
in es tree have the same status, otherwise this is a bug. So I think that 
ext4_es_cache_extent() should always walk the whole inserted range, verify
the statuses match and merge all these entries into a single one. This
isn't going to be slower than what we have today because your
__es_remove_extent(), __es_insert_extent() pair is effectively doing the
same thing, just without checking the statuses. That way we always get the
checking and also ext4_es_cache_extent() doesn't have to have the
overwriting and non-overwriting variant. What do you think?

								Honza

> ---
>  fs/ext4/extents.c        |  4 ++--
>  fs/ext4/extents_status.c | 28 +++++++++++++++++++++-------
>  fs/ext4/extents_status.h |  2 +-
>  3 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ca5499e9412b..c42ceb5aae37 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -537,12 +537,12 @@ static void ext4_cache_extents(struct inode *inode,
>  
>  		if (prev && (prev != lblk))
>  			ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
> -					     EXTENT_STATUS_HOLE);
> +					     EXTENT_STATUS_HOLE, false);
>  
>  		if (ext4_ext_is_unwritten(ex))
>  			status = EXTENT_STATUS_UNWRITTEN;
>  		ext4_es_cache_extent(inode, lblk, len,
> -				     ext4_ext_pblock(ex), status);
> +				     ext4_ext_pblock(ex), status, false);
>  		prev = lblk + len;
>  	}
>  }
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index 31dc0496f8d0..f9546ecf7340 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -986,13 +986,19 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>  }
>  
>  /*
> - * ext4_es_cache_extent() inserts information into the extent status
> - * tree if and only if there isn't information about the range in
> - * question already.
> + * ext4_es_cache_extent() inserts extent information into the extent status
> + * tree. If 'overwrite' is not set, it inserts extent only if there isn't
> + * information about the specified range. Otherwise, it overwrites the
> + * current information.
> + *
> + * Note that this interface is only used for caching on-disk extent
> + * information and cannot be used to convert existing extents in the extent
> + * status tree. To convert existing extents, use ext4_es_insert_extent()
> + * instead.
>   */
>  void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>  			  ext4_lblk_t len, ext4_fsblk_t pblk,
> -			  unsigned int status)
> +			  unsigned int status, bool overwrite)
>  {
>  	struct extent_status *es;
>  	struct extent_status newes;
> @@ -1012,10 +1018,18 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>  	BUG_ON(end < lblk);
>  
>  	write_lock(&EXT4_I(inode)->i_es_lock);
> -
>  	es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
> -	if (!es || es->es_lblk > end)
> -		__es_insert_extent(inode, &newes, NULL);
> +	if (es && es->es_lblk <= end) {
> +		if (!overwrite)
> +			goto unlock;
> +
> +		/* Only extents of the same type can be overwritten. */
> +		WARN_ON_ONCE(ext4_es_type(es) != status);
> +		if (__es_remove_extent(inode, lblk, end, NULL, NULL))
> +			goto unlock;
> +	}
> +	__es_insert_extent(inode, &newes, NULL);
> +unlock:
>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>  }
>  
> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> index 8f9c008d11e8..415f7c223a46 100644
> --- a/fs/ext4/extents_status.h
> +++ b/fs/ext4/extents_status.h
> @@ -139,7 +139,7 @@ extern void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>  				  bool delalloc_reserve_used);
>  extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>  				 ext4_lblk_t len, ext4_fsblk_t pblk,
> -				 unsigned int status);
> +				 unsigned int status, bool overwrite);
>  extern void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>  				  ext4_lblk_t len);
>  extern void ext4_es_find_extent_range(struct inode *inode,
> -- 
> 2.46.1
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ