[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ee200d75-6f3e-4514-8fd4-8cdcbd3754d4@huaweicloud.com>
Date: Thu, 6 Nov 2025 21:02:35 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
yi.zhang@...wei.com, libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite
existing extents
Hi! Thank you for the review and suggestions!
On 11/6/2025 5:15 PM, Jan Kara wrote:
> On Fri 31-10-25 14:29:02, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> Currently, ext4_es_cache_extent() is used to load extents into the
>> extent status tree when reading on-disk extent blocks. Since it may be
>> called while moving or modifying the extent tree, so it does not
>> overwrite existing extents in the extent status tree and is only used
>> for the initial loading.
>>
>> There are many other places in ext4 where on-disk extents are inserted
>> into the extent status tree, such as in ext4_map_query_blocks().
>> Currently, they call ext4_es_insert_extent() to perform the insertion,
>> but they don't modify the extents, so ext4_es_cache_extent() would be a
>> more appropriate choice. However, when ext4_map_query_blocks() inserts
>> an extent, it may overwrite a short existing extent of the same type.
>> Therefore, to prepare for the replacements, we need to extend
>> ext4_es_cache_extent() to allow it to overwrite existing extents with
>> the same type.
>>
>> In addition, since cached extents can be more lenient than the extents
>> they modify and do not involve modifying reserved blocks, it is not
>> necessary to ensure that the insertion operation succeeds as strictly as
>> in the ext4_es_insert_extent() function.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>
> Thanks for writing this series! I think we can actually simplify things
> event further. Extent status tree operations can be divided into three
> groups:
> 1) Lookups in es tree - protected only by i_es_lock.
> 2) Caching of on-disk state into es tree - protected by i_es_lock and
> i_data_sem (at least in read mode).
> 3) Modification of existing state - protected by i_es_lock and i_data_sem
> in write mode.
Yeah.
>
> Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
> that 2) should never see a real conflict - i.e., all intersecting entries
> in es tree have the same status, otherwise this is a bug.
While I was debugging, I observed two exceptions here.
A. The first exceptions is about the delay extent. Since there is no actual
extent present in the extent tree on the disk, if a delayed extent
already exists in the extent status tree and someone calls
ext4_find_extent()->ext4_cache_extents() to cache an extent at the same
location, then a status mismatch will occur (attempting to replace
the delayed extent with a hole). This is not a bug.
B. I also observed that ext4_find_extent()->ext4_cache_extents() is called
during splitting and conversion between unwritten and written states (in
most scenarios, EXT4_EX_NOCACHE is not added). However, because the
process is in an intermediate state of handling extents, there can be
cases where the status do not match. I did not analyze this scenario in
detail, but since ext4_es_insert_extent() is called at the end of the
processing to ensure the final state is correct, I don't think this is a
practical issue either.
Therefore, I believe that retaining non-overwriting is necessary for this
scenario involving ext4_cache_extents() because it will be called during
case 3). Except for ext4_cache_extents(), other scenarios theoretically
should not be involved.
> So I think that
> ext4_es_cache_extent() should always walk the whole inserted range, verify
> the statuses match and merge all these entries into a single one. This
> isn't going to be slower than what we have today because your
> __es_remove_extent(), __es_insert_extent() pair is effectively doing the
> same thing, just without checking the statuses.
Yes, I agree that we can delegate the verification work in
ext4_es_cache_extent() to __es_remove_extent(). During the process of
overwriting extents, the first step is to remove the existing extents. If
the extent status does not match, an alarm will be triggered.
> That way we always get the
> checking and also ext4_es_cache_extent() doesn't have to have the
> overwriting and non-overwriting variant. What do you think?
>
> Honza
For case A, we can add an exception during verification and skip the
warnings. For case B, We need to ensure that ext4_cache_extents() is not
allowed to be called during the intermediate processing of the extent
tree. This seems feasible in theory, but I guess it is somewhat fragile.
So, keep the non-overwriting mode?
Best Regards,
Yi.
>
>> ---
>> fs/ext4/extents.c | 4 ++--
>> fs/ext4/extents_status.c | 28 +++++++++++++++++++++-------
>> fs/ext4/extents_status.h | 2 +-
>> 3 files changed, 24 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index ca5499e9412b..c42ceb5aae37 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -537,12 +537,12 @@ static void ext4_cache_extents(struct inode *inode,
>>
>> if (prev && (prev != lblk))
>> ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
>> - EXTENT_STATUS_HOLE);
>> + EXTENT_STATUS_HOLE, false);
>>
>> if (ext4_ext_is_unwritten(ex))
>> status = EXTENT_STATUS_UNWRITTEN;
>> ext4_es_cache_extent(inode, lblk, len,
>> - ext4_ext_pblock(ex), status);
>> + ext4_ext_pblock(ex), status, false);
>> prev = lblk + len;
>> }
>> }
>> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
>> index 31dc0496f8d0..f9546ecf7340 100644
>> --- a/fs/ext4/extents_status.c
>> +++ b/fs/ext4/extents_status.c
>> @@ -986,13 +986,19 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> }
>>
>> /*
>> - * ext4_es_cache_extent() inserts information into the extent status
>> - * tree if and only if there isn't information about the range in
>> - * question already.
>> + * ext4_es_cache_extent() inserts extent information into the extent status
>> + * tree. If 'overwrite' is not set, it inserts extent only if there isn't
>> + * information about the specified range. Otherwise, it overwrites the
>> + * current information.
>> + *
>> + * Note that this interface is only used for caching on-disk extent
>> + * information and cannot be used to convert existing extents in the extent
>> + * status tree. To convert existing extents, use ext4_es_insert_extent()
>> + * instead.
>> */
>> void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len, ext4_fsblk_t pblk,
>> - unsigned int status)
>> + unsigned int status, bool overwrite)
>> {
>> struct extent_status *es;
>> struct extent_status newes;
>> @@ -1012,10 +1018,18 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>> BUG_ON(end < lblk);
>>
>> write_lock(&EXT4_I(inode)->i_es_lock);
>> -
>> es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
>> - if (!es || es->es_lblk > end)
>> - __es_insert_extent(inode, &newes, NULL);
>> + if (es && es->es_lblk <= end) {
>> + if (!overwrite)
>> + goto unlock;
>> +
>> + /* Only extents of the same type can be overwritten. */
>> + WARN_ON_ONCE(ext4_es_type(es) != status);
>> + if (__es_remove_extent(inode, lblk, end, NULL, NULL))
>> + goto unlock;
>> + }
>> + __es_insert_extent(inode, &newes, NULL);
>> +unlock:
>> write_unlock(&EXT4_I(inode)->i_es_lock);
>> }
>>
>> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
>> index 8f9c008d11e8..415f7c223a46 100644
>> --- a/fs/ext4/extents_status.h
>> +++ b/fs/ext4/extents_status.h
>> @@ -139,7 +139,7 @@ extern void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> bool delalloc_reserve_used);
>> extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len, ext4_fsblk_t pblk,
>> - unsigned int status);
>> + unsigned int status, bool overwrite);
>> extern void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len);
>> extern void ext4_es_find_extent_range(struct inode *inode,
>> --
>> 2.46.1
>>
Powered by blists - more mailing lists