Message-ID: <cfd95673-d0e6-44e6-86af-04bf2e0a9a8f@huaweicloud.com>
Date: Wed, 19 Nov 2025 17:36:48 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
yi.zhang@...wei.com, libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite
existing extents
On 11/11/2025 6:33 PM, Jan Kara wrote:
> Hi!
>
> On Thu 06-11-25 21:02:35, Zhang Yi wrote:
>> On 11/6/2025 5:15 PM, Jan Kara wrote:
>>> On Fri 31-10-25 14:29:02, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@...wei.com>
>>>>
>>>> Currently, ext4_es_cache_extent() is used to load extents into the
>>>> extent status tree when reading on-disk extent blocks. Since it may be
>>>> called while moving or modifying the extent tree, it does not
>>>> overwrite existing extents in the extent status tree and is only used
>>>> for the initial loading.
>>>>
>>>> There are many other places in ext4 where on-disk extents are inserted
>>>> into the extent status tree, such as in ext4_map_query_blocks().
>>>> Currently, they call ext4_es_insert_extent() to perform the insertion,
>>>> but they don't modify the extents, so ext4_es_cache_extent() would be a
>>>> more appropriate choice. However, when ext4_map_query_blocks() inserts
>>>> an extent, it may overwrite a short existing extent of the same type.
>>>> Therefore, to prepare for the replacements, we need to extend
>>>> ext4_es_cache_extent() to allow it to overwrite existing extents with
>>>> the same type.
>>>>
>>>> In addition, since cached extents can be more lenient than the extents
>>>> they modify and do not involve modifying reserved blocks, it is not
>>>> necessary to ensure that the insertion operation succeeds as strictly as
>>>> in the ext4_es_insert_extent() function.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>>>
>>> Thanks for writing this series! I think we can actually simplify things
>>> even further. Extent status tree operations can be divided into three
>>> groups:
>>> 1) Lookups in es tree - protected only by i_es_lock.
>>> 2) Caching of on-disk state into es tree - protected by i_es_lock and
>>> i_data_sem (at least in read mode).
>>> 3) Modification of existing state - protected by i_es_lock and i_data_sem
>>> in write mode.
>>
>> Yeah.
>>
>>>
>>> Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
>>> that 2) should never see a real conflict - i.e., all intersecting entries
>>> in es tree have the same status, otherwise this is a bug.
>>
>> While I was debugging, I observed two exceptions here.
>>
>> A. The first exception is about the delayed extent. Since there is no actual
>> extent present in the extent tree on the disk, if a delayed extent
>> already exists in the extent status tree and someone calls
>> ext4_find_extent()->ext4_cache_extents() to cache an extent at the same
>> location, then a status mismatch will occur (attempting to replace
>> the delayed extent with a hole). This is not a bug.
>> B. I also observed that ext4_find_extent()->ext4_cache_extents() is called
>> during splitting and conversion between unwritten and written states (in
>> most scenarios, EXT4_EX_NOCACHE is not added). However, because the
>> process is in an intermediate state of handling extents, there can be
>> cases where the statuses do not match. I did not analyze this scenario in
>> detail, but since ext4_es_insert_extent() is called at the end of the
>> processing to ensure the final state is correct, I don't think this is a
>> practical issue either.
>
> Thanks for bringing this up. I didn't think about these two cases. As for
> case A that is easy to deal with as you write below. A hole insertion can
> be deemed compatible with existing delalloc extent.
>
Yeah.
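For reference, the case A rule could be sketched roughly as below. This is only an illustration of the idea discussed above, not code from the patch: the enum values and the es_cache_compatible() helper are hypothetical names, not the kernel's EXTENT_STATUS_* flags or any existing ext4 function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values for illustration only. */
enum es_status { ES_WRITTEN, ES_UNWRITTEN, ES_DELAYED, ES_HOLE };

/*
 * Return true if caching an on-disk extent with status 'incoming' over an
 * existing extent status tree entry with status 'existing' is not a conflict.
 */
static bool es_cache_compatible(enum es_status existing,
				enum es_status incoming)
{
	/* Guard against values outside the illustrative enum. */
	assert((int)existing >= 0 && (int)existing <= (int)ES_HOLE);
	assert((int)incoming >= 0 && (int)incoming <= (int)ES_HOLE);

	/* Same status: the cached entry restates what is already there. */
	if (existing == incoming)
		return true;

	/*
	 * Case A: a delayed extent has no on-disk counterpart, so the
	 * on-disk tree reports a hole at that location.
	 */
	if (existing == ES_DELAYED && incoming == ES_HOLE)
		return true;

	/* Anything else is treated as a mismatch in this sketch. */
	return false;
}
```

How a mismatch should be handled (warn, skip, or overwrite) is left open here, since case B is still under investigation.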
> Case B is more difficult and I think I need to better understand the
> details there to decide what to do. Only extent splitting (as it happens
> e.g. with EXT4_GET_BLOCKS_PRE_IO) should keep extents in the extent tree and
> extent status tree compatible. So it has to be something like
> EXT4_GET_BLOCKS_CONVERT case. There indeed after we call
> ext4_ext_mark_initialized() we have initialized extent on disk but in
> extent status tree it is still as unwritten. But I just didn't find a place
> in the extent conversion path that would modify extent state on disk and
> then call ext4_find_extent(). Can you perhaps share a stacktrace where the
> extent incompatibility was hit from ext4_cache_extents()? Thanks!
>
> Honza
>
Sorry for the late reply. I found several real issues while debugging this
case. The situation is a bit complicated and will take some time, so I will
address these in the next iteration.
Cheers,
Yi.