[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ae308d42-fd15-45e6-9cf8-fb3c8f3d0671@huaweicloud.com>
Date: Thu, 8 Jan 2026 20:54:04 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Ritesh Harjani <ritesh.list@...il.com>, Jan Kara <jack@...e.cz>,
libaokun1@...wei.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/7] ext4: Refactor zeroout path and handle all cases
On 1/8/2026 8:42 PM, Ojaswin Mujoo wrote:
> On Thu, Jan 08, 2026 at 07:58:21PM +0800, Zhang Yi wrote:
>> On 1/4/2026 8:19 PM, Ojaswin Mujoo wrote:
>>> Currently, zeroout is used as a fallback in case we fail to
>>> split/convert extents in the "traditional" modify-the-extent-tree way.
>>> This is essential to mitigate failures in critical paths like extent
>>> splitting during endio. However, the logic is very messy and not easy to
>>> follow. Further, the fragile use of various flags has made it prone to
>>> errors.
>>>
>>> Refactor zeroout out logic by moving it up to ext4_split_extents().
>>> Further, zeroout correctly based on the type of conversion we want, ie:
>>> - unwritten to written: Zeroout everything around the mapped range.
>>> - unwritten to unwritten: Zeroout everything
>>> - written to unwritten: Zeroout only the mapped range.
>>>
>>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>>
>> Hi, Ojaswin!
>>
>> The refactor overall looks good to me. After this series, the split
>> logic becomes more straightforward and clear. :)
>>
>> I have some comments below.
>>
>>> ---
>>> fs/ext4/extents.c | 287 +++++++++++++++++++++++++++++++---------------
>>> 1 file changed, 195 insertions(+), 92 deletions(-)
>>>
>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>> index 460a70e6dae0..8082e1d93bbf 100644
>>> --- a/fs/ext4/extents.c
>>> +++ b/fs/ext4/extents.c
>>
>> [...]
>>
>>> @@ -3365,6 +3313,115 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
>>> return path;
>>> }
>>>
>>> +static struct ext4_ext_path *
>>> +ext4_split_extent_zeroout(handle_t *handle, struct inode *inode,
>>> + struct ext4_ext_path *path,
>>> + struct ext4_map_blocks *map, int flags)
>>> +{
>>> + struct ext4_extent *ex;
>>> + unsigned int ee_len, depth;
>>> + ext4_lblk_t ee_block;
>>> + uint64_t lblk, pblk, len;
>>> + int is_unwrit;
>>> + int err = 0;
>>> +
>>> + depth = ext_depth(inode);
>>> + ex = path[depth].p_ext;
>>> + ee_block = le32_to_cpu(ex->ee_block);
>>> + ee_len = ext4_ext_get_actual_len(ex);
>>> + is_unwrit = ext4_ext_is_unwritten(ex);
>>> +
>>> + if (flags & EXT4_GET_BLOCKS_CONVERT) {
>>> + /*
>>> + * EXT4_GET_BLOCKS_CONVERT: Caller wants the range specified by
>>> + * map to be initialized. Zeroout everything except the map
>>> + * range.
>>> + */
>>> +
>>> + loff_t map_end = (loff_t) map->m_lblk + map->m_len;
>>> + loff_t ex_end = (loff_t) ee_block + ee_len;
>>> +
>>> + if (!is_unwrit)
>>> + /* Shouldn't happen. Just exit */
>>> + return ERR_PTR(-EINVAL);
>>
>> For cases that are should not happen, I'd suggest adding a WARN_ON_ONCE or
>> a message to facilitate future problem identification. Same as below.
>
> Hi Yi,
>
> Thanks for the review! Sure I can do that in v2.
>>
>>> +
>>> + /* zeroout left */
>>> + if (map->m_lblk > ee_block) {
>>> + lblk = ee_block;
>>> + len = map->m_lblk - ee_block;
>>> + pblk = ext4_ext_pblock(ex);
>>> + err = ext4_issue_zeroout(inode, lblk, pblk, len);
>>> + if (err)
>>> + /* ZEROOUT failed, just return original error */
>>> + return ERR_PTR(err);
>>> + }
>>> +
>>> + /* zeroout right */
>>> + if (map->m_lblk + map->m_len < ee_block + ee_len) {
>>> + lblk = map_end;
>>> + len = ex_end - map_end;
>>> + pblk = ext4_ext_pblock(ex) + (map_end - ee_block);
>>> + err = ext4_issue_zeroout(inode, lblk, pblk, len);
>>> + if (err)
>>> + /* ZEROOUT failed, just return original error */
>>> + return ERR_PTR(err);
>>> + }
>>> + } else if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
>>> + /*
>>> + * EXT4_GET_BLOCKS_CONVERT_UNWRITTEN: Caller wants the
>>> + * range specified by map to be marked unwritten.
>>> + * Zeroout the map range leaving rest as it is.
>>> + */
>>> +
>>> + if (is_unwrit)
>>> + /* Shouldn't happen. Just exit */
>>> + return ERR_PTR(-EINVAL);
>>> +
>>> + lblk = map->m_lblk;
>>> + len = map->m_len;
>>> + pblk = ext4_ext_pblock(ex) + (map->m_lblk - ee_block);
>>> + err = ext4_issue_zeroout(inode, lblk, pblk, len);
>>> + if (err)
>>> + /* ZEROOUT failed, just return original error */
>>> + return ERR_PTR(err);
>>> + } else if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT) {
>>> + /*
>>> + * EXT4_GET_BLOCKS_UNWRIT_EXT: Today, this flag
>>> + * implicitly implies that callers when wanting an
>>> + * unwritten to unwritten split. So zeroout the whole
>>> + * extent.
>>> + *
>>> + * TODO: The implicit meaning of the flag is not ideal
>>> + * and eventually we should aim for a more well defined
>>> + * behavior
>>> + */
>>> +
>>
>> I don't think we need this branch anymore. After applying my patch "ext4:
>> don't split extent before submitting I/O", we will no longer encounter
>> situations where doing an unwritten to unwritten split. It means that at
>> all call sites of ext4_split_extent(), only EXT4_GET_BLOCKS_CONVERT or
>> EXT4_GET_BLOCKS_CONVERT_UNWRITTEN flags are passed. What do you think?
>
> Yes, I did notice that as well after rebasing on your changes.
>
> So the next patch enforces the behavior that if no flag is passed to
> ext4_split_extent() -> ext4_split_extent_zeroout() then we assume a
> split without conversion. As you mentioned, there is no remaining caller
> that does this but I thought of handling it here so that in future if we
> ever need to use unwrit to unwrit splits we handle it correctly.
>
> Incase you still feel this makes it confusing or is uneccessary I can
> remove the else part altoghether and add a WARN_ON.
>
Yes, my personal suggestion is to add this part of the logic only when it
is really needed. :)
Cheers,
Yi.
> Thanks,
> ojaswin
>
>>
>> Thanks,
>> Yi.
>>
>>> + if (!is_unwrit)
>>> + /* Shouldn't happen. Just exit */
>>> + return ERR_PTR(-EINVAL);
>>> +
>>> + lblk = ee_block;
>>> + len = ee_len;
>>> + pblk = ext4_ext_pblock(ex);
>>> + err = ext4_issue_zeroout(inode, lblk, pblk, len);
>>> + if (err)
>>> + /* ZEROOUT failed, just return original error */
>>> + return ERR_PTR(err);
>>> + }
>>> +
>>> + err = ext4_ext_get_access(handle, inode, path + depth);
>>> + if (err)
>>> + return ERR_PTR(err);
>>> +
>>> + ext4_ext_mark_initialized(ex);
>>> +
>>> + ext4_ext_dirty(handle, inode, path + path->p_depth);
>>> + if (err)
>>> + return ERR_PTR(err);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> /*
>>> * ext4_split_extent() splits an extent and mark extent which is covered
>>> * by @map as split_flags indicates
>>
>> [...]
>>
Powered by blists - more mailing lists