linux-kernel - Re: [PATCH 5/7] ext4: Refactor zeroout path and handle all cases

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aV-6Xa56iQaH5ltX@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Thu, 8 Jan 2026 19:38:29 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
        Ritesh Harjani <ritesh.list@...il.com>, Jan Kara <jack@...e.cz>,
        libaokun1@...wei.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/7] ext4: Refactor zeroout path and handle all cases

On Thu, Jan 08, 2026 at 08:54:04PM +0800, Zhang Yi wrote:
> On 1/8/2026 8:42 PM, Ojaswin Mujoo wrote:
> > On Thu, Jan 08, 2026 at 07:58:21PM +0800, Zhang Yi wrote:
> >> On 1/4/2026 8:19 PM, Ojaswin Mujoo wrote:
> >>> Currently, zeroout is used as a fallback in case we fail to
> >>> split/convert extents in the "traditional" modify-the-extent-tree way.
> >>> This is essential to mitigate failures in critical paths like extent
> >>> splitting during endio. However, the logic is very messy and not easy to
> >>> follow. Further, the fragile use of various flags has made it prone to
> >>> errors.
> >>>
> >>> Refactor zeroout out logic by moving it up to ext4_split_extents().
> >>> Further, zeroout correctly based on the type of conversion we want, ie:
> >>> - unwritten to written: Zeroout everything around the mapped range.
> >>> - unwritten to unwritten: Zeroout everything
> >>> - written to unwritten: Zeroout only the mapped range.
> >>>
> >>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> >>
> >> Hi, Ojaswin!
> >>
> >> The refactor overall looks good to me. After this series, the split
> >> logic becomes more straightforward and clear. :)
> >>
> >> I have some comments below.
> >>
> >>> ---
> >>>  fs/ext4/extents.c | 287 +++++++++++++++++++++++++++++++---------------
> >>>  1 file changed, 195 insertions(+), 92 deletions(-)
> >>>
> >>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >>> index 460a70e6dae0..8082e1d93bbf 100644
> >>> --- a/fs/ext4/extents.c
> >>> +++ b/fs/ext4/extents.c
> >>
> >> [...]
> >>
> >>> @@ -3365,6 +3313,115 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
> >>>  	return path;
> >>>  }
> >>>  
> >>> +static struct ext4_ext_path *
> >>> +ext4_split_extent_zeroout(handle_t *handle, struct inode *inode,
> >>> +			  struct ext4_ext_path *path,
> >>> +			  struct ext4_map_blocks *map, int flags)
> >>> +{
> >>> +	struct ext4_extent *ex;
> >>> +	unsigned int ee_len, depth;
> >>> +	ext4_lblk_t ee_block;
> >>> +	uint64_t lblk, pblk, len;
> >>> +	int is_unwrit;
> >>> +	int err = 0;
> >>> +
> >>> +	depth = ext_depth(inode);
> >>> +	ex = path[depth].p_ext;
> >>> +	ee_block = le32_to_cpu(ex->ee_block);
> >>> +	ee_len = ext4_ext_get_actual_len(ex);
> >>> +	is_unwrit = ext4_ext_is_unwritten(ex);
> >>> +
> >>> +	if (flags & EXT4_GET_BLOCKS_CONVERT) {
> >>> +		/*
> >>> +		 * EXT4_GET_BLOCKS_CONVERT: Caller wants the range specified by
> >>> +		 * map to be initialized. Zeroout everything except the map
> >>> +		 * range.
> >>> +		 */
> >>> +
> >>> +		loff_t map_end = (loff_t) map->m_lblk + map->m_len;
> >>> +		loff_t ex_end = (loff_t) ee_block + ee_len;
> >>> +
> >>> +		if (!is_unwrit)
> >>> +			/* Shouldn't happen. Just exit */
> >>> +			return ERR_PTR(-EINVAL);
> >>
> >> For cases that are should not happen, I'd suggest adding a WARN_ON_ONCE or
> >> a message to facilitate future problem identification. Same as below.
> > 
> > Hi Yi,
> > 
> > Thanks for the review! Sure I can do that in v2.
> >>
> >>> +
> >>> +		/* zeroout left */
> >>> +		if (map->m_lblk > ee_block) {
> >>> +			lblk = ee_block;
> >>> +			len = map->m_lblk - ee_block;
> >>> +			pblk = ext4_ext_pblock(ex);
> >>> +			err = ext4_issue_zeroout(inode, lblk, pblk, len);
> >>> +			if (err)
> >>> +				/* ZEROOUT failed, just return original error */
> >>> +				return ERR_PTR(err);
> >>> +		}
> >>> +
> >>> +		/* zeroout right */
> >>> +		if (map->m_lblk + map->m_len < ee_block + ee_len) {
> >>> +			lblk = map_end;
> >>> +			len = ex_end - map_end;
> >>> +			pblk = ext4_ext_pblock(ex) + (map_end - ee_block);
> >>> +			err = ext4_issue_zeroout(inode, lblk, pblk, len);
> >>> +			if (err)
> >>> +				/* ZEROOUT failed, just return original error */
> >>> +				return ERR_PTR(err);
> >>> +		}
> >>> +	} else if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
> >>> +		/*
> >>> +		 * EXT4_GET_BLOCKS_CONVERT_UNWRITTEN: Caller wants the
> >>> +		 * range specified by map to be marked unwritten.
> >>> +		 * Zeroout the map range leaving rest as it is.
> >>> +		 */
> >>> +
> >>> +		if (is_unwrit)
> >>> +			/* Shouldn't happen. Just exit */
> >>> +			return ERR_PTR(-EINVAL);
> >>> +
> >>> +		lblk = map->m_lblk;
> >>> +		len = map->m_len;
> >>> +		pblk = ext4_ext_pblock(ex) + (map->m_lblk - ee_block);
> >>> +		err = ext4_issue_zeroout(inode, lblk, pblk, len);
> >>> +		if (err)
> >>> +			/* ZEROOUT failed, just return original error */
> >>> +			return ERR_PTR(err);
> >>> +	} else if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT) {
> >>> +		/*
> >>> +		 * EXT4_GET_BLOCKS_UNWRIT_EXT: Today, this flag
> >>> +		 * implicitly implies that callers when wanting an
> >>> +		 * unwritten to unwritten split. So zeroout the whole
> >>> +		 * extent.
> >>> +		 *
> >>> +		 * TODO: The implicit meaning of the flag is not ideal
> >>> +		 * and eventually we should aim for a more well defined
> >>> +		 * behavior
> >>> +		 */
> >>> +
> >>
> >> I don't think we need this branch anymore. After applying my patch "ext4:
> >> don't split extent before submitting I/O", we will no longer encounter
> >> situations where doing an unwritten to unwritten split. It means that at
> >> all call sites of ext4_split_extent(), only EXT4_GET_BLOCKS_CONVERT or
> >> EXT4_GET_BLOCKS_CONVERT_UNWRITTEN flags are passed. What do you think?
> > 
> > Yes, I did notice that as well after rebasing on your changes. 
> > 
> > So the next patch enforces the behavior that if no flag is passed to
> > ext4_split_extent() -> ext4_split_extent_zeroout() then we assume a
> > split without conversion. As you mentioned, there is no remaining caller
> > that does this but I thought of handling it here so that in future if we
> > ever need to use unwrit to unwrit splits we handle it correctly.
> > 
> > Incase you still feel this makes it confusing or is uneccessary I can
> > remove the else part altoghether and add a WARN_ON.
> > 
> 
> Yes, my personal suggestion is to add this part of the logic only when it
> is really needed. :)

Okay sounds good, will take this out in v2.

Regards,
ojaswin

> 
> Cheers,
> Yi.
> 
> > Thanks,
> > ojaswin
> > 
> >>
> >> Thanks,
> >> Yi.
> >>
> >>> +		if (!is_unwrit)
> >>> +			/* Shouldn't happen. Just exit */
> >>> +			return ERR_PTR(-EINVAL);
> >>> +
> >>> +		lblk = ee_block;
> >>> +		len = ee_len;
> >>> +		pblk = ext4_ext_pblock(ex);
> >>> +		err = ext4_issue_zeroout(inode, lblk, pblk, len);
> >>> +		if (err)
> >>> +			/* ZEROOUT failed, just return original error */
> >>> +			return ERR_PTR(err);
> >>> +	}
> >>> +
> >>> +	err = ext4_ext_get_access(handle, inode, path + depth);
> >>> +	if (err)
> >>> +		return ERR_PTR(err);
> >>> +
> >>> +	ext4_ext_mark_initialized(ex);
> >>> +
> >>> +	ext4_ext_dirty(handle, inode, path + path->p_depth);
> >>> +	if (err)
> >>> +		return ERR_PTR(err);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  /*
> >>>   * ext4_split_extent() splits an extent and mark extent which is covered
> >>>   * by @map as split_flags indicates
> >>
> >> [...]
> >>
>