linux-kernel - Re: [PATCH 5/7] ext4: Refactor zeroout path and handle all cases

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aV-mJci2dxojEFMY@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Thu, 8 Jan 2026 18:12:49 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
        Ritesh Harjani <ritesh.list@...il.com>, Jan Kara <jack@...e.cz>,
        libaokun1@...wei.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/7] ext4: Refactor zeroout path and handle all cases

On Thu, Jan 08, 2026 at 07:58:21PM +0800, Zhang Yi wrote:
> On 1/4/2026 8:19 PM, Ojaswin Mujoo wrote:
> > Currently, zeroout is used as a fallback in case we fail to
> > split/convert extents in the "traditional" modify-the-extent-tree way.
> > This is essential to mitigate failures in critical paths like extent
> > splitting during endio. However, the logic is very messy and not easy to
> > follow. Further, the fragile use of various flags has made it prone to
> > errors.
> > 
> > Refactor zeroout out logic by moving it up to ext4_split_extents().
> > Further, zeroout correctly based on the type of conversion we want, ie:
> > - unwritten to written: Zeroout everything around the mapped range.
> > - unwritten to unwritten: Zeroout everything
> > - written to unwritten: Zeroout only the mapped range.
> > 
> > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> 
> Hi, Ojaswin!
> 
> The refactor overall looks good to me. After this series, the split
> logic becomes more straightforward and clear. :)
> 
> I have some comments below.
> 
> > ---
> >  fs/ext4/extents.c | 287 +++++++++++++++++++++++++++++++---------------
> >  1 file changed, 195 insertions(+), 92 deletions(-)
> > 
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 460a70e6dae0..8082e1d93bbf 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> 
> [...]
> 
> > @@ -3365,6 +3313,115 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
> >  	return path;
> >  }
> >  
> > +static struct ext4_ext_path *
> > +ext4_split_extent_zeroout(handle_t *handle, struct inode *inode,
> > +			  struct ext4_ext_path *path,
> > +			  struct ext4_map_blocks *map, int flags)
> > +{
> > +	struct ext4_extent *ex;
> > +	unsigned int ee_len, depth;
> > +	ext4_lblk_t ee_block;
> > +	uint64_t lblk, pblk, len;
> > +	int is_unwrit;
> > +	int err = 0;
> > +
> > +	depth = ext_depth(inode);
> > +	ex = path[depth].p_ext;
> > +	ee_block = le32_to_cpu(ex->ee_block);
> > +	ee_len = ext4_ext_get_actual_len(ex);
> > +	is_unwrit = ext4_ext_is_unwritten(ex);
> > +
> > +	if (flags & EXT4_GET_BLOCKS_CONVERT) {
> > +		/*
> > +		 * EXT4_GET_BLOCKS_CONVERT: Caller wants the range specified by
> > +		 * map to be initialized. Zeroout everything except the map
> > +		 * range.
> > +		 */
> > +
> > +		loff_t map_end = (loff_t) map->m_lblk + map->m_len;
> > +		loff_t ex_end = (loff_t) ee_block + ee_len;
> > +
> > +		if (!is_unwrit)
> > +			/* Shouldn't happen. Just exit */
> > +			return ERR_PTR(-EINVAL);
> 
> For cases that are should not happen, I'd suggest adding a WARN_ON_ONCE or
> a message to facilitate future problem identification. Same as below.

Hi Yi,

Thanks for the review! Sure I can do that in v2.
> 
> > +
> > +		/* zeroout left */
> > +		if (map->m_lblk > ee_block) {
> > +			lblk = ee_block;
> > +			len = map->m_lblk - ee_block;
> > +			pblk = ext4_ext_pblock(ex);
> > +			err = ext4_issue_zeroout(inode, lblk, pblk, len);
> > +			if (err)
> > +				/* ZEROOUT failed, just return original error */
> > +				return ERR_PTR(err);
> > +		}
> > +
> > +		/* zeroout right */
> > +		if (map->m_lblk + map->m_len < ee_block + ee_len) {
> > +			lblk = map_end;
> > +			len = ex_end - map_end;
> > +			pblk = ext4_ext_pblock(ex) + (map_end - ee_block);
> > +			err = ext4_issue_zeroout(inode, lblk, pblk, len);
> > +			if (err)
> > +				/* ZEROOUT failed, just return original error */
> > +				return ERR_PTR(err);
> > +		}
> > +	} else if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
> > +		/*
> > +		 * EXT4_GET_BLOCKS_CONVERT_UNWRITTEN: Caller wants the
> > +		 * range specified by map to be marked unwritten.
> > +		 * Zeroout the map range leaving rest as it is.
> > +		 */
> > +
> > +		if (is_unwrit)
> > +			/* Shouldn't happen. Just exit */
> > +			return ERR_PTR(-EINVAL);
> > +
> > +		lblk = map->m_lblk;
> > +		len = map->m_len;
> > +		pblk = ext4_ext_pblock(ex) + (map->m_lblk - ee_block);
> > +		err = ext4_issue_zeroout(inode, lblk, pblk, len);
> > +		if (err)
> > +			/* ZEROOUT failed, just return original error */
> > +			return ERR_PTR(err);
> > +	} else if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT) {
> > +		/*
> > +		 * EXT4_GET_BLOCKS_UNWRIT_EXT: Today, this flag
> > +		 * implicitly implies that callers when wanting an
> > +		 * unwritten to unwritten split. So zeroout the whole
> > +		 * extent.
> > +		 *
> > +		 * TODO: The implicit meaning of the flag is not ideal
> > +		 * and eventually we should aim for a more well defined
> > +		 * behavior
> > +		 */
> > +
> 
> I don't think we need this branch anymore. After applying my patch "ext4:
> don't split extent before submitting I/O", we will no longer encounter
> situations where doing an unwritten to unwritten split. It means that at
> all call sites of ext4_split_extent(), only EXT4_GET_BLOCKS_CONVERT or
> EXT4_GET_BLOCKS_CONVERT_UNWRITTEN flags are passed. What do you think?

Yes, I did notice that as well after rebasing on your changes. 

So the next patch enforces the behavior that if no flag is passed to
ext4_split_extent() -> ext4_split_extent_zeroout() then we assume a
split without conversion. As you mentioned, there is no remaining caller
that does this but I thought of handling it here so that in future if we
ever need to use unwrit to unwrit splits we handle it correctly.

Incase you still feel this makes it confusing or is uneccessary I can
remove the else part altoghether and add a WARN_ON.

Thanks,
ojaswin

> 
> Thanks,
> Yi.
> 
> > +		if (!is_unwrit)
> > +			/* Shouldn't happen. Just exit */
> > +			return ERR_PTR(-EINVAL);
> > +
> > +		lblk = ee_block;
> > +		len = ee_len;
> > +		pblk = ext4_ext_pblock(ex);
> > +		err = ext4_issue_zeroout(inode, lblk, pblk, len);
> > +		if (err)
> > +			/* ZEROOUT failed, just return original error */
> > +			return ERR_PTR(err);
> > +	}
> > +
> > +	err = ext4_ext_get_access(handle, inode, path + depth);
> > +	if (err)
> > +		return ERR_PTR(err);
> > +
> > +	ext4_ext_mark_initialized(ex);
> > +
> > +	ext4_ext_dirty(handle, inode, path + path->p_depth);
> > +	if (err)
> > +		return ERR_PTR(err);
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * ext4_split_extent() splits an extent and mark extent which is covered
> >   * by @map as split_flags indicates
> 
> [...]
>