Message-ID: <aV4IL1wP76uefmO7@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Wed, 7 Jan 2026 12:45:59 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
Ritesh Harjani <ritesh.list@...il.com>, Zhang Yi <yi.zhang@...wei.com>,
libaokun1@...wei.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/7] ext4: Refactor zeroout path and handle all cases
On Tue, Jan 06, 2026 at 04:31:23PM +0100, Jan Kara wrote:
> On Sun 04-01-26 17:49:18, Ojaswin Mujoo wrote:
> > Currently, zeroout is used as a fallback in case we fail to
> > split/convert extents in the "traditional" modify-the-extent-tree way.
> > This is essential to mitigate failures in critical paths like extent
> > splitting during endio. However, the logic is very messy and not easy to
> > follow. Further, the fragile use of various flags has made it prone to
> > errors.
> >
> > Refactor zeroout logic by moving it up to ext4_split_extent().
> > Further, zeroout correctly based on the type of conversion we want, i.e.:
> > - unwritten to written: Zeroout everything around the mapped range.
> > - unwritten to unwritten: Zeroout everything
> > - written to unwritten: Zeroout only the mapped range.
> >
> > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
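To make the three cases in the commit message concrete, here is a rough
userspace sketch of which block ranges get zeroed for each conversion type.
It is an illustration only, not code from the patch: the enum, struct and
function names are hypothetical, and it assumes the mapped range lies fully
inside the extent.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these exist in fs/ext4. */
enum conv_type {
	CONV_UNWRIT_TO_WRIT,	/* zero everything around the mapped range */
	CONV_UNWRIT_TO_UNWRIT,	/* zero the whole extent */
	CONV_WRIT_TO_UNWRIT,	/* zero only the mapped range */
};

struct zero_range {
	unsigned int start;	/* first logical block to zero */
	unsigned int len;	/* number of blocks to zero */
};

/*
 * Fill out[] with the ranges to zero for the extent
 * [ee_block, ee_block + ee_len) and the mapped range
 * [m_lblk, m_lblk + m_len). Returns the number of ranges (1 or 2,
 * or 0 if the mapped range covers the whole extent in the first case).
 */
static int zeroout_ranges(enum conv_type type,
			  unsigned int ee_block, unsigned int ee_len,
			  unsigned int m_lblk, unsigned int m_len,
			  struct zero_range out[2])
{
	unsigned int ee_end = ee_block + ee_len;
	unsigned int m_end = m_lblk + m_len;
	int n = 0;

	/* Assumption of this sketch: the mapping is inside the extent. */
	assert(m_lblk >= ee_block && m_end <= ee_end);

	switch (type) {
	case CONV_UNWRIT_TO_WRIT:
		if (m_lblk > ee_block)
			out[n++] = (struct zero_range){ ee_block, m_lblk - ee_block };
		if (m_end < ee_end)
			out[n++] = (struct zero_range){ m_end, ee_end - m_end };
		break;
	case CONV_UNWRIT_TO_UNWRIT:
		out[n++] = (struct zero_range){ ee_block, ee_len };
		break;
	case CONV_WRIT_TO_UNWRIT:
		out[n++] = (struct zero_range){ m_lblk, m_len };
		break;
	}
	return n;
}
```

For an extent covering blocks 100..109 and a mapped range 103..106, the
first case yields the two ranges 100..102 and 107..109, the second the whole
extent, and the third only 103..106.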
>
> ...
>
> > @@ -3383,11 +3440,12 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
> > int split_flag, int flags,
> > unsigned int *allocated)
> > {
> > - ext4_lblk_t ee_block;
> > + ext4_lblk_t ee_block, orig_ee_block;
> > struct ext4_extent *ex;
> > - unsigned int ee_len, depth;
> > - int unwritten;
> > - int split_flag1, flags1;
> > + unsigned int ee_len, orig_ee_len, depth;
> > + int unwritten, orig_unwritten;
> > + int split_flag1 = 0, flags1 = 0;
> > + int err = 0, orig_err;
>
> Cannot orig_err be used uninitialized in this function? At least it isn't
> obvious to me some of the branches setting it is always taken.
Hi Jan, thanks for the reviews. orig_err is in fact always initialized
before it is used (it is set on error and only read in the zeroout path,
which is only taken on error), but I agree that we can just initialize it
to 0.
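The reasoning can be modelled in a small standalone sketch. The two helpers
below are hypothetical (they are not in the patch); split_ret stands in for
PTR_ERR() of the path returned by ext4_split_extent_at(), with 0 meaning
success. The point is only that orig_err is written before the fallback can
be requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the errors for which zeroout is worth trying. */
static bool split_err_is_retryable(int err)
{
	return err == -ENOSPC || err == -EDQUOT || err == -ENOMEM;
}

/*
 * Model of one split step. Returns true if the caller should jump to
 * the zeroout fallback. *orig_err is written on every error path, so
 * it is always set before the fallback reads it.
 */
static bool split_step_wants_zeroout(int split_ret, int *orig_err)
{
	assert(orig_err != NULL);

	if (!split_ret)
		return false;		/* success: orig_err is never read */

	*orig_err = split_ret;		/* set before any jump to zeroout */
	return split_err_is_retryable(split_ret);
}
```

Declaring the caller's variable as `int orig_err = 0;` costs nothing and
makes the same property obvious without tracing the branches.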
>
> > @@ -3395,23 +3453,29 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
> > ee_len = ext4_ext_get_actual_len(ex);
> > unwritten = ext4_ext_is_unwritten(ex);
> >
> > + orig_ee_block = ee_block;
> > + orig_ee_len = ee_len;
> > + orig_unwritten = unwritten;
> > +
> > /* Do not cache extents that are in the process of being modified. */
> > flags |= EXT4_EX_NOCACHE;
> >
> > if (map->m_lblk + map->m_len < ee_block + ee_len) {
> > - split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
> > flags1 = flags | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
> > if (unwritten)
> > split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
> > EXT4_EXT_MARK_UNWRIT2;
> > - if (split_flag & EXT4_EXT_DATA_VALID2)
> > - split_flag1 |= map->m_lblk > ee_block ?
> > - EXT4_EXT_DATA_PARTIAL_VALID1 :
> > - EXT4_EXT_DATA_ENTIRE_VALID1;
> > path = ext4_split_extent_at(handle, inode, path,
> > map->m_lblk + map->m_len, split_flag1, flags1);
> > - if (IS_ERR(path))
> > - return path;
> > +
> > + if (IS_ERR(path)) {
> > + orig_err = PTR_ERR(path);
> > + if (orig_err != -ENOSPC && orig_err != -EDQUOT &&
> > + orig_err != -ENOMEM)
> > + return path;
> > +
> > + goto try_zeroout;
> > + }
> > /*
> > * Update path is required because previous ext4_split_extent_at
> > * may result in split of original leaf or extent zeroout.
> > @@ -3427,22 +3491,68 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
> > ext4_free_ext_path(path);
> > return ERR_PTR(-EFSCORRUPTED);
> > }
> > - unwritten = ext4_ext_is_unwritten(ex);
> > }
> >
> > if (map->m_lblk >= ee_block) {
> > - split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
> > + split_flag1 = 0;
> > if (unwritten) {
> > split_flag1 |= EXT4_EXT_MARK_UNWRIT1;
> > - split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT |
> > - EXT4_EXT_MARK_UNWRIT2);
> > + split_flag1 |= split_flag & EXT4_EXT_MARK_UNWRIT2;
> > }
> > - path = ext4_split_extent_at(handle, inode, path,
> > - map->m_lblk, split_flag1, flags);
> > + path = ext4_split_extent_at(handle, inode, path, map->m_lblk,
> > + split_flag1, flags);
> > +
> > + if (IS_ERR(path)) {
> > + orig_err = PTR_ERR(path);
> > + if (orig_err != -ENOSPC && orig_err != -EDQUOT &&
> > + orig_err != -ENOMEM)
> > + return path;
> > +
> > + goto try_zeroout;
> > + }
> > + }
> > +
> > + if (!err)
>
> Nothing touches 'err' in this function...
Yes :), I'll remove this.
>
> > + goto out;
> > +
> > +try_zeroout:
> > + /*
> > + * There was an error in splitting the extent, just zeroout and convert
> > + * to initialize as a last resort
> > + */
> > + if (split_flag & EXT4_EXT_MAY_ZEROOUT) {
> > + path = ext4_find_extent(inode, map->m_lblk, NULL, flags);
> > if (IS_ERR(path))
> > return path;
> > +
> > + depth = ext_depth(inode);
> > + ex = path[depth].p_ext;
> > + ee_block = le32_to_cpu(ex->ee_block);
> > + ee_len = ext4_ext_get_actual_len(ex);
> > + unwritten = ext4_ext_is_unwritten(ex);
> > +
> > + /*
> > + * The extent to zeroout should have been unchanged
> > + * but its not, just return error to caller
> > + */
> > + if (WARN_ON(ee_block != orig_ee_block ||
> > + ee_len != orig_ee_len ||
> > + unwritten != orig_unwritten))
> > + return ERR_PTR(orig_err);
> > +
> > + /*
> > + * Something went wrong in zeroout, just return the
> > + * original error
> > + */
> > + if (ext4_split_extent_zeroout(handle, inode, path, map, flags))
> > + return ERR_PTR(orig_err);
> > }
>
> Also nothing seems to zero out orig_err in case zero out above succeeded.
> What am I missing?
If the zeroout here succeeds, we just goto out and return path, so
orig_err is never used. That is not the best practice and, admittedly, I
seem to have complicated the error handling a bit. I will streamline it in
v2. Thanks for pointing this out.
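One possible shape for the streamlined flow, as a rough standalone sketch
and not the actual v2: the outcome of each step is passed in through a
hypothetical struct so the control flow can be exercised in userspace, and
a single err variable replaces the err/orig_err pair. It does not model a
failure of the ext4_find_extent() lookup in the fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; each field simulates the result of one step. */
struct split_sim {
	int split_ret;		/* split attempt: 0 or a negative errno */
	bool may_zeroout;	/* models split_flag & EXT4_EXT_MAY_ZEROOUT */
	bool extent_unchanged;	/* models the WARN_ON() sanity check */
	int zeroout_ret;	/* zeroout attempt: 0 or a negative errno */
};

/* Returns 0 on success, otherwise the error of the original split. */
static int split_extent_sim(const struct split_sim *s)
{
	int err;

	assert(s != NULL);

	err = s->split_ret;
	if (!err)
		return 0;		/* split worked, nothing else to do */

	/* Only these errors are eligible for the zeroout fallback. */
	if (err != -ENOSPC && err != -EDQUOT && err != -ENOMEM)
		return err;

	if (!s->may_zeroout)
		return err;		/* caller did not allow zeroout */

	if (!s->extent_unchanged)
		return err;		/* extent changed under us, give up */

	if (s->zeroout_ret)
		return err;		/* zeroout failed: report split error */

	return 0;			/* zeroout succeeded */
}
```

With early returns there is no point at which a stale error value could
leak out after a successful zeroout.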
Ojaswin.
>
> Honza
>
> >
> > + /* There's an error and we can't zeroout, just return the err */
> > + return ERR_PTR(orig_err);
> > +
> > +out:
> > +
> > if (allocated) {
> > if (map->m_lblk + map->m_len > ee_block + ee_len)
> > *allocated = ee_len - (map->m_lblk - ee_block);
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR