[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <mxecryxfba54eb4a5etvumjsqs4cayxbv47wufo7mso3gxlzf5@qqftillhryo6>
Date: Thu, 13 Nov 2025 12:27:54 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Yang Erkun <yangerkun@...wei.com>, linux-ext4@...r.kernel.org,
tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz, libaokun1@...wei.com,
yangerkun@...weicloud.com
Subject: Re: [PATCH v3 1/3] ext4: remove useless code in
ext4_map_create_blocks
Hi!
On Wed 12-11-25 12:46:26, Zhang Yi wrote:
> On 11/7/2025 7:58 PM, Yang Erkun wrote:
> > IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
> > dioread_nolock buffer writeback, they all means we need a unwritten
> > extent(or this extent has already been initialized), and the split won't
> > zero the range we really write. So this check seems useless. Besides,
> > even if we repeatedly execute ext4_es_insert_extent, there won't
> > actually be any issues.
> >
> > Reviewed-by: Zhang Yi <yi.zhang@...wei.com>
> > Reviewed-by: Jan Kara <jack@...e.cz>
> > Signed-off-by: Yang Erkun <yangerkun@...wei.com>
> > ---
> > fs/ext4/inode.c | 11 -----------
> > 1 file changed, 11 deletions(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index e99306a8f47c..e8bac93ca668 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
> > static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> > struct ext4_map_blocks *map, int flags)
> > {
> > - struct extent_status es;
> > unsigned int status;
> > int err, retval = 0;
> >
> > @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> > return err;
> > }
> >
> > - /*
> > - * If the extent has been zeroed out, we don't need to update
> > - * extent status tree.
> > - */
> > - if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> > - ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> > - if (ext4_es_is_written(&es))
> > - return retval;
> > - }
> > -
>
> Sorry, I think I was wrong and I now realize that we need to keep this code
> snippet. Because ext4_split_extent() may convert the on-disk extent to written
> with the EXT4_EXT_MAY_ZEROOUT flag set. If we drop this check, it will add an
> unwritten extent into the extent status tree, which is inconsistent with the
> real one.
>
> Although this might not seem like a practical issue now, it's a potential
> problem and conflicts with the ext4_es_cache_extent() extension I am currently
> developing[1], which triggers a mismatch alarm when caching extents.
>
> Besides, I also notice there is a potential stale data issue about the
> EXT4_EXT_MAY_ZEROOUT flag.
>
> Assume we have an unwritten file, and then DIO writes the second half.
>
> [UUUUUUUUUUUUUUUU] on-disk extent
> [UUUUUUUUUUUUUUUU] extent status tree
> |<----->| dio write
>
> 1. ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO,
> EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set.
> 2. ext4_map_blocks() find this extent and call ext4_split_convert_extents()
> with EXT4_GET_BLOCKS_CONVERT and the above flags set.
> 3. call ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and
> EXT4_EXT_DATA_VALID2 flags set.
> 4. call ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2,
> EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags
> set.
> 5. We failed to insert extent since -NOSPC in ext4_split_extent_at().
> 6. ext4_split_extent_at() zero out the first half but convert the entire on-disk
> extent to written since the EXT4_EXT_DATA_VALID2 flag set, and left the second
> half as unwritten in the extent status tree.
>
> [0000000000SSSSSS] data
> [WWWWWWWWWWWWWWWW] on-disk extent
> [WWWWWWWWWWUUUUUU] extent status tree
>
> 7. If the dio failed to write data to the disk, If DIO fails to write data, the
> stale data in the second half will be exposed.
Right. That looks like a bug.
> Therefore, I think we should zero out the entire extent range to zero for this
> case, and also mark the extent as written in the extent status tree (that is the
> another reason I think we should keep this code snippet).
I agree this is probably the easiest fix.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists