lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <mxecryxfba54eb4a5etvumjsqs4cayxbv47wufo7mso3gxlzf5@qqftillhryo6>
Date: Thu, 13 Nov 2025 12:27:54 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Yang Erkun <yangerkun@...wei.com>, linux-ext4@...r.kernel.org, 
	tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz, libaokun1@...wei.com, 
	yangerkun@...weicloud.com
Subject: Re: [PATCH v3 1/3] ext4: remove useless code in
 ext4_map_create_blocks

Hi!

On Wed 12-11-25 12:46:26, Zhang Yi wrote:
> On 11/7/2025 7:58 PM, Yang Erkun wrote:
> > IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
> > dioread_nolock buffer writeback, they all means we need a unwritten
> > extent(or this extent has already been initialized), and the split won't
> > zero the range we really write. So this check seems useless. Besides,
> > even if we repeatedly execute ext4_es_insert_extent, there won't
> > actually be any issues.
> > 
> > Reviewed-by: Zhang Yi <yi.zhang@...wei.com>
> > Reviewed-by: Jan Kara <jack@...e.cz>
> > Signed-off-by: Yang Erkun <yangerkun@...wei.com>
> > ---
> >  fs/ext4/inode.c | 11 -----------
> >  1 file changed, 11 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index e99306a8f47c..e8bac93ca668 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
> >  static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> >  				  struct ext4_map_blocks *map, int flags)
> >  {
> > -	struct extent_status es;
> >  	unsigned int status;
> >  	int err, retval = 0;
> >  
> > @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> >  			return err;
> >  	}
> >  
> > -	/*
> > -	 * If the extent has been zeroed out, we don't need to update
> > -	 * extent status tree.
> > -	 */
> > -	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> > -	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> > -		if (ext4_es_is_written(&es))
> > -			return retval;
> > -	}
> > -
> 
> Sorry, I think I was wrong and I now realize that we need to keep this code
> snippet. Because ext4_split_extent() may convert the on-disk extent to written
> with the EXT4_EXT_MAY_ZEROOUT flag set. If we drop this check, it will add an
> unwritten extent into the extent status tree, which is inconsistent with the
> real one.
> 
> Although this might not seem like a practical issue now, it's a potential
> problem and conflicts with the ext4_es_cache_extent() extension I am currently
> developing[1], which triggers a mismatch alarm when caching extents.
> 
> Besides, I also notice there is a potential stale data issue about the
> EXT4_EXT_MAY_ZEROOUT flag.
> 
> Assume we have an unwritten file, and then DIO writes the second half.
> 
>    [UUUUUUUUUUUUUUUU] on-disk extent
>    [UUUUUUUUUUUUUUUU] extent status tree
>             |<----->| dio write
> 
> 1. ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO,
>    EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set.
> 2. ext4_map_blocks() find this extent and call ext4_split_convert_extents()
>    with EXT4_GET_BLOCKS_CONVERT and the above flags set.
> 3. call ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and
>    EXT4_EXT_DATA_VALID2 flags set.
> 4. call ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2,
>    EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags
>    set.
> 5. We failed to insert extent since -NOSPC in ext4_split_extent_at().
> 6. ext4_split_extent_at() zero out the first half but convert the entire on-disk
>    extent to written since the EXT4_EXT_DATA_VALID2 flag set, and left the second
>    half as unwritten in the extent status tree.
> 
>    [0000000000SSSSSS]  data
>    [WWWWWWWWWWWWWWWW]  on-disk extent
>    [WWWWWWWWWWUUUUUU]  extent status tree
> 
> 7. If the dio failed to write data to the disk, If DIO fails to write data, the
>    stale data in the second half will be exposed.

Right. That looks like a bug.

> Therefore, I think we should zero out the entire extent range to zero for this
> case, and also mark the extent as written in the extent status tree (that is the
> another reason I think we should keep this code snippet).

I agree this is probably the easiest fix.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ