lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <72def5a4-c30a-4461-8bce-c6c2b09b044c@huaweicloud.com>
Date: Wed, 12 Nov 2025 12:46:26 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Yang Erkun <yangerkun@...wei.com>, linux-ext4@...r.kernel.org,
 tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz
Cc: libaokun1@...wei.com, yangerkun@...weicloud.com
Subject: Re: [PATCH v3 1/3] ext4: remove useless code in
 ext4_map_create_blocks

Hi!

On 11/7/2025 7:58 PM, Yang Erkun wrote:
> IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
> dioread_nolock buffer writeback, they all means we need a unwritten
> extent(or this extent has already been initialized), and the split won't
> zero the range we really write. So this check seems useless. Besides,
> even if we repeatedly execute ext4_es_insert_extent, there won't
> actually be any issues.
> 
> Reviewed-by: Zhang Yi <yi.zhang@...wei.com>
> Reviewed-by: Jan Kara <jack@...e.cz>
> Signed-off-by: Yang Erkun <yangerkun@...wei.com>
> ---
>  fs/ext4/inode.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e99306a8f47c..e8bac93ca668 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
>  static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  				  struct ext4_map_blocks *map, int flags)
>  {
> -	struct extent_status es;
>  	unsigned int status;
>  	int err, retval = 0;
>  
> @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  			return err;
>  	}
>  
> -	/*
> -	 * If the extent has been zeroed out, we don't need to update
> -	 * extent status tree.
> -	 */
> -	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> -	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> -		if (ext4_es_is_written(&es))
> -			return retval;
> -	}
> -

Sorry, I think I was wrong and I now realize that we need to keep this code
snippet. Because ext4_split_extent() may convert the on-disk extent to written
with the EXT4_EXT_MAY_ZEROOUT flag set. If we drop this check, it will add an
unwritten extent into the extent status tree, which is inconsistent with the
real one.

Although this might not seem like a practical issue now, it's a potential
problem and conflicts with the ext4_es_cache_extent() extension I am currently
developing[1], which triggers a mismatch alarm when caching extents.

Besides, I also notice there is a potential stale data issue about the
EXT4_EXT_MAY_ZEROOUT flag.

Assume we have an unwritten file, and then DIO writes the second half.

   [UUUUUUUUUUUUUUUU] on-disk extent
   [UUUUUUUUUUUUUUUU] extent status tree
            |<----->| dio write

1. ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO,
   EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set.
2. ext4_map_blocks() find this extent and call ext4_split_convert_extents()
   with EXT4_GET_BLOCKS_CONVERT and the above flags set.
3. call ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and
   EXT4_EXT_DATA_VALID2 flags set.
4. call ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2,
   EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags
   set.
5. We failed to insert extent since -NOSPC in ext4_split_extent_at().
6. ext4_split_extent_at() zero out the first half but convert the entire on-disk
   extent to written since the EXT4_EXT_DATA_VALID2 flag set, and left the second
   half as unwritten in the extent status tree.

   [0000000000SSSSSS]  data
   [WWWWWWWWWWWWWWWW]  on-disk extent
   [WWWWWWWWWWUUUUUU]  extent status tree

7. If the dio failed to write data to the disk, If DIO fails to write data, the
   stale data in the second half will be exposed.

Therefore, I think we should zero out the entire extent range to zero for this
case, and also mark the extent as written in the extent status tree (that is the
another reason I think we should keep this code snippet). I was still confirming
this issue and thinking about how to fix it, but I believe it would be better
not to merge this patch for now. What do you think?

[1]. https://lore.kernel.org/linux-ext4/20251031062905.4135909-1-yi.zhang@huaweicloud.com/

Thanks,
Yi.

>  	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>  	ext4_es_insert_extent(inode, map->m_lblk, map->m_len, map->m_pblk,


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ