[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <yro4hwpttmy6e2zspvwjfdbpej6qvhlqjvlr5kp3nwffqgcnfd@z6qual55zhfq>
Date: Thu, 27 Nov 2025 14:41:52 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
yi.zhang@...wei.com, yizhang089@...il.com, libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 03/13] ext4: don't zero the entire extent if
EXT4_EXT_DATA_PARTIAL_VALID1
On Fri 21-11-25 14:08:01, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> When allocating initialized blocks from a large unwritten extent, or
> when splitting an unwritten extent during end I/O and converting it to
> initialized, there is currently a potential issue of stale data if the
> extent needs to be split in the middle.
>
> 0 A B N
> [UUUUUUUUUUUU] U: unwritten extent
> [--DDDDDDDD--] D: valid data
> |<- ->| ----> this range needs to be initialized
>
> ext4_split_extent() first try to split this extent at B with
> EXT4_EXT_DATA_ENTIRE_VALID1 and EXT4_EXT_MAY_ZEROOUT flag set, but
> ext4_split_extent_at() failed to split this extent due to temporary lack
> of space. It zeroout B to N and mark the entire extent from 0 to N
> as written.
>
> 0 A B N
> [WWWWWWWWWWWW] W: written extent
> [SSDDDDDDDDZZ] Z: zeroed, S: stale data
>
> ext4_split_extent() then try to split this extent at A with
> EXT4_EXT_DATA_VALID2 flag set. This time, it split successfully and left
> a stale written extent from 0 to A.
>
> 0 A B N
> [WW|WWWWWWWWWW]
> [SS|DDDDDDDDZZ]
>
> Fix this by pass EXT4_EXT_DATA_PARTIAL_VALID1 to ext4_split_extent_at()
> when splitting at B, don't convert the entire extent to written and left
> it as unwritten after zeroing out B to N. The remaining work is just
> like the standard two-part split. ext4_split_extent() will pass the
> EXT4_EXT_DATA_VALID2 flag when it calls ext4_split_extent_at() for the
> second time, allowing it to properly handle the split. If the split is
> successful, it will keep extent from 0 to A as unwritten.
>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
Good catch on the data exposure issue! First I'd like to discuss whether
there isn't a way to fix these problems in a way that doesn't make the
already complex code even more complex. My observation is that
EXT4_EXT_MAY_ZEROOUT is only set in ext4_ext_convert_to_initialized() and
in ext4_split_convert_extents() which both call ext4_split_extent(). The
actual extent zeroing happens in ext4_split_extent_at() and in
ext4_ext_convert_to_initialized(). I think the code would be much clearer
if we just centralized all the zeroing in ext4_split_extent(). At that
place the situation is actually pretty simple:
1) 'ex' is unwritten, 'map' describes part with already written data which
we want to convert to initialized (generally IO completion situation) => we
can zero out boundaries if they are smaller than max_zeroout or if extent
split fails.
2) 'ex' is unwritten, 'map' describes part we are preparing for write (IO
submission) => the split is opportunistic here, if we cannot split due to
ENOSPC, just go on and deal with it at IO completion time. No zeroing
needed.
3) 'ex' is written, 'map' describes part that should be converted to
unwritten => we can zero out the 'map' part if smaller than max_zeroout or
if extent split fails.
This should all result in a relatively straightforward code where we can
distinguish the three cases based on 'ex' and passed flags, we should be
able to drop the 'EXT4_EXT_DATA_VALID*' flags and logic (possibly we could
drop the 'split_flag' argument of ext4_split_extent() altogether), and fix
the data exposure issues at the same time. What do you think? Am I missing
some case?
Honza
> ---
> fs/ext4/extents.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f7aa497e5d6c..cafe66cb562f 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3294,6 +3294,13 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
> err = ext4_ext_zeroout(inode, &zero_ex);
> if (err)
> goto fix_extent_len;
> + /*
> + * The first half contains partially valid data, the splitting
> + * of this extent has not been completed, fix extent length
> + * and ext4_split_extent() split will the first half again.
> + */
> + if (split_flag & EXT4_EXT_DATA_PARTIAL_VALID1)
> + goto fix_extent_len;
>
> /* update the extent length and mark as initialized */
> ex->ee_len = cpu_to_le16(ee_len);
> @@ -3364,7 +3371,9 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
> split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
> EXT4_EXT_MARK_UNWRIT2;
> if (split_flag & EXT4_EXT_DATA_VALID2)
> - split_flag1 |= EXT4_EXT_DATA_ENTIRE_VALID1;
> + split_flag1 |= map->m_lblk > ee_block ?
> + EXT4_EXT_DATA_PARTIAL_VALID1 :
> + EXT4_EXT_DATA_ENTIRE_VALID1;
> path = ext4_split_extent_at(handle, inode, path,
> map->m_lblk + map->m_len, split_flag1, flags1);
> if (IS_ERR(path))
> --
> 2.46.1
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists