lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2713db6e-ff43-4583-b328-412e38f3d7bf@huaweicloud.com>
Date: Fri, 28 Nov 2025 11:45:51 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
 yi.zhang@...wei.com, yizhang089@...il.com, libaokun1@...wei.com,
 yangerkun@...wei.com
Subject: Re: [PATCH v2 03/13] ext4: don't zero the entire extent if
 EXT4_EXT_DATA_PARTIAL_VALID1

On 11/27/2025 9:41 PM, Jan Kara wrote:
> On Fri 21-11-25 14:08:01, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> When allocating initialized blocks from a large unwritten extent, or
>> when splitting an unwritten extent during end I/O and converting it to
>> initialized, there is currently a potential issue of stale data if the
>> extent needs to be split in the middle.
>>
>>        0  A      B  N
>>        [UUUUUUUUUUUU]    U: unwritten extent
>>        [--DDDDDDDD--]    D: valid data
>>           |<-  ->| ----> this range needs to be initialized
>>
>> ext4_split_extent() first try to split this extent at B with
>> EXT4_EXT_DATA_ENTIRE_VALID1 and EXT4_EXT_MAY_ZEROOUT flag set, but
>> ext4_split_extent_at() failed to split this extent due to temporary lack
>> of space. It zeroout B to N and mark the entire extent from 0 to N
>> as written.
>>
>>        0  A      B  N
>>        [WWWWWWWWWWWW]    W: written extent
>>        [SSDDDDDDDDZZ]    Z: zeroed, S: stale data
>>
>> ext4_split_extent() then try to split this extent at A with
>> EXT4_EXT_DATA_VALID2 flag set. This time, it split successfully and left
>> a stale written extent from 0 to A.
>>
>>        0  A      B   N
>>        [WW|WWWWWWWWWW]
>>        [SS|DDDDDDDDZZ]
>>
>> Fix this by pass EXT4_EXT_DATA_PARTIAL_VALID1 to ext4_split_extent_at()
>> when splitting at B, don't convert the entire extent to written and left
>> it as unwritten after zeroing out B to N. The remaining work is just
>> like the standard two-part split. ext4_split_extent() will pass the
>> EXT4_EXT_DATA_VALID2 flag when it calls ext4_split_extent_at() for the
>> second time, allowing it to properly handle the split. If the split is
>> successful, it will keep extent from 0 to A as unwritten.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> 
> Good catch on the data exposure issue! First I'd like to discuss whether
> there isn't a way to fix these problems in a way that doesn't make the
> already complex code even more complex. My observation is that
> EXT4_EXT_MAY_ZEROOUT is only set in ext4_ext_convert_to_initialized() and
> in ext4_split_convert_extents() which both call ext4_split_extent(). The
> actual extent zeroing happens in ext4_split_extent_at() and in
> ext4_ext_convert_to_initialized().

Yes.

> I think the code would be much clearer
> if we just centralized all the zeroing in ext4_split_extent(). At that
> place the situation is actually pretty simple:

Thank you for your suggestion!

> 
> 1) 'ex' is unwritten, 'map' describes part with already written data which
> we want to convert to initialized (generally IO completion situation) => we
> can zero out boundaries if they are smaller than max_zeroout or if extent
> split fails.
> 

Yes. Agree.

> 2) 'ex' is unwritten, 'map' describes part we are preparing for write (IO
> submission) => the split is opportunistic here, if we cannot split due to
> ENOSPC, just go on and deal with it at IO completion time. No zeroing
> needed.

Yes. At the same time, if we can indeed move the entire split unwritten
operation to be handled after I/O completion in the future, it would also be
more convenient to remove this segment of logic.

> 
> 3) 'ex' is written, 'map' describes part that should be converted to
> unwritten => we can zero out the 'map' part if smaller than max_zeroout or
> if extent split fails.

This makes sense to me! This case it originates from the fallocate with Zero
Range operation. Currently, the zero-out operation will not be performed if
the split operation fails, instead, it immediately returns a failure.

I agree with you that we can do zero out if the 'map' part smaller than
max_zeroout instead of split extents. However, if the 'map' part is bigger
than max_zeroout and if extent split fails, I don't think zero out is a good
idea, Because it might cause zero-range calls to take a long time to execute.
Although fallocate doesn't explicitly specify how ZERO_RANGE should be
implemented, users expect it to be very fast. Therefore, in this case, if the
split fails, it would be better to simply return an error, leave things as
they are. What do you think?

> 
> This should all result in a relatively straightforward code where we can
> distinguish the three cases based on 'ex' and passed flags, we should be
> able to drop the 'EXT4_EXT_DATA_VALID*' flags and logic (possibly we could
> drop the 'split_flag' argument of ext4_split_extent() altogether), and fix
> the data exposure issues at the same time. What do you think? Am I missing
> some case?
> 

Indeed, I think the overall solution is a nice cleanup idea. :-)
But this would involve a significant amount of refactoring and logical changes.
Could we first merge the current set of patches(it could be more easier to
backport to the early LTS version), and then I can start a new series to
address this optimization?

Cheers,
Yi.

> 								Honza
> 
>> ---
>>  fs/ext4/extents.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index f7aa497e5d6c..cafe66cb562f 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3294,6 +3294,13 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
>>  		err = ext4_ext_zeroout(inode, &zero_ex);
>>  		if (err)
>>  			goto fix_extent_len;
>> +		/*
>> +		 * The first half contains partially valid data, the splitting
>> +		 * of this extent has not been completed, fix extent length
>> +		 * and ext4_split_extent() split will the first half again.
>> +		 */
>> +		if (split_flag & EXT4_EXT_DATA_PARTIAL_VALID1)
>> +			goto fix_extent_len;
>>  
>>  		/* update the extent length and mark as initialized */
>>  		ex->ee_len = cpu_to_le16(ee_len);
>> @@ -3364,7 +3371,9 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
>>  			split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
>>  				       EXT4_EXT_MARK_UNWRIT2;
>>  		if (split_flag & EXT4_EXT_DATA_VALID2)
>> -			split_flag1 |= EXT4_EXT_DATA_ENTIRE_VALID1;
>> +			split_flag1 |= map->m_lblk > ee_block ?
>> +				       EXT4_EXT_DATA_PARTIAL_VALID1 :
>> +				       EXT4_EXT_DATA_ENTIRE_VALID1;
>>  		path = ext4_split_extent_at(handle, inode, path,
>>  				map->m_lblk + map->m_len, split_flag1, flags1);
>>  		if (IS_ERR(path))
>> -- 
>> 2.46.1
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ