lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bb5b9323-d7ea-467b-a59f-ef6e415e766a@huaweicloud.com>
Date: Fri, 28 Nov 2025 12:37:33 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: Ojaswin Mujoo <ojaswin@...ux.ibm.com>, linux-ext4@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu,
 adilger.kernel@...ger.ca, yi.zhang@...wei.com, yizhang089@...il.com,
 libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when
 caching on-disk extents

On 11/27/2025 8:24 PM, Jan Kara wrote:
> Hi Yi!
> 
> On Mon 24-11-25 13:04:04, Zhang Yi wrote:
>> On 11/23/2025 6:55 PM, Ojaswin Mujoo wrote:
>>> On Fri, Nov 21, 2025 at 02:07:58PM +0800, Zhang Yi wrote:
>>>> Changes since v1:
>>>>  - Rebase the codes based on the latest linux-next 20251120.
>>>>  - Add patches 01-05, fix two stale data problems caused by
>>>
>>> Hi Zhang, thanks for the patches.
>>>
>>
>> Thank you for take time to look at this series.
>>
>>> I've always felt uncomfortable with the ZEROOUT code here because it
>>> seems to have many such bugs as you pointed out in the series. Its very
>>> fragile and the bugs are easy to miss behind all the data valid and
>>> split flags mess. 
>>>
>>
>> Yes, I agree with you. The implementation of EXT4_EXT_MAY_ZEROOUT has
>> significantly increased the complexity of split extents and the
>> potential for bugs.
> 
> Yep, that code is complex and prone to bugs.
> 
>>> As per my understanding, ZEROOUT logic seems to be a special best-effort
>>> try to make the split/convert operation "work" when dealing with
>>> transient errors like ENOSPC etc. I was just wondering if it makes sense
>>> to just get rid of the whole ZEROOUT logic completely and just reset the
>>> extent to orig state if there is any error. This allows us to get rid of
>>> DATA_VALID* flags as well and makes the whole ext4_split_convert_extents() 
>>> slightly less messy.
>>>
>>> Maybe we can have a retry loop at the top level caller if we want to try
>>> again for say ENOSPC or ENOMEM. 
>>>
>>> Would love to hear your thoughts on it.
>>
>> I think this is a direction worth exploring. However, what I am
>> currently considering is that we need to address this scenario of
>> splitting extent during the I/O completion. Although the ZEROOUT logic
>> is fragile and has many issues recently, it currently serves as a
>> fallback solution for handling ENOSPC errors that arise when splitting
>> extents during I/O completion. It ensures that I/O operations do not
>> fail due to insufficient extent blocks.
> 
> Also partial extent zeroout offers a good performance win when the
> portion needing zeroout is small (we can save extent splitting). And I
> agree it is a good safety net for ENOSPC issues - otherwise there's no
> guarantee page writeback can finish without hitting ENOSPC. We do have
> reserved blocks for these cases but the pool is limited so you can still
> run out of blocks if you try hard enough.

Yes, Indeed!

> 
>> Please see ext4_convert_unwritten_extents_endio(). Although we have made
>> our best effort to tried to split extents using
>> EXT4_GET_BLOCKS_IO_CREATE_EXT before issuing I/Os, we still have not
>> covered all scenarios. Moreover, after converting the buffered I/O path
>> to the iomap infrastructure in the future, we may need to split extents
>> during the I/O completion worker[1].
> 
> Yes, this might be worth exploring. The advantage of doing extent splitting
> in advance is that on IO submission you have the opportunity of restarting
> the transaction on ENOSPC to possibly release some blocks. This is not
> easily doable e.g. on writeback completion so the pressure on the pool of
> reserved blocks is going to be more common (previously you needed reserved
> blocks only when writeback was racing with fallocate or similar, now you
> may need them each time you write in the middle of unwritten extent). So I
> think the change will need some testing whether it isn't too easy to hit
> ENOSPC conditions on IO completion without EXT4_GET_BLOCKS_IO_CREATE_EXT.
> But otherwise it's always nice to remove code :)
>  
> 								Honza

Yes, we need to be very careful about this, I have written a POC patch and
am currently conducting related tests. I hope it works.  :-)

Thanks,
Yi.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ