Message-ID: <7b8eb418-f741-46eb-b2ff-7d27ec1d2b4b@huaweicloud.com>
Date: Wed, 4 Feb 2026 09:33:14 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Theodore Tso <tytso@....edu>
Cc: Christoph Hellwig <hch@...radead.org>, linux-ext4@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 adilger.kernel@...ger.ca, jack@...e.cz, ojaswin@...ux.ibm.com,
 ritesh.list@...il.com, djwong@...nel.org, Zhang Yi <yi.zhang@...wei.com>,
 yizhang089@...il.com, libaokun1@...wei.com, yangerkun@...wei.com,
 yukuai@...-78bjiv52429oh8qptp.cn-shenzhen.alb.aliyuncs.com
Subject: Re: [PATCH -next v2 00/22] ext4: use iomap for regular file's
 buffered I/O path

Hi, Ted.

On 2/3/2026 9:14 PM, Theodore Tso wrote:
> On Tue, Feb 03, 2026 at 05:18:10PM +0800, Zhang Yi wrote:
>> This means that the ordered journal mode is no longer used in ext4
>> under the iomap infrastructure.  The main reason is that iomap
>> processes each folio one by one during writeback. It first holds the
>> folio lock and then starts a transaction to create the block mapping.
>> If we still use the ordered mode, we need to perform writeback in
>> the logging process, which may require initiating a new transaction,
>> potentially leading to deadlock issues. In addition, ordered journal
>> mode indeed has many synchronization dependencies, which increase
>> the risk of deadlocks, and I believe this is one of the reasons why
>> ext4_do_writepages() is implemented in such a complicated manner.
>> Therefore, I think we need to give up using the ordered data mode.
>>
>> Currently, there are three scenarios where the ordered mode is used:
>> 1) append write,
>> 2) partial block truncate down, and
>> 3) online defragmentation.
>>
>> For append write, we can always allocate unwritten blocks to avoid
>> using the ordered journal mode.
> 
> This is going to be a pretty severe performance regression, since it
> means that we will be doubling the journal load for append writes.

Although this doubles the journal load compared with directly allocating
written blocks, I don't think it will cause a significant performance
regression relative to the current append write path, because it matches
the existing behavior now that dioread_nolock is enabled by default.
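
As a rough illustration of the accounting being discussed, here is a toy
Python model (not kernel code). The strategy names and the assumption that
each extent-tree modification costs exactly one journaled metadata update
are simplifications introduced for this sketch only.

```python
# Toy accounting model, not kernel code. All names are hypothetical.
# Assumption: each extent-tree modification costs one journaled update.

def journal_updates_per_append(strategy: str) -> int:
    """Return the journaled extent updates needed for one appended extent."""
    if strategy == "ordered_written":
        # Allocate a written extent in one transaction; the data must
        # reach disk before that transaction commits (ordered mode).
        return 1
    if strategy == "unwritten_then_convert":
        # Allocate an unwritten extent, then convert it to written in a
        # second transaction once the data I/O has completed.
        return 2
    if strategy == "write_then_map":
        # Write the data to reserved blocks first, then insert a written
        # extent in a single transaction after I/O completion.
        return 1
    raise ValueError(f"unknown strategy: {strategy}")


def total_updates(strategy: str, appended_extents: int) -> int:
    """Total journaled updates for a number of appended extents."""
    return journal_updates_per_append(strategy) * appended_extents
```

Under these assumptions the unwritten path costs twice the updates of the
written path, which is the doubling referred to above, and it is the same
cost already paid with dioread_nolock.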

> What we really need to do here is to first write out the data blocks,
> and then only start the transaction handle to modify the data blocks
> *after* the data blocks have been written (to heretofore, unused
> blocks that were just allocated).  It means inverting the order in
> which we write data blocks for the append write case, and in fact it
> will improve fsync() performance since we won't be gating writing the
> commit block on the data blocks getting written out in the append
> write case.
> 

Yeah, thank you for the suggestion. I agree with you, and we plan to
implement this next. Baokun is currently working on a POC. Our current
idea is to use inode PA (preallocation) to reserve blocks before doing
writeback, and then map the extents as written after the data has been
written, which avoids the journal overhead of unwritten allocations. The
benefit of using PA is that it avoids changes to on-disk metadata, and
the pre-allocation operation can be contained within the mb-allocator.
At the same time, this could also lay the foundation for future support
of COW writes for reflinks.
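
The "write data first, map afterwards" ordering can be sketched with a
small Python model. This is only an illustration under stated assumptions,
not the planned implementation: the ToyInode class and all of its methods
are hypothetical, the preallocation is modeled as purely in-memory state,
and a single transaction is assumed to cover the mapping step.

```python
# Toy model of the "write data first, map afterwards" flow.
# Not kernel code; every class and method name here is hypothetical.

class ToyInode:
    def __init__(self):
        self.prealloc = []          # in-memory reservations, lost on crash
        self.data_on_disk = set()   # physical blocks whose data I/O completed
        self.extent_map = {}        # journaled: logical block -> physical block
        self.transactions = 0       # journal transactions started

    def reserve(self, logical, physical):
        # Step 1: reserve blocks in memory only; no on-disk metadata
        # changes, so no journal transaction is started here.
        self.prealloc.append((logical, physical))

    def write_data(self):
        # Step 2: write the data into the reserved blocks.
        for _, physical in self.prealloc:
            self.data_on_disk.add(physical)

    def map_written(self):
        # Step 3: after I/O completion, insert written extents in one
        # transaction. Only blocks that already hold data get mapped.
        self.transactions += 1
        for logical, physical in self.prealloc:
            assert physical in self.data_on_disk
            self.extent_map[logical] = physical
        self.prealloc.clear()

    def crash(self):
        # A crash drops in-memory reservations; journaled state survives.
        self.prealloc.clear()

    def exposes_stale_data(self):
        # True if any mapped block was never written.
        return any(p not in self.data_on_disk
                   for p in self.extent_map.values())
```

In this model a crash before map_written() leaves the extent map
untouched, so no mapped block can point at data that was never written,
and the append costs one transaction rather than two.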

> Cheers,
> 
> 					- Ted

Thanks,
Yi.



