linux-ext4 - Re: [PATCH v3 01/17] ext4: add ext4_try_lock


Message-ID: <d87eab9a-8224-477f-ae81-d4f205ee78b6@huawei.com>
Date: Sat, 19 Jul 2025 08:29:37 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Andi Kleen <ak@...ux.intel.com>
CC: <linux-ext4@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 01/17] ext4: add ext4_try_lock_group() to skip busy
 groups

On 2025/7/18 6:28, Andi Kleen wrote:
> Baokun Li <libaokun1@...wei.com> writes:
>
>> When ext4 allocates blocks, we used to just go through the block groups
>> one by one to find a good one. But when there are tons of block groups
>> (like hundreds of thousands or even millions) and not many have free space
>> (meaning they're mostly full), it takes a really long time to check them
>> all, and performance gets bad. So, we added the "mb_optimize_scan" mount
>> option (which is on by default now). It keeps track of some group lists,
>> so when we need a free block, we can just grab a likely group from the
>> right list. This saves time and makes block allocation much faster.
>>
>> But when multiple processes or containers are doing similar things, like
>> constantly allocating 8k blocks, they all try to use the same block group
>> in the same list. Even just two processes doing this can cut the IOPS in
>> half. For example, one container might do 300,000 IOPS, but if you run two
>> at the same time, the total is only 150,000.
>>
>> Since we can already look at block groups in a non-linear way, the first
>> and last groups in the same list are basically the same for finding a block
>> right now. Therefore, add an ext4_try_lock_group() helper function to skip
>> the current group when it is locked by another process, thereby avoiding
>> contention with other processes. This helps ext4 make better use of having
>> multiple block groups.
> It seems this makes block allocation non deterministic, but depend on
> the system load. I can see where this could cause problems when
> reproducing bugs at least, but perhaps also in other cases.
>
> Better perhaps just round robin the groups?
> Or at least add a way to turn it off.
>
> -Andi
>
As Ted mentioned, ext4 has never guaranteed deterministic allocation. We
do first try the predetermined goal in ext4_mb_find_by_goal(), and that
path has no trylock logic, so we always scan the target group once. That
part is deterministic.

However, if the target attempt fails, the primary goal for subsequent
allocation is to find suitable free space as quickly as possible, so
there's no need to contend with other processes for non-target groups.
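
To illustrate the two phases described above, here is a minimal user-space
sketch, not the kernel code. It assumes pthread mutexes in place of the ext4
group lock, and the names struct group, init_groups(), try_lock_group() and
alloc_block() are hypothetical stand-ins chosen for this example. The goal
group is taken with a blocking lock, so it is always examined; every other
group is only tried with a trylock and skipped when busy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block group: a lock plus a free-block count. */
struct group {
	pthread_mutex_t lock;
	unsigned int free_blocks;
};

/* Initialise n groups, each with the same number of free blocks. */
static void init_groups(struct group *groups, size_t n, unsigned int free_blocks)
{
	for (size_t i = 0; i < n; i++) {
		pthread_mutex_init(&groups[i].lock, NULL);
		groups[i].free_blocks = free_blocks;
	}
}

/* User-space analogue of a try-lock helper: true if the lock was taken. */
static bool try_lock_group(struct group *g)
{
	return pthread_mutex_trylock(&g->lock) == 0;
}

/*
 * Take one block. Returns the index of the group used, or -1 if nothing
 * was found. This sketch simply gives up when every fallback group is
 * busy or full; it does not model what ext4 does in that case.
 */
static int alloc_block(struct group *groups, size_t n, size_t goal)
{
	assert(n > 0 && goal < n);

	/* Goal pass: blocking lock, so the target group is always scanned. */
	pthread_mutex_lock(&groups[goal].lock);
	if (groups[goal].free_blocks > 0) {
		groups[goal].free_blocks--;
		pthread_mutex_unlock(&groups[goal].lock);
		return (int)goal;
	}
	pthread_mutex_unlock(&groups[goal].lock);

	/* Fallback pass: skip any group that someone else holds right now. */
	for (size_t i = 1; i < n; i++) {
		size_t idx = (goal + i) % n;

		if (!try_lock_group(&groups[idx]))
			continue;
		if (groups[idx].free_blocks > 0) {
			groups[idx].free_blocks--;
			pthread_mutex_unlock(&groups[idx].lock);
			return (int)idx;
		}
		pthread_mutex_unlock(&groups[idx].lock);
	}
	return -1;
}
```

With this shape, only the fallback pass depends on what other threads are
doing at that moment; the goal pass behaves the same regardless of load.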


Cheers,
Baokun