Message-ID: <bf0bda81-5e75-4b5e-aac1-685e4697f513@huawei.com>
Date: Sat, 19 Jul 2025 08:37:44 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
CC: <linux-ext4@...r.kernel.org>, <tytso@....edu>, <adilger.kernel@...ger.ca>,
<jack@...e.cz>, <linux-kernel@...r.kernel.org>, <julia.lawall@...ia.fr>,
<yi.zhang@...wei.com>, <yangerkun@...wei.com>, <libaokun@...weicloud.com>
Subject: Re: [PATCH v3 01/17] ext4: add ext4_try_lock_group() to skip busy groups
On 2025/7/17 18:09, Ojaswin Mujoo wrote:
> On Mon, Jul 14, 2025 at 09:03:11PM +0800, Baokun Li wrote:
>> When ext4 allocates blocks, we used to just go through the block groups
>> one by one to find a good one. But when there are tons of block groups
>> (like hundreds of thousands or even millions) and not many have free space
>> (meaning they're mostly full), it takes a really long time to check them
>> all, and performance gets bad. So, we added the "mb_optimize_scan" mount
>> option (which is on by default now). It keeps track of some group lists,
>> so when we need a free block, we can just grab a likely group from the
>> right list. This saves time and makes block allocation much faster.
>>
>> But when multiple processes or containers are doing similar things, like
>> constantly allocating 8k blocks, they all try to use the same block group
>> in the same list. Even just two processes doing this can cut the IOPS in
>> half. For example, one container might do 300,000 IOPS, but if you run two
>> at the same time, the total is only 150,000.
>>
>> Since we can already look at block groups in a non-linear way, the first
>> and last groups in the same list are basically the same for finding a block
>> right now. Therefore, add an ext4_try_lock_group() helper function to skip
>> the current group when it is locked by another process, thereby avoiding
>> contention with other processes. This helps ext4 make better use of having
>> multiple block groups.
>>
>> Also, to make sure we don't skip all the groups that have free space
>> when allocating blocks, we won't try to skip busy groups anymore when
>> ac_criteria is CR_ANY_FREE.
>>
>> Performance test data follows:
>>
>> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
>> Observation: Average fallocate operations per container per second.
>>
>> |CPU: Kunpeng 920   |          P80            |
>> |Memory: 512GB      |-------------------------|
>> |960GB SSD (0.5GB/s)| base  |    patched      |
>> |-------------------|-------|-----------------|
>> |mb_optimize_scan=0 | 2667  | 4821  (+80.7%)  |
>> |mb_optimize_scan=1 | 2643  | 4784  (+81.0%)  |
>>
>> |CPU: AMD 9654 * 2  |          P96            |
>> |Memory: 1536GB     |-------------------------|
>> |960GB SSD (1GB/s)  | base  |    patched      |
>> |-------------------|-------|-----------------|
>> |mb_optimize_scan=0 | 3450  | 15371 (+345%)   |
>> |mb_optimize_scan=1 | 3209  | 6101  (+90.0%)  |
>>
>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
>> Reviewed-by: Jan Kara <jack@...e.cz>
> Hey Baokun, I reviewed some of the patches in v2, but I think that was
> at the very last moment, so I'll add the comments in this series. Don't
> mind the copy-paste :)
>
> The patch itself looks good, thanks for the changes.
>
> Feel free to add:
>
> Reviewed-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Sorry for missing your review, I've been snowed under with work lately.
Thanks for the review!
Cheers,
Baokun
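
For readers following the thread, the skip-busy-groups idea in the quoted
commit message can be illustrated with a small userspace sketch. This is
not the kernel code: struct group, try_lock_group(), alloc_block() and the
CR_FAST label are hypothetical names made up for illustration, and a C11
atomic_flag stands in for the per-group lock. Only CR_ANY_FREE comes from
the patch description. The assumption is that any criterion below
CR_ANY_FREE may skip a busy group, while CR_ANY_FREE always waits so that
groups with free space are never all skipped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical criteria; only "CR_ANY_FREE is the last resort" matters. */
enum criteria { CR_FAST = 0, CR_ANY_FREE = 1 };

/* Hypothetical stand-in for a block group with its own lock. */
struct group {
	atomic_flag lock;          /* set while some allocator holds the group */
	unsigned int free_blocks;
};

static void lock_group(struct group *g)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
		;
}

static void unlock_group(struct group *g)
{
	atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/*
 * Returns true with the group locked. For criteria below CR_ANY_FREE a
 * busy group is skipped (returns false); for CR_ANY_FREE we always wait.
 */
static bool try_lock_group(struct group *g, enum criteria cr)
{
	if (cr < CR_ANY_FREE)
		return !atomic_flag_test_and_set_explicit(&g->lock,
							  memory_order_acquire);
	lock_group(g);
	return true;
}

/* Take one block from the first usable group; returns its index or -1. */
static int alloc_block(struct group *groups, size_t ngroups, enum criteria cr)
{
	assert(groups != NULL);

	for (size_t i = 0; i < ngroups; i++) {
		struct group *g = &groups[i];
		bool got_block;

		if (!try_lock_group(g, cr))
			continue;  /* busy: move on instead of contending */

		got_block = g->free_blocks > 0;
		if (got_block)
			g->free_blocks--;
		unlock_group(g);

		if (got_block)
			return (int)i;
	}
	return -1;
}
```

With group 0 held by another allocator, a CR_FAST scan moves straight on to
group 1 rather than spinning on group 0, which is the contention the patch
aims to avoid. A CR_ANY_FREE scan would instead wait for group 0.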