lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <aHjL5J3Ui9VMZt2o@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Thu, 17 Jul 2025 15:39:40 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Baokun Li <libaokun1@...wei.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
        jack@...e.cz, linux-kernel@...r.kernel.org, julia.lawall@...ia.fr,
        yi.zhang@...wei.com, yangerkun@...wei.com, libaokun@...weicloud.com
Subject: Re: [PATCH v3 01/17] ext4: add ext4_try_lock_group() to skip busy groups

On Mon, Jul 14, 2025 at 09:03:11PM +0800, Baokun Li wrote:
> When ext4 allocates blocks, we used to just go through the block groups
> one by one to find a good one. But when there are tons of block groups
> (like hundreds of thousands or even millions) and not many have free space
> (meaning they're mostly full), it takes a really long time to check them
> all, and performance gets bad. So, we added the "mb_optimize_scan" mount
> option (which is on by default now). It keeps track of some group lists,
> so when we need a free block, we can just grab a likely group from the
> right list. This saves time and makes block allocation much faster.
> 
> But when multiple processes or containers are doing similar things, like
> constantly allocating 8k blocks, they all try to use the same block group
> in the same list. Even just two processes doing this can cut the IOPS in
> half. For example, one container might do 300,000 IOPS, but if you run two
> at the same time, the total is only 150,000.
> 
> Since we can already look at block groups in a non-linear way, the first
> and last groups in the same list are essentially equivalent candidates for
> finding a block. Therefore, add an ext4_try_lock_group() helper function to
> skip the current group when it is locked by another process, thereby
> avoiding contention with other processes. This helps ext4 make better use
> of having multiple block groups.
> 
> Also, to make sure we don't skip all the groups that have free space
> when allocating blocks, we won't try to skip busy groups anymore when
> ac_criteria is CR_ANY_FREE.
> 
> Performance test data follows:
> 
> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
> Observation: Average fallocate operations per container per second.
> 
> |CPU: Kunpeng 920   |          P80            |
> |Memory: 512GB      |-------------------------|
> |960GB SSD (0.5GB/s)| base  |    patched      |
> |-------------------|-------|-----------------|
> |mb_optimize_scan=0 | 2667  | 4821  (+80.7%)  |
> |mb_optimize_scan=1 | 2643  | 4784  (+81.0%)  |
> 
> |CPU: AMD 9654 * 2  |          P96            |
> |Memory: 1536GB     |-------------------------|
> |960GB SSD (1GB/s)  | base  |    patched      |
> |-------------------|-------|-----------------|
> |mb_optimize_scan=0 | 3450  | 15371 (+345%)   |
> |mb_optimize_scan=1 | 3209  | 6101  (+90.0%)  |
> 
> Signed-off-by: Baokun Li <libaokun1@...wei.com>
> Reviewed-by: Jan Kara <jack@...e.cz>
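
A rough userspace sketch of the skip-busy-groups idea described above, offered only as an illustration under stated assumptions and not as the patch's code: `struct group`, `init_group()`, `try_lock_group()`, `pick_group()` and the `CR_GOAL_FAST` level are hypothetical names, only `CR_ANY_FREE` is taken from the patch description, and a pthread mutex stands in for ext4's per-group spinlock. Busy groups are skipped via a non-blocking trylock, except at `CR_ANY_FREE`, where the sketch waits for the lock so groups with free space are not passed over.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical criteria levels; only the name CR_ANY_FREE comes from
 * the patch description. */
enum criteria { CR_GOAL_FAST, CR_ANY_FREE };

/* Hypothetical block group: a lock plus a free-block count. In the kernel
 * the per-group lock is a spinlock; a pthread mutex is used here so the
 * sketch runs in userspace. */
struct group {
    pthread_mutex_t lock;
    int free_blocks;
};

static void init_group(struct group *g, int free_blocks)
{
    pthread_mutex_init(&g->lock, NULL);
    g->free_blocks = free_blocks;
}

/* Illustrative analogue of ext4_try_lock_group(): take the lock only if
 * it is not already held, and report whether it was acquired. */
static bool try_lock_group(struct group *g)
{
    return pthread_mutex_trylock(&g->lock) == 0;
}

/* Walk a candidate list of groups and return the index of the first one
 * with at least `need` free blocks, with its lock still held (the caller
 * must unlock it). Returns -1 if no group fits. Contended groups are
 * skipped unless cr == CR_ANY_FREE, in which case we block on the lock. */
static int pick_group(struct group *groups, int n, int need, enum criteria cr)
{
    for (int i = 0; i < n; i++) {
        struct group *g = &groups[i];

        if (cr == CR_ANY_FREE)
            pthread_mutex_lock(&g->lock);   /* last resort: wait */
        else if (!try_lock_group(g))
            continue;                       /* busy: move to the next group */

        if (g->free_blocks >= need)
            return i;                       /* lock held for the caller */
        pthread_mutex_unlock(&g->lock);
    }
    return -1;
}
```

For example, assuming groups 0 and 1 both have enough free blocks and another allocator holds group 0's lock, `pick_group(groups, 3, 4, CR_GOAL_FAST)` moves on to group 1 instead of waiting, while the same call at `CR_ANY_FREE` would wait for group 0.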

Hey Baokun, I reviewed some of the patches in v2, but I think that was at
the very last moment, so I'll add the comments in this series. Don't mind
the copy-paste :)

The patch itself looks good, thanks for the changes.

Feel free to add:

 Reviewed-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>

Regards,
ojaswin

