Message-ID: <bdrtjnhl6hdmevg42uh4b5k3c7hyugeyyk7s3xzhnm75sh2nz3@wzh4hqn3hmjr>
Date: Fri, 27 Jun 2025 20:06:15 +0200
From: Jan Kara <jack@...e.cz>
To: Baokun Li <libaokun1@...wei.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu, jack@...e.cz,
adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com, linux-kernel@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 01/16] ext4: add ext4_try_lock_group() to skip busy groups
On Mon 23-06-25 15:32:49, Baokun Li wrote:
> When ext4 allocates blocks, we used to just go through the block groups
> one by one to find a good one. But when there are tons of block groups
> (like hundreds of thousands or even millions) and not many have free space
> (meaning they're mostly full), it takes a really long time to check them
> all, and performance gets bad. So, we added the "mb_optimize_scan" mount
> option (which is on by default now). It keeps track of some group lists,
> so when we need a free block, we can just grab a likely group from the
> right list. This saves time and makes block allocation much faster.
>
> But when multiple processes or containers are doing similar things, like
> constantly allocating 8k blocks, they all try to use the same block group
> in the same list. Even just two processes doing this can cut the IOPS in
> half. For example, one container might do 300,000 IOPS, but if you run two
> at the same time, the total is only 150,000.
>
> Since we can already look at block groups in a non-linear way, the first
> and last groups in the same list are basically the same for finding a block
> right now. Therefore, add an ext4_try_lock_group() helper function to skip
> the current group when it is locked by another process, thereby avoiding
> contention with other processes. This helps ext4 make better use of having
> multiple block groups.
>
> Also, to make sure we don't skip all the groups that have free space
> when allocating blocks, we won't try to skip busy groups anymore when
> ac_criteria is CR_ANY_FREE.
>
> Performance test data follows:
>
> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
> Observation: Average fallocate operations per container per second.
>
>                    | Kunpeng 920 / 512GB -P80| AMD 9654 / 1536GB -P96  |
>  Disk: 960GB SSD   |-------------------------|-------------------------|
>                    | base  |    patched      | base  |    patched      |
> -------------------|-------|-----------------|-------|-----------------|
> mb_optimize_scan=0 | 2667  | 4821  (+80.7%)  | 3450  | 15371 (+345%)   |
> mb_optimize_scan=1 | 2643  | 4784  (+81.0%)  | 3209  | 6101  (+90.0%)  |
>
> Signed-off-by: Baokun Li <libaokun1@...wei.com>
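
For readers following the thread, the skip-busy-groups idea described above
can be illustrated with a minimal userspace sketch. This is not the patch's
code: it assumes pthread mutexes in place of the kernel's group locks, and
struct group, enum criteria, CR_FAST, try_lock_group() and alloc_block() are
hypothetical names chosen for illustration. Only CR_ANY_FREE and the
try-lock-then-skip behaviour come from the description.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the allocator's criteria; only CR_ANY_FREE
 * is named in the patch description. */
enum criteria { CR_FAST, CR_ANY_FREE };

/* Hypothetical, simplified block group: a lock and a free-block count. */
struct group {
    pthread_mutex_t lock;
    unsigned int free_blocks;
};

/*
 * Returns true with the group lock held, false if the group was skipped.
 * Below CR_ANY_FREE a group locked by someone else is skipped instead of
 * waited on. At CR_ANY_FREE we block on the lock, so a group that still
 * has free space is never passed over just because it was busy.
 */
static bool try_lock_group(struct group *g, enum criteria cr)
{
    if (cr != CR_ANY_FREE)
        return pthread_mutex_trylock(&g->lock) == 0;

    pthread_mutex_lock(&g->lock);
    return true;
}

/*
 * Scan the groups in order and take one block from the first group that
 * could be locked and has free space. Returns the group index, or -1 if
 * no block was found at this criteria level.
 */
static int alloc_block(struct group *groups, int n, enum criteria cr)
{
    for (int i = 0; i < n; i++) {
        if (!try_lock_group(&groups[i], cr))
            continue;               /* busy: move on to the next group */

        if (groups[i].free_blocks > 0) {
            groups[i].free_blocks--;
            pthread_mutex_unlock(&groups[i].lock);
            return i;
        }
        pthread_mutex_unlock(&groups[i].lock);
    }
    return -1;
}
```

In this sketch a caller would retry with CR_ANY_FREE after a -1 result at a
cheaper criteria level, which is what keeps the skipping from turning into a
spurious out-of-space failure.
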
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR