Message-ID: <xlzlyqudvp7a6ufdvc4rgsoe7ty425rrexuxgfbgwxoazfjd25@6eqbh66w7ayr>
Date: Fri, 27 Jun 2025 20:19:40 +0200
From: Jan Kara <jack@...e.cz>
To: Baokun Li <libaokun1@...wei.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu, jack@...e.cz,
adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com, linux-kernel@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group
On Mon 23-06-25 15:32:51, Baokun Li wrote:
> After we optimized the block group lock, we found another lock
> contention issue when running will-it-scale/fallocate2 with multiple
> processes. fallocate's block allocation and truncate's block release
> were contending for s_md_lock. The problem is that this lock protects
> two entirely different things on those paths: the list of freed data
> blocks (s_freed_data_list) when releasing, and where to start looking
> for new blocks (s_mb_last_group) when allocating.
>
> Now we only need to track s_mb_last_group and no longer track
> s_mb_last_start, so we no longer need s_md_lock to keep the two
> consistent; smp_store_release/smp_load_acquire are enough to ensure
> that readers of s_mb_last_group see the latest value.
>
> In addition, s_mb_last_group only needs to be an ext4_group_t
> (i.e., unsigned int), so the unsigned long type is superfluous.
>
> Performance test data follows:
>
> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
> Observation: Average fallocate operations per container per second.
>
>                     | Kunpeng 920 / 512GB -P80 | AMD 9654 / 1536GB -P96  |
> Disk: 960GB SSD     |--------------------------|-------------------------|
>                     |  base  |     patched     |  base  |    patched     |
> --------------------|--------|-----------------|--------|----------------|
> mb_optimize_scan=0  |  4821  |  7612 (+57.8%)  | 15371  | 21647 (+40.8%) |
> mb_optimize_scan=1  |  4784  |  7568 (+58.1%)  |  6101  |  9117 (+49.4%) |
>
> Signed-off-by: Baokun Li <libaokun1@...wei.com>
...
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5cdae3bda072..3f103919868b 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2168,11 +2168,9 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
> ac->ac_buddy_folio = e4b->bd_buddy_folio;
> folio_get(ac->ac_buddy_folio);
> /* store last allocated for subsequent stream allocation */
> - if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
> - spin_lock(&sbi->s_md_lock);
> - sbi->s_mb_last_group = ac->ac_f_ex.fe_group;
> - spin_unlock(&sbi->s_md_lock);
> - }
> + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
> + /* pairs with smp_load_acquire in ext4_mb_regular_allocator() */
> + smp_store_release(&sbi->s_mb_last_group, ac->ac_f_ex.fe_group);
Do you really need any kind of barrier (implied by smp_store_release())
here? I mean the store to s_mb_last_group is perfectly fine to be reordered
with other accesses from the thread, isn't it? As such it should be enough
to have WRITE_ONCE() here...
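I mean something like this (an untested sketch, same logic as your
patch, just with the plain annotated store):

	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
		/*
		 * Plain store; WRITE_ONCE() only prevents store tearing
		 * and compiler mangling of this single access. It implies
		 * no ordering against surrounding accesses.
		 */
		WRITE_ONCE(sbi->s_mb_last_group, ac->ac_f_ex.fe_group);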
> /*
> * As we've just preallocated more space than
> * user requested originally, we store allocated
> @@ -2844,12 +2842,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> }
>
> /* if stream allocation is enabled, use global goal */
> - if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
> - /* TBD: may be hot point */
> - spin_lock(&sbi->s_md_lock);
> - ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
> - spin_unlock(&sbi->s_md_lock);
> - }
> + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
> + /* pairs with smp_store_release in ext4_mb_use_best_found() */
> + ac->ac_g_ex.fe_group = smp_load_acquire(&sbi->s_mb_last_group);
... and READ_ONCE() here.
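I.e., roughly (again an untested sketch):

	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
		/*
		 * Pairs with the WRITE_ONCE() above only in the sense of
		 * avoiding load tearing; s_mb_last_group is just a search
		 * hint, so reading a slightly stale value is harmless.
		 */
		ac->ac_g_ex.fe_group = READ_ONCE(sbi->s_mb_last_group);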
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR