Message-ID: <20250714030113.GA23343@mit.edu>
Date: Sun, 13 Jul 2025 23:01:13 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Baokun Li <libaokun1@...wei.com>, linux-ext4@...r.kernel.org,
adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com,
linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com
Subject: Re: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update
s_mb_last_group
On Thu, Jul 10, 2025 at 04:38:33PM +0200, Jan Kara wrote:
>
> Yes, apparently both approaches have their pros and cons. I'm actually
> surprised the impact of additional barriers on ARM is so big for the
> single container case. 10% gain for single container cases look nice OTOH
> realistical workloads will have more container so maybe that's not worth
> optimizing for. Ted, do you have any opinion?
Let me try to summarize: regardless of whether we use
{READ,WRITE}_ONCE or smp_load_acquire / smp_store_release, both are
significantly better than using the spinlock. The other thing about
the "single-threaded performance" is that the additional cost of the
CPU-to-CPU syncing is not free. But that CPU synchronization cost
applies when the single thread is bouncing between CPUs --- if
we had a single-threaded application pinned to a single CPU, the
cost of smp_load_acquire wouldn't be there, since the cache line
wouldn't be bouncing back and forth. Is that correct, or am I missing
something?
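To make the two lockless options concrete, here is a rough userspace
sketch, not the actual patch. It uses C11 atomics as stand-ins for the
kernel macros: a relaxed access is only an approximation of
{READ,WRITE}_ONCE, and acquire/release approximates smp_load_acquire /
smp_store_release. The struct and function names are hypothetical, made
up for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator's "last group" goal hint. */
struct goal_hint {
	_Atomic uint32_t last_group;
};

/* Approximates WRITE_ONCE(): a single untorn store, no ordering. */
static inline void hint_store_relaxed(struct goal_hint *h, uint32_t group)
{
	atomic_store_explicit(&h->last_group, group, memory_order_relaxed);
}

/* Approximates READ_ONCE(): a single untorn load, no ordering. */
static inline uint32_t hint_load_relaxed(struct goal_hint *h)
{
	return atomic_load_explicit(&h->last_group, memory_order_relaxed);
}

/* Approximates smp_store_release(): earlier writes are ordered before it. */
static inline void hint_store_release(struct goal_hint *h, uint32_t group)
{
	atomic_store_explicit(&h->last_group, group, memory_order_release);
}

/* Approximates smp_load_acquire(): later accesses are ordered after it. */
static inline uint32_t hint_load_acquire(struct goal_hint *h)
{
	return atomic_load_explicit(&h->last_group, memory_order_acquire);
}
```

Neither variant takes a lock; the difference is only whether the access
also orders the surrounding loads and stores, which is where any extra
cost on a weakly ordered CPU such as ARM64 would come from.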
In any case, so long as the single-threaded performance doesn't
regress relative to the current spin_lock implementation, I'm inclined
to prefer the smp_load_acquire approach if it improves
multi-threaded allocation performance on ARM64.
Cheers,
- Ted