Message-ID: <CAHk-=wi=Ji6-xi32167i3M1JL_YyRj6tgUAJS=YQ94GKzMBvkg@mail.gmail.com>
Date: Wed, 23 Oct 2024 13:34:16 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Christoph Lameter (Ampere)" <cl@...two.org>, Will Deacon <will@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Catalin Marinas <catalin.marinas@....com>, Ingo Molnar <mingo@...hat.com>,
Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire
On Wed, 23 Oct 2024 at 12:45, Peter Zijlstra <peterz@...radead.org> wrote:
>
> Do we want to do the complementing patch and make write_seqcount_end()
> use smp_store_release() ?
>
> I think at least ARM (the 32bit thing) has wmb but uses mb for
> store_release. But I also think I don't really care about that.
So unlike the "acquire vs rmb", there are architectures where "wmb" is
noticeably cheaper than a "store release".
Just as an example, on alpha, a "store release" is a full memory
barrier followed by the store, because it needs to serialize previous
loads too. But smp_wmb() is lightweight.
Typically in traditional (pre acquire/release) architectures "wmb"
only ordered the CPU write queues, so "wmb" has always been cheap
pretty much everywhere.
And I *suspect* that alpha isn't the outlier in having a much cheaper
wmb than store-release.
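
To make the two options concrete, here is a minimal userspace sketch of the two shapes the writer end could take, using C11 atomics rather than the kernel primitives. The helper names are hypothetical and exist only for illustration. C11 has no store-store-only fence, so a release fence stands in for smp_wmb() as an assumption; it is stronger than the real smp_wmb(), which means this sketch shows the shape of the code, not the cost difference.

```c
#include <stdatomic.h>

/* Hypothetical helpers for illustration only; these are not kernel functions. */

/*
 * Shape of the existing writer end: a barrier, then a plain store of the
 * sequence count. The release fence is an assumed stand-in for smp_wmb();
 * unlike smp_wmb() it also orders earlier loads.
 */
static void end_with_wmb_like(atomic_uint *seq)
{
    unsigned v = atomic_load_explicit(seq, memory_order_relaxed);

    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(seq, v + 1, memory_order_relaxed);
}

/*
 * Shape of the suggested alternative: one store-release of the count,
 * which orders all earlier loads and stores before the store itself.
 */
static void end_with_store_release(atomic_uint *seq)
{
    unsigned v = atomic_load_explicit(seq, memory_order_relaxed);

    atomic_store_explicit(seq, v + 1, memory_order_release);
}
```

Both helpers assume a single writer that already holds whatever lock serializes updates, so the relaxed read of the count cannot race with another increment.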
But yeah, it's kind of ugly how we now have three completely different
orderings for seqcounts:
- the initial load is done with smp_load_acquire
- the final load (the "retry") is done with a smp_rmb (because an
acquire orders _subsequent_ loads, not the ones inside the lock: we'd
actually want a "smp_load_release()", but such a thing doesn't exist)
- the writer side uses smp_wmb
(and arguably there's a fourth pattern: the latching case uses double
smp_wmb, because it orders the sequence count wrt both preceding and
subsequent stores)
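
The three orderings listed above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel's seqcount implementation: struct seq_sketch and all function names are hypothetical, an acquire fence stands in for smp_rmb(), a release fence stands in for smp_wmb() (both are stronger than the kernel barriers they replace), the payload is read and written with relaxed atomics to avoid data races, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a seqcount plus the data it protects. */
struct seq_sketch {
    atomic_uint seq;
    atomic_int a, b;
};

/* Initial load: acquire, so the data loads after it cannot move before it. */
static unsigned read_begin(struct seq_sketch *s)
{
    unsigned v;

    while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
        ; /* odd count: a write is in progress, wait for it to finish */
    return v;
}

/*
 * Final load (the "retry"): here the data loads *before* the count load
 * must be ordered, which an acquire load cannot do. A fence (assumed
 * stand-in for smp_rmb) followed by a relaxed load does it instead.
 */
static int read_retry(struct seq_sketch *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Writer side: count goes odd, barrier, data stores, barrier, count goes even. */
static void write_begin(struct seq_sketch *s)
{
    unsigned v = atomic_load_explicit(&s->seq, memory_order_relaxed);

    assert((v & 1) == 0); /* single writer assumed: no write already in progress */
    atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* assumed stand-in for smp_wmb */
}

static void write_end(struct seq_sketch *s)
{
    unsigned v = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_thread_fence(memory_order_release); /* assumed stand-in for smp_wmb */
    atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
}

static void sketch_write(struct seq_sketch *s, int a, int b)
{
    write_begin(s);
    atomic_store_explicit(&s->a, a, memory_order_relaxed);
    atomic_store_explicit(&s->b, b, memory_order_relaxed);
    write_end(s);
}

/* Reader loop: repeat until no write overlapped the two data loads. */
static void sketch_read(struct seq_sketch *s, int *a, int *b)
{
    unsigned start;

    do {
        start = read_begin(s);
        *a = atomic_load_explicit(&s->a, memory_order_relaxed);
        *b = atomic_load_explicit(&s->b, memory_order_relaxed);
    } while (read_retry(s, start));
}
```

A reader calls sketch_read() and gets a consistent (a, b) pair; if a write lands between read_begin() and read_retry(), the count no longer matches and the loop runs again.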
Anyway, obviously on x86 (and s390) none of this matters.
On arm64, I _suspect_ they are mostly the same, but it's going to be
very microarchitecture-dependent. Neither should be expensive, but wmb
really is a fundamentally lightweight operation.
On 32-bit arm, wmb should be cheaper ("ishst" only waits for earlier stores).
On powerpc, wmb is cheaper on older CPUs (eieio vs sync), but the
same on newer CPUs (lwsync).
On alpha, wmb is definitely cheaper, but I doubt anybody really cares.
Others? I stopped looking, and am not familiar enough.
Linus