Message-ID: <20241028141036.GA2008@willie-the-truck>
Date: Mon, 28 Oct 2024 14:10:37 +0000
From: Will Deacon <will@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
"Christoph Lameter (Ampere)" <cl@...two.org>,
Thomas Gleixner <tglx@...utronix.de>,
Catalin Marinas <catalin.marinas@....com>,
Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire

On Wed, Oct 23, 2024 at 01:34:16PM -0700, Linus Torvalds wrote:
> On Wed, 23 Oct 2024 at 12:45, Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > Do we want to do the complementing patch and make write_seqcount_end()
> > use smp_store_release() ?
> >
> > I think at least ARM (the 32bit thing) has wmb but uses mb for
> > store_release. But I also think I don't really care about that.
>
> So unlike the "acquire vs rmb", there are architectures where "wmb" is
> noticeably cheaper than a "store release".
>
> Just as an example, on alpha, a "store release" is a full memory
> barrier followed by the store, because it needs to serialize previous
> loads too. But smp_wmb() is lightweight.
>
> Typically in traditional (pre acquire/release) architectures "wmb"
> only ordered the CPU write queues, so "wmb" has always been cheap
> pretty much everywhere.
>
> And I *suspect* that alpha isn't the outlier in having a much cheaper
> wmb than store-release.
>
> But yeah, it's kind of ugly how we now have three completely different
> orderings for seqcounts:
>
>  - the initial load is done with smp_load_acquire()
>
> - the final load (the "retry") is done with a smp_rmb (because an
> acquire orders _subsequent_ loads, not the ones inside the lock: we'd
> actually want a "smp_load_release()", but such a thing doesn't exist)
>
> - the writer side uses smp_wmb
>
> (and arguably there's a fourth pattern: the latching cases use double
> smp_wmb, because it orders the sequence count wrt both preceding and
> subsequent stores)
>
> Anyway, obviously on x86 (and s390) none of this matters.
>
> On arm64, I _suspect_ they are mostly the same, but it's going to be
> very microarchitecture-dependent. Neither should be expensive, but wmb
> really is a fundamentally lightweight operation.

I agree here. An STLR additionally orders PO-prior loads on arm64, so
I'd stick with the wmb().

Will
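A userspace sketch of where the three seqcount orderings Linus lists would sit, assuming C11 `<stdatomic.h>` as a stand-in for the kernel primitives. The type and function names (`seq_pair`, `seq_pair_write()`, `seq_pair_read()`) are hypothetical, and this is not the kernel's seqcount implementation. The mapping is approximate. The reader's initial `smp_load_acquire()` becomes an acquire load, the `smp_rmb()` before the retry check becomes an acquire fence, and each writer-side `smp_wmb()` becomes a release fence. C11 has no store-store-only fence, so that last substitution is stronger than `smp_wmb()`: it also orders program-order-earlier loads, which is the extra cost being discussed here.

```c
/* Userspace C11 sketch of the seqcount orderings (ASSUMPTION: C11 fences
 * stand in for smp_wmb()/smp_rmb(); not the kernel implementation).
 * Single writer only. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>

struct seq_pair {
	atomic_uint sequence;	/* odd while a write is in progress */
	atomic_int x, y;	/* protected data; atomics avoid C11 data races */
};

static void seq_pair_init(struct seq_pair *p)
{
	atomic_init(&p->sequence, 0u);
	atomic_init(&p->x, 0);
	atomic_init(&p->y, 0);
}

/* Writer: one fence on each side of the data stores, mirroring the two
 * smp_wmb() calls. A release fence also orders earlier loads, so it is
 * stronger than smp_wmb(). */
static void seq_pair_write(struct seq_pair *p, int x, int y)
{
	unsigned s = atomic_load_explicit(&p->sequence, memory_order_relaxed);

	assert((s & 1u) == 0 && "concurrent writers are not supported");
	atomic_store_explicit(&p->sequence, s + 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
	atomic_store_explicit(&p->x, x, memory_order_relaxed);
	atomic_store_explicit(&p->y, y, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
	atomic_store_explicit(&p->sequence, s + 2, memory_order_relaxed); /* even */
}

/* Reader: copies a consistent snapshot and returns its sequence value. */
static unsigned seq_pair_read(struct seq_pair *p, int *x, int *y)
{
	unsigned start;

	do {
		/* Initial load: acquire orders the data loads that follow it. */
		while ((start = atomic_load_explicit(&p->sequence,
						     memory_order_acquire)) & 1u)
			;	/* writer active: spin */
		*x = atomic_load_explicit(&p->x, memory_order_relaxed);
		*y = atomic_load_explicit(&p->y, memory_order_relaxed);
		/* Retry check: an acquire load would only order *later*
		 * accesses, so a fence orders the data loads above before
		 * the re-read of the sequence, as smp_rmb() does. */
		atomic_thread_fence(memory_order_acquire); /* stands in for smp_rmb() */
	} while (atomic_load_explicit(&p->sequence, memory_order_relaxed) != start);

	return start;
}
```

The protected fields are relaxed atomics only because C11 treats a plain racing read as undefined behaviour; the kernel relies on its own memory model instead. The latch variant, with its extra `smp_wmb()` ordering the sequence count against subsequent stores, is not shown.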