linux-kernel - Re: [PATCH v3] Avoid memory barrier in read

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20241028141036.GA2008@willie-the-truck>
Date: Mon, 28 Oct 2024 14:10:37 +0000
From: Will Deacon <will@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
	"Christoph Lameter (Ampere)" <cl@...two.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Catalin Marinas <catalin.marinas@....com>,
	Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>,
	Boqun Feng <boqun.feng@...il.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load
 acquire

On Wed, Oct 23, 2024 at 01:34:16PM -0700, Linus Torvalds wrote:
> On Wed, 23 Oct 2024 at 12:45, Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > Do we want to do the complementing patch and make write_seqcount_end()
> > use smp_store_release() ?
> >
> > I think at least ARM (the 32bit thing) has wmb but uses mb for
> > store_release. But I also think I don't really care about that.
> 
> So unlike the "acquire vs rmb", there are architectures where "wmb" is
> noticeably cheaper than a "store release".
> 
> Just as an example, on alpha, a "store release" is a full memory
> barrier followed by the store, because it needs to serialize previous
> loads too. But wmp_wmb() is lightweight.
> 
> Typically in traditional (pre acquire/release) architectures "wmb"
> only ordered the CPU write queues, so "wmb" has always been cheap
> pretty much everywhere.
> 
> And I *suspect* that alpha isn't the outlier in having a much cheaper
> wmb than store-release.
> 
> But yeah, it's kind of ugly how we now have three completely different
> orderings for seqcounts:
> 
>  - the initial load is done with the smp_read_acquire
> 
>  - the final load (the "retry") is done with a smp_rmb (because an
> acquire orders _subsequent_ loads, not the ones inside the lock: we'd
> actually want a "smp_load_release()", but such a thing doesn't exist)
> 
>  - the writer side uses smp_wmb
> 
> (and arguably there's a fourth pattern: the latching cases uses double
> smp_wmb, because it orders the sequence count wrt both preceding and
> subsequent stores)
> 
> Anyway, obviously on x86 (and s390) none of this matters.
> 
> On arm64, I _suspect_ they are mostly the same, but it's going to be
> very microarchitecture-dependent. Neither should be expensive, but wmb
> really is a fundamentally lightweight operation.

I agree here. An STLR additionally orders PO-prior loads on arm64, so
I'd stick with the wmb().

Will