linux-kernel - Re: [PATCH v3] Avoid memory barrier in read

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wi=Ji6-xi32167i3M1JL_YyRj6tgUAJS=YQ94GKzMBvkg@mail.gmail.com>
Date: Wed, 23 Oct 2024 13:34:16 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Christoph Lameter (Ampere)" <cl@...two.org>, Will Deacon <will@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Catalin Marinas <catalin.marinas@....com>, Ingo Molnar <mingo@...hat.com>, 
	Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire

On Wed, 23 Oct 2024 at 12:45, Peter Zijlstra <peterz@...radead.org> wrote:
>
> Do we want to do the complementing patch and make write_seqcount_end()
> use smp_store_release() ?
>
> I think at least ARM (the 32bit thing) has wmb but uses mb for
> store_release. But I also think I don't really care about that.

So unlike the "acquire vs rmb", there are architectures where "wmb" is
noticeably cheaper than a "store release".

Just as an example, on alpha, a "store release" is a full memory
barrier followed by the store, because it needs to serialize previous
loads too. But wmp_wmb() is lightweight.

Typically in traditional (pre acquire/release) architectures "wmb"
only ordered the CPU write queues, so "wmb" has always been cheap
pretty much everywhere.

And I *suspect* that alpha isn't the outlier in having a much cheaper
wmb than store-release.

But yeah, it's kind of ugly how we now have three completely different
orderings for seqcounts:

 - the initial load is done with the smp_read_acquire

 - the final load (the "retry") is done with a smp_rmb (because an
acquire orders _subsequent_ loads, not the ones inside the lock: we'd
actually want a "smp_load_release()", but such a thing doesn't exist)

 - the writer side uses smp_wmb

(and arguably there's a fourth pattern: the latching cases uses double
smp_wmb, because it orders the sequence count wrt both preceding and
subsequent stores)

Anyway, obviously on x86 (and s390) none of this matters.

On arm64, I _suspect_ they are mostly the same, but it's going to be
very microarchitecture-dependent. Neither should be expensive, but wmb
really is a fundamentally lightweight operation.

On 32-bit arm, wmb should be cheaper ("ishst" only waits for earlier stores).

On powerpc, wmb is cheaper on older CPU's (eieio vs sync), but the
same on newer CPUs (lwsync).

On alpha, wmb is definitely cheaper, but I doubt anybody really cares.

Others? I stopped looking, and am not familiar enough.

          Linus