Message-ID: <CAHk-=wgw3UErQuBuUOOfjzejGek6Cao1sSW4AosR9WPZ1dfyZg@mail.gmail.com>
Date: Wed, 18 Sep 2024 17:22:17 +0200
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Will Deacon <will@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Catalin Marinas <catalin.marinas@....com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire

On Wed, 18 Sept 2024 at 13:15, Christoph Lameter (Ampere) <cl@...two.org> wrote:
>
> Other arches do not have acquire / release and will create additional
> barriers in the fallback implementation of smp_load_acquire. So it needs
> to be an arch config option.

Actually, I looked at a few cases, and it doesn't really seem to be true.
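
For reference, the generic fallback - paraphrased from
include/asm-generic/barrier.h, so the exact spelling may differ by
kernel version - is just the plain load followed by a full barrier:

    /* asm-generic fallback: used when the arch doesn't provide
     * its own __smp_load_acquire() */
    #ifndef __smp_load_acquire
    #define __smp_load_acquire(p)                                   \
    ({                                                              \
            __unqual_scalar_typeof(*p) ___p1 = READ_ONCE(*p);       \
            compiletime_assert_atomic_type(*p);                     \
            __smp_mb();                                             \
            (typeof(*p))___p1;                                      \
    })
    #endif

so the only question is whether that trailing __smp_mb() is any more
expensive than the smp_rmb() it would replace.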

For example, powerpc doesn't have a "native" acquire model, but both
smp_load_acquire() and smp_rmb() end up being LWSYNC after the load
(which in the good case is a "lwsync" instruction, and in the bad case
a heavier "sync" instruction on older cores, but the point is that it's
the same thing for smp_rmb() and for smp_load_acquire()).

So on powerpc, smp_load_acquire() isn't any better than
"READ_ONCE()+smp_rmb()", but it also isn't any worse.

And at least alpha is the same - it doesn't have smp_load_acquire(),
and it falls back on a full memory barrier for that case - but that's
what smp_rmb() is too. However, because READ_ONCE() on alpha already
contains an smp_mb(), it turns out that on alpha "READ_ONCE() +
smp_rmb()" actually results in *two* barriers, while
smp_load_acquire() is just one.

And obviously technically x86 doesn't have explicit acquire, but with
every load being an acquire, it's a no-op either way.
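
Roughly, per arch/x86/include/asm/barrier.h, both spellings end up as
a plain MOV plus a compiler-only barrier (again, "s->seq" is a
stand-in):

    seq = READ_ONCE(s->seq);         /* mov */
    smp_rmb();                       /* effectively barrier(): no instruction */

    seq = smp_load_acquire(&s->seq); /* mov, plus barrier(): no fence either */

Neither form emits a fence instruction.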

So on at least three very different architectures, smp_load_acquire()
is at least no worse than READ_ONCE() followed by an smp_rmb(). And on
alpha and arm64, it's better.

So it does look like making it conditional doesn't actually buy us
anything. We might as well just unconditionally use
smp_load_acquire() over "READ_ONCE()+smp_rmb()".
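
In seqcount terms the change is, in shape (not the exact patch):

    /* read side, before: */
    seq = READ_ONCE(s->sequence);
    smp_rmb();

    /* read side, after: the acquire orders the load against the
     * critical-section reads that follow it */
    seq = smp_load_acquire(&s->sequence);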

Other random architectures from a quick look:

RISC-V technically turns smp_rmb() into a "fence r,r", while a
smp_load_acquire() ends up being a "fence r,rw", so technically the
fences are different. But honestly, any microarchitecture that treats
those two differently is just crazy garbage (there's never any valid
reason to move later writes up before earlier reads).
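
(Paraphrasing arch/riscv/include/asm/barrier.h:

    smp_rmb()            ->  fence r,r        /* reads vs. later reads */
    smp_load_acquire(p)  ->  load; fence r,rw /* also fences later writes */

hence "technically different" above.)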

Loongarch has acquire and is better off with it.

parisc has acquire and is better off with it.

s390 and sparc64 are like x86, in that it's just a compiler barrier either way.

End result: let's just simplify the patch and make it entirely unconditional.

                 Linus
