Date: Wed, 29 May 2024 18:08:50 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
Cc: "Paul E. McKenney" <paulmck@...nel.org>, 
	John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>, Arnd Bergmann <arnd@...nel.org>, 
	linux-alpha@...r.kernel.org, Arnd Bergmann <arnd@...db.de>, 
	Richard Henderson <richard.henderson@...aro.org>, Ivan Kokshaysky <ink@...assic.park.msu.ru>, 
	Matt Turner <mattst88@...il.com>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Marc Zyngier <maz@...nel.org>, linux-kernel@...r.kernel.org, 
	Michael Cree <mcree@...on.net.nz>, Frank Scheiner <frank.scheiner@....de>
Subject: Re: [PATCH 00/14] alpha: cleanups for 6.10

On Wed, 29 May 2024 at 11:50, Maciej W. Rozycki <macro@...am.me.uk> wrote:
>
>              The only difference here is that with
> hardware read-modify-write operations atomicity for sub-word accesses is
> guaranteed by the ISA, however for software read-modify-write it has to be
> explicitly coded using the usual load-locked/store-conditional sequence in
> a loop.

I have some bad news for you: the old alpha CPUs not only screwed up
the byte/word design, they _also_ screwed up the
load-locked/store-conditional.

You'd think that LL/SC would be done at a cacheline level, like any
sane person would do.

But no.

The 21064 actually did atomicity with an external pin on the bus, the
same way people used to do before caches even existed.

Yes, it has an internal L1 D$, but it is a write-through cache, and
clearly things like cache coherency weren't designed for. In fact,
LL/SC is even documented to not work in the external L2 cache
("Bcache" - don't ask me why the odd naming).

So LL/SC on the 21064 literally works on external memory.

Quoting the reference manual:

  "A.6 Load Locked and Store Conditional
  The 21064 provides the ability to perform locked memory accesses through
  the LDxL (Load_Locked) and STxC (Store_Conditional) cycle command pair.
  The LDxL command forces the 21064 to bypass the Bcache and request data
  directly from the external memory interface. The memory interface logic must
  set a special interlock flag as it returns the data, and may optionally
  keep the locked address"

End result: an LL/SC pair is very very slow. It was incredibly slow
even for the time. I had benchmarks, I can't recall them, but I'd like
to say "hundreds of cycles". Maybe thousands.

So actual reliable byte operations are not realistically possible on
the early alpha CPUs. You can do them with LL/SC, sure, but
performance would be so horrendously bad that it would be just sad.
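
[Illustrative sketch, not part of the original message. It shows what
"doing a byte store with LL/SC" amounts to: retry a read-modify-write of
the containing 64-bit word until no one else has changed it in between.
C11 compare-and-exchange is used here as a portable stand-in for the
LDQ_L/STQ_C loop, and the name atomic_store_byte is hypothetical.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replace byte `idx` (0 = least significant) of *word,
 * leaving the other seven bytes as whatever they currently are. */
void atomic_store_byte(_Atomic uint64_t *word, unsigned idx, uint8_t val)
{
    assert(idx < 8);
    unsigned shift = idx * 8;
    uint64_t mask = (uint64_t)0xff << shift;

    uint64_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint64_t updated;
    do {
        /* Build the new word from the most recently observed value. */
        updated = (old & ~mask) | ((uint64_t)val << shift);
        /* On failure `old` is refreshed with the current contents of
         * *word, so the next iteration retries against fresh data. */
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, updated,
                 memory_order_relaxed, memory_order_relaxed));
}
```

[Every byte store becomes at least one full locked round trip, which is
why the cost of the lock primitive itself matters so much here.]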

The 21064A had some "fast lock" mode which allowed the data from the
LDQ_L to come from the Bcache. So it still wasn't exactly fast, and it
still didn't work at CPU core speeds, but at least it worked with the
external cache.

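For byte and word stores, compilers will generate the sequence that DEC
specified (load the containing quadword, mask and insert the new value,
store the quadword back), which isn't thread-safe.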

In fact, it's worse than "not thread safe". It's not even safe on UP
with interrupts, or even signals in user space.
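
[Illustrative sketch, not part of the original message. It assumes the
byte store is compiled as a plain load, mask-and-insert, and store of
the containing 64-bit word, and simulates an interrupt or signal handler
writing a neighbouring byte between the load and the store. The names
insert_byte and racy_byte_store are hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

/* Replace byte `idx` (0 = least significant) of a 64-bit word. */
uint64_t insert_byte(uint64_t word, unsigned idx, uint8_t val)
{
    assert(idx < 8);
    unsigned shift = idx * 8;
    return (word & ~((uint64_t)0xff << shift)) | ((uint64_t)val << shift);
}

/* Simulate a non-atomic byte store to `mem`, with an "interrupt" that
 * stores irq_val to byte irq_idx landing between the load and the
 * store. Returns the final contents of the word. */
uint64_t racy_byte_store(uint64_t mem, unsigned idx, uint8_t val,
                         unsigned irq_idx, uint8_t irq_val)
{
    uint64_t tmp = mem;                        /* load the whole word    */
    mem = insert_byte(mem, irq_idx, irq_val);  /* interrupt runs here    */
    tmp = insert_byte(tmp, idx, val);          /* modify the stale copy  */
    mem = tmp;                                 /* store the whole word   */
    return mem;
}
```

[Starting from 0, storing 0x11 to byte 0 while the simulated interrupt
stores 0x22 to byte 1 leaves 0x11 in the word rather than 0x2211: the
handler's write to the neighbouring byte is silently lost, on a single
CPU, with no second thread involved.]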

It's one of those "technically valid POSIX", since there's
"sig_atomic_t" and if you do any concurrent signal stuff you're
supposed to only use that type. But it's another of those "Yeah, you'd
better make sure your structure members are either 'int' or bigger, or
never accessed from signals or interrupts, or they might clobber
nearby values".
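
[Illustrative sketch, not part of the original message. It shows the
pattern the C standard has in mind: the only object shared with the
handler is a volatile sig_atomic_t. The names got_signal, on_signal and
flag_set_by_handler are hypothetical, SIGINT is an arbitrary choice, and
the assert is sketch-level error handling only.]

```c
#include <assert.h>
#include <signal.h>

/* sig_atomic_t is defined by the C standard as accessible as an atomic
 * entity even in the presence of asynchronous interrupts. */
static volatile sig_atomic_t got_signal;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;   /* the only shared object the handler touches */
}

/* Install the handler, raise the signal synchronously, restore the
 * previous handler, and report whether the flag was set. */
int flag_set_by_handler(void)
{
    void (*prev)(int) = signal(SIGINT, on_signal);
    assert(prev != SIG_ERR);

    got_signal = 0;
    raise(SIGINT);          /* returns after the handler has run */
    signal(SIGINT, prev);
    return got_signal;
}
```

[A char or short flag in the same role would be exposed to the
neighbour-clobbering problem described above on CPUs without byte and
word stores.]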

           Linus
