lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210607115234.GA7205@willie-the-truck>
Date:   Mon, 7 Jun 2021 12:52:35 +0100
From:   Will Deacon <will@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Alan Stern <stern@...land.harvard.edu>,
        Segher Boessenkool <segher@...nel.crashing.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Andrea Parri <parri.andrea@...il.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Nick Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Akira Yokosawa <akiyks@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-toolchains@...r.kernel.org,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 12:43:01PM +0200, Peter Zijlstra wrote:
> On Sun, Jun 06, 2021 at 11:43:42AM -0700, Linus Torvalds wrote:
> > So while the example code is insane and pointless (and you shouldn't
> > read *too* much into it), conceptually the notion of that pattern of
> > 
> >     if (READ_ONCE(a)) {
> >         WRITE_ONCE(b,1);
> >         .. do something ..
> >     } else {
> >         WRITE_ONCE(b,1);
> >         .. do something else ..
> >     }
> 
> This is actually more tricky than it would appear (isn't it always).
> 
> The thing is, that normally we must avoid speculative stores, because
> they'll result in out-of-thin-air values.
> 
> *Except* in this case, where both branches emit the same store, then
> it's a given that the store will happen and it will not be OOTA.
> Someone's actually done the proof for that apparently (Will, you have a
> reference to Jade's paper?)

I don't think there's a paper on this, but Jade and I are hoping to talk
about aspects of it at LPC (assuming the toolchain MC gets accepted).

> There's apparently also a competition going on who can build the
> weakestest ARM64 implementation ever.
> 
> Combine the two, and you'll get a CPU that *will* emit the store early
> :/

So there are a lot of important details missing here and, as above, I think
this is something worth discussing at LPC with Jade. The rough summary is
that the arm64 memory model recently (so recently that it's not yet landed
in the public docs) introduced something called "pick dependencies", which
are a bit like control dependencies only they don't create order to all
subsequent stores. These are useful for some conditional data-processing
instructions such as CSEL and CAS, but it's important to note here that
*conditional branch instructions behave exactly as you would expect*.

<disclaimer; I don't work for Arm so any mistakes here are mine>

To reiterate, in the code sequence at the top of this mail, if the compiler
emits something along the lines of:

	LDR
	<conditional branch instruction>
	STR

then the load *will* be ordered before the store, even if the same store
instruction is executed regardless of the branch direction. Yes, one can
fantasize about a CPU that executes both taken and non-taken paths and
figures out that the STR can be hoisted before the load, but that is not
allowed by the architecture today.

It's the conditional instructions that are more fun. For example, the CSEL
instruction:

	CSEL	X0, X1, X2, <cond>

basically says:

	if (cond)
		X0 = X1;
	else
		X0 = X2;

these are just register-register operations, but the idea is that the CPU
can predict that "branching event" inside the CSEL instruction and
speculatively rename X0 while waiting for the condition to resolve.

So then you can add loads and stores to the mix along the lines of:

	LDR	X0, [X1]		// X0 = *X1
	CMP	X0, X2
	CSEL	X3, X4, X5, EQ		// X3 = (X0 == X2) ? X4 : X5
	STR	X3, [X6]		// MUST BE ORDERED AFTER THE LOAD
	STR	X7, [X8]		// Can be reordered

(assuming X1, X6, X8 all point to different locations in memory)

So now we have a dependency from the load to the first store, but the
interesting part is that the last store is _not_ ordered wrt either of the
other two memory accesses, whereas it would be if we used a conditional
branch instead of the CSEL. Make sense?

Now, obviously the compiler is blissfully unaware that conditional
data processing instructions can give rise to dependencies than
conditional branches, so the question really is how much do we need to
care in the kernel?

My preference is to use load-acquire instead of control dependencies so
that we don't have to worry about this, or any future relaxations to the
CPU architecture, at all.

Jade -- please can you correct me if I got any of this wrong?

Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ