lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 30 Jul 2021 18:20:22 +0100
From:   Jade Alglave <j.alglave@....ac.uk>
To:     Will Deacon <will@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Alan Stern <stern@...land.harvard.edu>,
        Segher Boessenkool <segher@...nel.crashing.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Andrea Parri <parri.andrea@...il.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Nick Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Akira Yokosawa <akiyks@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-toolchains@...r.kernel.org,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [RFC] LKMM: Add volatile_if()

Dear all,

On Mon, Jun 07, 2021 at 12:52:35PM +0100, Will Deacon wrote:
> On Mon, Jun 07, 2021 at 12:43:01PM +0200, Peter Zijlstra wrote:
> > On Sun, Jun 06, 2021 at 11:43:42AM -0700, Linus Torvalds wrote:
> > > So while the example code is insane and pointless (and you shouldn't
> > > read *too* much into it), conceptually the notion of that pattern of
> > > 
> > >     if (READ_ONCE(a)) {
> > >         WRITE_ONCE(b,1);
> > >         .. do something ..
> > >     } else {
> > >         WRITE_ONCE(b,1);
> > >         .. do something else ..
> > >     }
> > 
> > This is actually more tricky than it would appear (isn't it always).
> > 
> > The thing is, that normally we must avoid speculative stores, because
> > they'll result in out-of-thin-air values.
> > 
> > *Except* in this case, where both branches emit the same store, then
> > it's a given that the store will happen and it will not be OOTA.
> > Someone's actually done the proof for that apparently (Will, you have a
> > reference to Jade's paper?)
> 
> I don't think there's a paper on this, but Jade and I are hoping to talk
> about aspects of it at LPC (assuming the toolchain MC gets accepted).
> 
> > There's apparently also a competition going on who can build the
> > weakestest ARM64 implementation ever.
> > 
> > Combine the two, and you'll get a CPU that *will* emit the store early
> > :/
> 
> So there are a lot of important details missing here and, as above, I think
> this is something worth discussing at LPC with Jade. The rough summary is
> that the arm64 memory model recently (so recently that it's not yet landed
> in the public docs) introduced something called "pick dependencies", which
> are a bit like control dependencies only they don't create order to all
> subsequent stores. These are useful for some conditional data-processing
> instructions such as CSEL and CAS, but it's important to note here that
> *conditional branch instructions behave exactly as you would expect*.
> 
> <disclaimer; I don't work for Arm so any mistakes here are mine>
> 
> To reiterate, in the code sequence at the top of this mail, if the compiler
> emits something along the lines of:
> 
> 	LDR
> 	<conditional branch instruction>
> 	STR
> 
> then the load *will* be ordered before the store, even if the same store
> instruction is executed regardless of the branch direction. Yes, one can
> fantasize about a CPU that executes both taken and non-taken paths and
> figures out that the STR can be hoisted before the load, but that is not
> allowed by the architecture today.
> 
> It's the conditional instructions that are more fun. For example, the CSEL
> instruction:
> 
> 	CSEL	X0, X1, X2, <cond>
> 
> basically says:
> 
> 	if (cond)
> 		X0 = X1;
> 	else
> 		X0 = X2;
> 
> these are just register-register operations, but the idea is that the CPU
> can predict that "branching event" inside the CSEL instruction and
> speculatively rename X0 while waiting for the condition to resolve.
> 
> So then you can add loads and stores to the mix along the lines of:
> 
> 	LDR	X0, [X1]		// X0 = *X1
> 	CMP	X0, X2
> 	CSEL	X3, X4, X5, EQ		// X3 = (X0 == X2) ? X4 : X5
> 	STR	X3, [X6]		// MUST BE ORDERED AFTER THE LOAD
> 	STR	X7, [X8]		// Can be reordered
> 
> (assuming X1, X6, X8 all point to different locations in memory)
> 
> So now we have a dependency from the load to the first store, but the
> interesting part is that the last store is _not_ ordered wrt either of the
> other two memory accesses, whereas it would be if we used a conditional
> branch instead of the CSEL. Make sense?
> 
> Now, obviously the compiler is blissfully unaware that conditional
> data processing instructions can give rise to dependencies than
> conditional branches, so the question really is how much do we need to
> care in the kernel?
> 
> My preference is to use load-acquire instead of control dependencies so
> that we don't have to worry about this, or any future relaxations to the
> CPU architecture, at all.
> 
> Jade -- please can you correct me if I got any of this wrong?
> 
Sincere apologies in taking so long to reply. I attach a technical
report which describes the status of dependencies in the Arm memory
model. 

I have also released the corresponding cat files and a collection of
interesting litmus tests over here:
https://github.com/herd/herdtools7/commit/f80bd7c2e49d7d3adad22afc62ff4768d65bf830

I hope this material can help inform this conversation and I would love
to hear your thoughts.

Thanks,
Jade 

> Will

Download attachment "TheBalladOfDependencies.pdf" of type "application/pdf" (344939 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ