linux-kernel - Re: Unlock-lock questions and the Linux Kernel Memory Model

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171128095850.rhtnx6e2qxep5npa@hirez.programming.kicks-ass.net>
Date:   Tue, 28 Nov 2017 10:58:50 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Jade Alglave <j.alglave@....ac.uk>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Will Deacon <will.deacon@....com>,
        David Howells <dhowells@...hat.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Unlock-lock questions and the Linux Kernel Memory Model

On Mon, Nov 27, 2017 at 04:16:47PM -0500, Alan Stern wrote:
> This is essentially a repeat of an email I sent out before the
> Thanksgiving holiday, the assumption being that lack of any responses
> was caused by the holiday break.  (And this time the message is CC'ed
> to LKML, so there will be a public record of it.)

Right, thanks! No turkey days on this side of the pond, but it still got
lost in the noise, sorry about that.

> A few people have said they believe the Linux Kernel Memory Model
> should make unlock followed by lock (of the same variable) act as a
> write memory barrier.  In other words, they want the memory model to
> forbid the following litmus test:
> 
> 
> C unlock-lock-write-ordering-1
> 
> {}
> 
> P0(int *x, int *y, spinlock_t *s)
> {
> 	spin_lock(s);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> 	spin_lock(s);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(s);
> }
> 
> P1(int *x, int *y)
> {
> 	r1 = READ_ONCE(*y);
> 	smp_rmb();
> 	r2 = READ_ONCE(*x);
> }
> 
> exists (1:r1=1 /\ 1:r2=0)

ACK, that is the pattern I was using.

> Judging from this, it seems likely that they would want the memory
> model to forbid the next two litmus tests as well (since there's never
> any guarantee that some other CPU won't interpose a critical section
> between a given unlock and a following lock):
> 
> 
> C unlock-lock-write-ordering-2
> 
> {}
> 
> P0(int *x, spinlock_t *s)
> {
> 	spin_lock(s);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> }
> 
> P1(int *x, int *y, spinlock_t *s)
> {
> 	spin_lock(s);
> 	r1 = READ_ONCE(*x);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(s);
> }
> 
> P2(int *x, int *y)
> {
> 	r2 = READ_ONCE(*y);
> 	smp_rmb();
> 	r3 = READ_ONCE(*x);
> }
> 
> exists (1:r1=1 /\ 2:r2=1 /\ 2:r3=0)

Agreed, I would indeed also expect that to 'work'.

> C unlock-lock-write-ordering-3
> 
> {}
> 
> P0(int *x, int *y, int *z, spinlock_t *s)
> {
> 	spin_lock(s);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> 	spin_lock(s);
> 	r1 = READ_ONCE(*z);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(s);
> }
> 
> P1(int *x, int *z, spinlock_t *s)
> {
> 	spin_lock(s);
> 	r2 = READ_ONCE(*x);
> 	WRITE_ONCE(*z, 1);
> 	spin_unlock(s);
> }
> 
> P2(int *x, int *y)
> {
> 	r3 = READ_ONCE(*y);
> 	smp_rmb();
> 	r4 = READ_ONCE(*x);
> }
> 
> exists (0:r1=1 /\ 1:r2=1 /\ 2:r3=1 /\ 2:r4=0)

And this is the same except with one more link in the chain, right? I
would've put it in P3 or somesuch but no matter.

> The general justification is that all the architectures currently
> supported by the kernel do forbid these tests, and there is code in
> the kernel that depends on this behavior.  Thus the model should
> forbid them.  (Currently the model allows them, on the principle that
> ordering induced by a lock is visible only to CPUs that take the
> lock.)
> 
> On the other hand, whether RISC-V would forbid these tests is not
> clear.  Basically, it comes down to using an RCsc versus an RCpc
> approach for the locking primitives.

Not entirely, as Power is currently RCpc but still forbids these.

> Given that spin_lock() and spin_unlock() derive many of their
> properties from acting as an acquire and a release respectively, we can
> ask if the memory model should forbid the analogous release-acquire
> litmus test:

> C rel-acq-write-ordering-3
> 
> {}
> 
> P0(int *x, int *s, int *y)
> {
> 	WRITE_ONCE(*x, 1);
> 	smp_store_release(s, 1);
> 	r1 = smp_load_acquire(s);
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *x, int *y)
> {
> 	r2 = READ_ONCE(*y);
> 	smp_rmb();
> 	r3 = READ_ONCE(*x);
> }
> 
> exists (1:r2=1 /\ 1:r3=0)

Ideally yes, consistency is good.

> For that matter, what if we take the initial write to be the 
> store-release itself rather than something coming before it?
> 
> 
> C rel-acq-write-ordering-1
> 
> {}
> 
> P0(int *s, int *y)
> {
> 	smp_store_release(s, 1);
> 	r1 = smp_load_acquire(s);
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *s, int *y)
> {
> 	r2 = READ_ONCE(*y);
> 	smp_rmb();
> 	r3 = READ_ONCE(*s);
> }
> 
> exists (1:r2=1 /\ 1:r3=0)

I expect it would take extra work to make this one fail while the
previous one works, no?

That is, given the things so far, it would only be consistent for this
to also work.

> And going to extremes, what if the load-acquire reads from a different 
> variable, not the one written by the store-release?  This would be 
> like unlocking s and then locking t:
> 
> 
> C rel-acq-write-ordering-2
> 
> {}
> 
> P0(int *s, int *t, int *y)
> {
> 	smp_store_release(s, 1);
> 	r1 = smp_load_acquire(t);
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *s, int *y)
> {
> 	r2 = READ_ONCE(*y);
> 	smp_rmb();
> 	r3 = READ_ONCE(*s);
> }
> 
> exists (1:r2=1 /\ 1:r3=0)
> 
> 
> The architectures currently supported by the kernel all forbid this 
> last test, but whether we want the model to forbid it is a lot more 
> questionable.
> 
> I (and others!) would like to know people's opinions on these matters.

This one is interesting... Yes it will work on current hardware, but I'm
not sure I'd expect this. I'll ponder this a wee bit more.