[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180627141522.GA10533@andrea>
Date: Wed, 27 Jun 2018 16:15:22 +0200
From: Andrea Parri <andrea.parri@...rulasolutions.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
Alan Stern <stern@...land.harvard.edu>,
Will Deacon <will.deacon@....com>,
Boqun Feng <boqun.feng@...il.com>,
Nicholas Piggin <npiggin@...il.com>,
David Howells <dhowells@...hat.com>,
Jade Alglave <j.alglave@....ac.uk>,
Luc Maranget <luc.maranget@...ia.fr>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Akira Yokosawa <akiyks@...il.com>,
Daniel Lustig <dlustig@...dia.com>,
Jonathan Corbet <corbet@....net>,
Ingo Molnar <mingo@...hat.com>,
Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [PATCH] doc: Update wake_up() & co. memory-barrier guarantees
> So I'm not actually sure how many people rely on the RCsc transitive
> smp_mb() here. People certainly rely on the RELEASE semantics, and the
> code itself requires the store->load ordering, together that gives us
> the smp_mb() because that's simply the only barrier we have.
>
> And looking at smp_mb__after_spinlock() again, we really only need the
> RCsc thing for rq->lock, not for the wakeups. The wakeups really only
> need that RCpc RELEASE + store->load thing (which we don't have).
>
> So yes, smp_mb(), however the below still makes more sense to me, or am
> I just being obtuse again?
While trying to integrate these remarks into v1 and looking again at the
comment before smp_mb__after_spinlock(), I remembered a discussion where
Boqun suggested some improvements for this comment: so I wrote the commit
reported at the end of this email.
This raises the following two issues:
1) First, the problem of integrating the resulting comment into v1,
where I've been talking about _full_ barriers associated to the
wakeups fuctions but where these are actually implemented as:
spin_lock(s);
smp_mb__after_spinlock();
2) Second, the problem of formalizing the requirements described in
that comment (remark: informally, the LKMM currently states that
the sequence "spin_lock(s); smp_mb__after_spinlock();" generates
a full barrier; in particular, this sequence orders
{STORE,LOAD} -> {STORE,LOAD}
according to the current LKMM).
For (1), I could simply replace each occurrence of "executes a full memory
barrier" with "execute the sequence spin_lock(s); smp_mb__after_spinlock()";
I haven't really thought about (2) yet, but please notice that something as
simple as
let mb = [...] |
([W] ; po? ; [LKW] ; fencerel(After-spinlock) ; [R])
would _not_ guarantee "RCsc transitivity" ...
A different approach (that could solve both problems at once) would be to
follow the current formalization in LKMM and to modify the comment before
smp_mb__after_spinlock() accordingly (say, informally, "it's required that
that spin_lock(); smp_mb__after_spinlock() provides a full barrier").
Thoughts?
Andrea
>From c3648d5022bedcd356198efa65703e01541cbd3f Mon Sep 17 00:00:00 2001
From: Andrea Parri <andrea.parri@...rulasolutions.com>
Date: Wed, 27 Jun 2018 10:53:30 +0200
Subject: [PATCH 2/3] locking: Fix comment for smp_mb__after_spinlock()
Boqun reported that the snippet described in the header comment for
smp_mb__after_spinlock() can be misleading, because acquire/release
chains already provide us with the underlying guarantee (due to the
A-cumulativity of release).
This commit fixes the comment following Boqun's example in [1].
It's worth noting here that LKMM maintainers are currently actively
debating whether to enforce RCsc transitivity of locking operations
"by definition" [2]; should the guarantee be enforced in the future,
the requirement for smp_mb__after_spinlock() could be simplified to
include only the STORE->LOAD ordering requirement.
[1] http://lkml.kernel.org/r/20180312085600.aczjkpn73axzs2sb@tardis
[2] http://lkml.kernel.org/r/Pine.LNX.4.44L0.1711271553490.1424-100000@iolanthe.rowland.org
http://lkml.kernel.org/r/Pine.LNX.4.44L0.1806211322160.2381-100000@iolanthe.rowland.org
Reported-and-Suggested-by: Boqun Feng <<boqun.feng@...il.com>
Signed-off-by: Andrea Parri <andrea.parri@...rulasolutions.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Will Deacon <will.deacon@....com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
---
include/linux/spinlock.h | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 1e8a464358384..c74828fe8d75c 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -121,22 +121,22 @@ do { \
*
* - it must ensure the critical section is RCsc.
*
- * The latter is important for cases where we observe values written by other
- * CPUs in spin-loops, without barriers, while being subject to scheduling.
+ * The latter requirement guarantees that stores from two critical sections
+ * in different CPUs are ordered even outside the critical sections. As an
+ * example illustrating this property, consider the following snippet:
*
- * CPU0 CPU1 CPU2
+ * CPU0 CPU1 CPU2
*
- * for (;;) {
- * if (READ_ONCE(X))
- * break;
- * }
- * X=1
- * <sched-out>
- * <sched-in>
- * r = X;
+ * spin_lock(s); spin_lock(s); r2 = READ_ONCE(Y);
+ * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
+ * spin_unlock(s); r1 = READ_ONCE(X); r3 = READ_ONCE(X);
+ * WRITE_ONCE(Y, 1);
+ * spin_unlock(s);
*
- * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
- * we get migrated and CPU2 sees X==0.
+ * Without RCsc transitivity, it is allowed that CPU0's critical section
+ * precedes CPU1's critical section (r1=1) and that CPU2 observes CPU1's
+ * store to Y (r2=1) while it does not observe CPU0's store to X (r3=0),
+ * despite the presence of the smp_rmb().
*
* Since most load-store architectures implement ACQUIRE with an smp_mb() after
* the LL/SC loop, they need no further barriers. Similarly all our TSO
--
2.7.4
Powered by blists - more mailing lists