Message-ID: <20150922122735.14f3c573@mschwide>
Date:	Tue, 22 Sep 2015 12:27:35 +0200
From:	Martin Schwidefsky <schwidefsky@...ibm.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org, Davidlohr Bueso <dbueso@...e.de>,
	heiko.carstens@...ibm.com
Subject: Re: [PATCH -tip 2/3] sched/wake_q: Relax to acquire semantics

On Mon, 21 Sep 2015 11:22:52 +0200
Martin Schwidefsky <schwidefsky@...ibm.com> wrote:

> On Fri, 18 Sep 2015 14:41:20 -0700
> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> 
> > On Tue, Sep 15, 2015 at 10:09:41AM -0700, Paul E. McKenney wrote:
> > > On Tue, Sep 15, 2015 at 06:30:28PM +0200, Peter Zijlstra wrote:
> > > > On Tue, Sep 15, 2015 at 08:34:48AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Sep 15, 2015 at 04:14:39PM +0200, Peter Zijlstra wrote:
> > > > > > On Tue, Sep 15, 2015 at 07:09:22AM -0700, Paul E. McKenney wrote:
> > > > > > > On Tue, Sep 15, 2015 at 02:48:00PM +0200, Peter Zijlstra wrote:
> > > > > > > > On Tue, Sep 15, 2015 at 05:41:42AM -0700, Paul E. McKenney wrote:
> > > > > > > > > > Never mind, the PPC people will implement this with lwsync and that is
> > > > > > > > > > very much not transitive IIRC.
> > > > > > > > > 
> > > > > > > > > I am probably lost on context, but...
> > > > > > > > > 
> > > > > > > > > It turns out that lwsync is transitive in special cases.  One of them
> > > > > > > > > is a series of release-acquire pairs, which can extend indefinitely.
> > > > > > > > > 
> > > > > > > > > Does that help in this case?
> > > > > > > > 
> > > > > > > > Probably not, but good to know. I still don't think we want to rely on
> > > > > > > > ACQUIRE/RELEASE being transitive in general though.
> > > > > > > 
> > > > > > > OK, I will bite...  Why not?
> > > > > > 
> > > > > > It would mean us reviewing all archs (again) and documenting it I
> > > > > > suppose. Which is of course entirely possible.
> > > > > > 
> > > > > > That said, I don't think the case at hand requires it, so lets postpone
> > > > > > this for now ;-)
> > > > > 
> > > > > True enough, but in my experience smp_store_release() and
> > > > > smp_load_acquire() are a -lot- easier to use than other barriers,
> > > > > and transitivity will help promote their use.  So...
> > > > > 
> > > > > All the TSO architectures (x86, s390, SPARC, HPPA, ...) support transitive
> > > > > smp_store_release()/smp_load_acquire() via their native ordering in
> > > > > combination with barrier() macros.  x86 with CONFIG_X86_PPRO_FENCE=y,
> > > > > which is not TSO, uses an mfence instruction.  Power supports this via
> > > > > lwsync's partial cumulativity.  ARM64 supports it in SMP via the new ldar
> > > > > and stlr instructions (in non-SMP, it uses barrier(), which suffices
> > > > > in that case).  IA64 supports this via total ordering of all release
> > > > > instructions in theory and by the actual full-barrier implementation
> > > > > in practice (and the fact that gcc emits st.rel and ld.acq instructions
> > > > > for volatile stores and loads).  All other architectures use smp_mb(),
> > > > > which is transitive.
> > > > > 
> > > > > Did I miss anything?
> > > > 
> > > > I think that about covers it.. the only odd duckling might be s390 which
> > > > is documented as TSO but recently grew smp_mb__{before,after}_atomic(),
> > > > which seems to confuse matters.
> > > 
> > > Fair point, adding Martin and Heiko on CC for their thoughts.
> 
> Well we always had the full memory barrier for the various versions of
> smp_mb__xxx, they just have moved around and renamed several times.
> 
> After discussing this with Heiko we came to the conclusion that we can use
> a simple barrier() for smp_mb__before_atomic() and smp_mb__after_atomic().
> 
> > > It looks like this applies to recent mainframes that have new atomic
> > > instructions, which, yes, might need something to make them work with
> > > fully transitive smp_load_acquire() and smp_store_release().
> > > 
> > > Martin, Heiko, the question is whether or not the current s390
> > > smp_store_release() and smp_load_acquire() can be transitive.
> > > For example, if all the Xi variables below are initially zero,
> > > is it possible for all the r0, r1, r2, ... rN variables to
> > > have the value 1 at the end of the test.
> > 
> > Right...  This time actually adding Martin and Heiko on CC...
> > 
> > 							Thanx, Paul
> > 
> > > CPU 0
> > > 	r0 = smp_load_acquire(&X0);
> > > 	smp_store_release(&X1, 1);
> > > 
> > > CPU 1
> > > 	r1 = smp_load_acquire(&X1);
> > > 	smp_store_release(&X2, 1);
> > > 
> > > CPU 2
> > > 	r2 = smp_load_acquire(&X2);
> > > 	smp_store_release(&X3, 1);
> > > 
> > > ...
> > > 
> > > CPU N
> > > 	rN = smp_load_acquire(&XN);
> > > 	smp_store_release(&X0, 1);
> > > 
> > > If smp_store_release() and smp_load_acquire() are transitive, the
> > > answer would be "no".
> 
> The answer is "no". Christian recently summarized what the principles of
> operation has to say about the CPU read / write behavior. If you consider
> the sequential order of instructions then
> 
> 1) reads are in order
> 2) writes are in order
> 3) reads can happen earlier
> 4) writes can happen later

Correction. The Principles of Operation states this:

"A storage-operand store specified by one instruction appears to precede
all storage-operand stores specified by conceptually subsequent instructions,
but it does not necessarily precede storage-operand fetches specified by
conceptually subsequent instructions. However, a storage-operand store
appears to precede a conceptually subsequent storage-operand fetch from the
same main-storage location."

As observed by other CPUs, a write to one memory location can "overtake" a
read of another memory location if there is no explicit memory barrier
between the load and the store instruction.

In the above example X0, X1, ... XN are different memory locations, so
architecturally the answer is "yes": all of the r0, r1, ... rN variables can
have the value 1 after the test. I doubt that any existing machine will
show this behavior, though.
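
To make the reasoning concrete, here is a rough user-space sketch that
enumerates interleavings of the first litmus test. It is only a toy model
and rests on assumptions that are not in the Principles of Operation: each
access takes effect at a single instant in one global order, NCPU is fixed
at 3, and the names search(), all_ones_reachable() and check_model() are
made up for illustration. With in_order set, each CPU's store stays behind
its own load; with it cleared, the store is allowed to overtake the load.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 3	/* hypothetical size: CPUs 0..2, locations X0..X2 */

/*
 * Toy model (an assumption, not the architecture definition):
 * operation 2*i is CPU i's load of X[i] into r[i]; operation 2*i+1 is
 * CPU i's store of 1 to X[(i + 1) % NCPU]. Every operation takes effect
 * at one instant in a single global order.
 */
static bool search(unsigned done, int x[NCPU], int r[NCPU], bool in_order)
{
	const unsigned all = (1u << (2 * NCPU)) - 1;

	if (done == all) {
		/* All operations performed: did every load observe 1? */
		for (int i = 0; i < NCPU; i++)
			if (r[i] != 1)
				return false;
		return true;
	}

	for (int op = 0; op < 2 * NCPU; op++) {
		int cpu = op / 2;
		bool is_store = op & 1;
		bool found;

		if (done & (1u << op))
			continue;
		/* In-order mode: a CPU's store may not pass its own load. */
		if (in_order && is_store && !(done & (1u << (2 * cpu))))
			continue;

		if (is_store) {
			int idx = (cpu + 1) % NCPU;
			int saved = x[idx];

			x[idx] = 1;
			found = search(done | (1u << op), x, r, in_order);
			x[idx] = saved;		/* undo for backtracking */
		} else {
			int saved = r[cpu];

			r[cpu] = x[cpu];
			found = search(done | (1u << op), x, r, in_order);
			r[cpu] = saved;		/* undo for backtracking */
		}
		if (found)
			return true;
	}
	return false;
}

/* Returns true if some interleaving ends with every r[i] == 1. */
bool all_ones_reachable(bool in_order)
{
	int x[NCPU] = { 0 };
	int r[NCPU] = { 0 };

	return search(0, x, r, in_order);
}

/* Usage: the two cases discussed above. */
void check_model(void)
{
	assert(!all_ones_reachable(true));	/* load stays before store */
	assert(all_ones_reachable(false));	/* store may overtake load */
}
```

In this model the "all ones" outcome needs every store to be performed
before the load that precedes it in program order, which is exactly the
reordering described above. It says nothing about what real hardware does.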

> > > A similar litmus test involving atomics would be as follows, again
> > > with all Xi initially zero:
> > > 
> > > CPU 0
> > > 	atomic_inc(&X0);
> > > 	smp_store_release(&X1, 1);
> > > 
> > > CPU 1
> > > 	r1 = smp_load_acquire(&X1);
> > > 	smp_store_release(&X2, 1);
> > > 
> > > CPU 2
> > > 	r2 = smp_load_acquire(&X2);
> > > 	smp_store_release(&X3, 1);
> > > 
> > > ...
> > > 
> > > CPU N
> > > 	rN = smp_load_acquire(&XN);
> > > 	r0 = atomic_read(&X0);
> > > 
> > > Here, the question is whether r0 can be zero, but r1, r2, ... rN all
> > > being 1 at the end of the test.
> 
> r0 = 0 and all r1, r2, ... rN = 1 can not happen on s390.
 
This indeed cannot happen, as the atomic_inc is a store-type reference
which precedes the smp_store_release store-type reference.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
