Message-ID: <20150919153310.GB20458@fixme-laptop.cn.ibm.com>
Date:	Sat, 19 Sep 2015 23:33:10 +0800
From:	Boqun Feng <boqun.feng@...il.com>
To:	Will Deacon <will.deacon@....com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Waiman Long <waiman.long@...com>
Subject: Re: [RFC v2 3/7] powerpc: atomic: Implement
 atomic{,64}_{add,sub}_return_* variants

Hi Will,

On Fri, Sep 18, 2015 at 05:59:02PM +0100, Will Deacon wrote:
> On Wed, Sep 16, 2015 at 04:49:31PM +0100, Boqun Feng wrote:
> > On powerpc, we don't need a general memory barrier to achieve acquire and
> > release semantics, so __atomic_op_{acquire,release} can be implemented
> > using "lwsync" and "isync".
> 
> I'm assuming isync+ctrl isn't transitive, so we need to get to the bottom

Actually the transitivity is still guaranteed here, I think ;-)

(Before I give my reasoning, I have to admit that I only learned about
cumulativity recently, so my reasoning may be wrong. The good thing is
that we have our POWER experts on CC. If I'm wrong, they can correct
me.)

The thing is, on POWER, transitivity is provided by a similar but
slightly different concept, cumulativity, as described at this link:

http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html

"""
The ordering done by a memory barrier is said to be “cumulative” if it
also orders storage accesses that are performed by processors and
mechanisms other than P1, as follows.

*	A includes all applicable storage accesses by any such processor
	or mechanism that have been performed with respect to P1 before
	the memory barrier is created.

*	B includes all applicable storage accesses by any such processor
	or mechanism that are performed after a Load instruction
	executed by that processor or mechanism has returned the value
	stored by a store that is in B.
"""

Please note that the set B can be extended indefinitely without any
other cumulative barrier.

So for a RELEASE+ACQUIRE pair on the same variable, as long as the barrier
in the RELEASE operation is cumulative, transitivity is guaranteed.
And lwsync is cumulative, so we are fine here.
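
The argument above can be pictured as a cycle check. The following is only a
rough sketch in Python, not the POWER model that herd uses: the event labels
are made up for illustration, and it simply assumes that every listed edge
(reads-from, the cumulative lwsync, ctrl+isync, from-reads) is an ordering
constraint. Under that assumption, an outcome where P1 sees P0's store, P2
sees P1's RELEASE store, and yet P2 still reads the old value of x would need
a cycle in the ordering, which is why it should be forbidden.

```python
# Events of the questioned outcome, labelled by processor and access.
# The labels are hypothetical; each edge is ASSUMED to be an ordering
# constraint. This is a simplification, not herd's actual POWER model.
edges = {
    "P0:W x=1": ["P1:R x=1"],  # rf: P1 reads P0's store
    "P1:R x=1": ["P1:W a=1"],  # lwsync (cumulative) before the RELEASE store
    "P1:W a=1": ["P2:R a=1"],  # rf: P2's ACQUIRE load reads the RELEASE store
    "P2:R a=1": ["P2:R x=0"],  # ctrl+isync after the ACQUIRE load
    "P2:R x=0": ["P0:W x=1"],  # fr: P2 read the old value of x
}


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: we returned to a node still on the stack
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# A cycle means the outcome cannot be ordered consistently.
print(has_cycle(edges))  # prints True
```

If the lwsync edge were dropped from the graph, the cycle would disappear,
which matches the intuition that the barrier in the RELEASE operation is
what carries the ordering across processors.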


I also wrote a herd litmus test to check this. Due to the tool's
limitations, I use xchg_release and xchg_acquire for the test. And since
herd doesn't support backward branching, some tricks are used to work
around that:


PPC lwsync+isync-transitivity
""
{
0:r1=1; 0:r2=x; 0:r3=1; 0:r10=0 ; 0:r11=0; 0:r12=a;
1:r1=9; 1:r2=x; 1:r3=1; 1:r10=0 ; 1:r11=0; 1:r12=a;
2:r1=9; 2:r2=x; 2:r3=2; 2:r10=0 ; 2:r11=0; 2:r12=a;
}
 P0           | P1                  | P2                  ;
 stw r1,0(r2) | lwz r1,0(r2)        |                     ;
              | lwsync              | lwarx r11, r10, r12 ;
              | lwarx  r11,r10,r12  | stwcx. r3, r10, r12 ;
              | stwcx. r3,r10,r12   | bne Fail            ;
              |                     | isync               ;
              |                     | lwz r1, 0(r2)       ;
              |                     | Fail:               ;

exists
(1:r1=1 /\ 1:r11=0 /\ 2:r11=1 /\ 2:r1 = 0)


And the result of this litmus test is:

Test lwsync+isync-transitivity Allowed
States 15
1:r1=0; 1:r11=0; 2:r1=0; 2:r11=0;
1:r1=0; 1:r11=0; 2:r1=0; 2:r11=1;
1:r1=0; 1:r11=0; 2:r1=1; 2:r11=0;
1:r1=0; 1:r11=0; 2:r1=1; 2:r11=1;
1:r1=0; 1:r11=0; 2:r1=9; 2:r11=0;
1:r1=0; 1:r11=0; 2:r1=9; 2:r11=1;
1:r1=0; 1:r11=2; 2:r1=0; 2:r11=0;
1:r1=0; 1:r11=2; 2:r1=1; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=0; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=1; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=1; 2:r11=1;
1:r1=1; 1:r11=0; 2:r1=9; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=9; 2:r11=1;
1:r1=1; 1:r11=2; 2:r1=0; 2:r11=0;
1:r1=1; 1:r11=2; 2:r1=1; 2:r11=0;
No
Witnesses
Positive: 0 Negative: 29
Condition exists (1:r1=1 /\ 1:r11=0 /\ 2:r11=1 /\ 2:r1=0)
Observation lwsync+isync-transitivity Never 0 29

The condition is never observed, which means transitivity is guaranteed.
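
As a cross-check on the output above, the listed states can be compared
against the "exists" condition mechanically. This is a minimal sketch; the
helper name parse_state is hypothetical, and the state lines are copied
verbatim from the herd output in this message.

```python
# Final states copied from the herd output (15 lines).
HERD_STATES = """\
1:r1=0; 1:r11=0; 2:r1=0; 2:r11=0;
1:r1=0; 1:r11=0; 2:r1=0; 2:r11=1;
1:r1=0; 1:r11=0; 2:r1=1; 2:r11=0;
1:r1=0; 1:r11=0; 2:r1=1; 2:r11=1;
1:r1=0; 1:r11=0; 2:r1=9; 2:r11=0;
1:r1=0; 1:r11=0; 2:r1=9; 2:r11=1;
1:r1=0; 1:r11=2; 2:r1=0; 2:r11=0;
1:r1=0; 1:r11=2; 2:r1=1; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=0; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=1; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=1; 2:r11=1;
1:r1=1; 1:r11=0; 2:r1=9; 2:r11=0;
1:r1=1; 1:r11=0; 2:r1=9; 2:r11=1;
1:r1=1; 1:r11=2; 2:r1=0; 2:r11=0;
1:r1=1; 1:r11=2; 2:r1=1; 2:r11=0;
"""

# The "exists" clause of the litmus test.
CONDITION = {"1:r1": 1, "1:r11": 0, "2:r11": 1, "2:r1": 0}


def parse_state(line):
    """Turn '1:r1=0; 1:r11=0; ...' into {'1:r1': 0, '1:r11': 0, ...}."""
    pairs = (item.split("=") for item in line.split(";") if item.strip())
    return {reg.strip(): int(val) for reg, val in pairs}


states = [parse_state(line) for line in HERD_STATES.splitlines() if line.strip()]
matches = [s for s in states if all(s[reg] == val for reg, val in CONDITION.items())]

print(len(states), len(matches))  # prints "15 0"
```

No listed state satisfies the condition, which agrees with the "Never"
observation reported by herd.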

Regards,
Boqun

> of the s390 thread you linked me to before we start spreading this
> further:
> 
>   https://lkml.org/lkml/2015/9/15/836
> 
> Will

