Message-ID: <20171102171644.GD595@arm.com>
Date: Thu, 2 Nov 2017 17:16:44 +0000
From: Will Deacon <will.deacon@....com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Peter Zijlstra <peterz@...radead.org>,
"Reshetova, Elena" <elena.reshetova@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"keescook@...omium.org" <keescook@...omium.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"ishkamiel@...il.com" <ishkamiel@...il.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
parri.andrea@...il.com, boqun.feng@...il.com, dhowells@...hat.com,
david@...morbit.com
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in
atomic_t
On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
>
> > On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > >
> > > > > Lock functions such as refcount_dec_and_lock() &
> > > > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > > > their atomic counterparts.
> > > >
> > > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > > refcount_dec_and_lock() merely orders all prior loads/stores against all
> > > > later loads/stores.
> > >
> > > In fact there is no guaranteed ordering when refcount_dec_and_lock()
> > > returns false;
> >
> > It should provide a release:
> >
> > - if !=1, dec_not_one will provide release
> > - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
> >   dec_and_test will provide the release; even if the test fails and we
> >   unlock again, it should still dec.
> >
> > The one exception is when the counter is saturated, but in that case
> > we'll never free the object and the ordering is moot in any case.
>
> Also if the counter is 0, but that will never happen if the
> refcounting is correct.
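
[ For reference, a sketch of the structure under discussion -- paraphrased
  from memory of lib/refcount.c, not a verbatim copy: ]

	bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
	{
		/* Decrement unless the count is 1; provides release on success. */
		if (refcount_dec_not_one(r))
			return false;

		/* Count was 1: take the lock and do the final decrement. */
		spin_lock(lock);
		if (!refcount_dec_and_test(r)) {
			/* Someone took a reference in the meantime. */
			spin_unlock(lock);
			return false;
		}

		/* Count hit zero: return true with the lock held. */
		return true;
	}
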
>
> > > it provides ordering only if the return value is true.
> > > In which case it provides acquire ordering (thanks to the spin_lock),
> > > and both release ordering and a control dependency (thanks to the
> > > refcount_dec_and_test).
> > >
> > > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > > have you got anything simple around?
> > >
> > > The combination of acquire + release is not the same as smp_mb, because
> >
> > acquire+release is nothing, it's release+acquire that I meant, which
> > should order things locally, but now that you've got me looking at it
> > again, we don't in fact do that.
> >
> > So refcount_dec_and_lock() will provide a release, irrespective of the
> > return value (assuming we're not saturated). If it returns true, it also
> > does an acquire for the lock.
> >
> > But combined they're acquire+release, which is unfortunate... it means
> > the lock section and the refcount stuff overlap, but I don't suppose
> > that's actually a problem. Need to consider more.
>
> Right. To address your point: release + acquire isn't the same as a
> full barrier either. The SB pattern illustrates the difference:
>
>	P0              P1
>	Write x=1       Write y=1
>	Release a       smp_mb
>	Acquire b       Read x=0
>	Read y=0
>
> This would not be allowed if the release + acquire sequence was
> replaced by smp_mb. But as it stands, this is allowed because nothing
> prevents the CPU from interchanging the order of the release and the
> acquire -- and then you're back to the acquire + release case.
>
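
[ Spelled out in the same style as the litmus test below -- my rendering,
  not from Alan's mail -- the allowed case above has the release and the
  acquire acting on *different* variables: ]

	P0(int *x, int *y, int *a, int *b)
	{
		int r0;

		WRITE_ONCE(*x, 1);
		smp_store_release(a, 1);	/* release to *a ...   */
		smp_load_acquire(b);		/* ... acquire from *b */
		r0 = READ_ONCE(*y);
	}

	P1(int *x, int *y)
	{
		int r1;

		WRITE_ONCE(*y, 1);
		smp_mb();
		r1 = READ_ONCE(*x);
	}

	exists (0:r0=0 /\ 1:r1=0)	/* allowed */
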
> However, there is one circumstance where this interchange isn't
> allowed: when the release and acquire access the same memory
> location. Thus:
>
> P0(int *x, int *y, int *a)
> {
> 	int r0;
>
> 	WRITE_ONCE(*x, 1);
> 	smp_store_release(a, 1);
> 	smp_load_acquire(a);
> 	r0 = READ_ONCE(*y);
> }
>
> P1(int *x, int *y)
> {
> 	int r1;
>
> 	WRITE_ONCE(*y, 1);
> 	smp_mb();
> 	r1 = READ_ONCE(*x);
> }
>
> exists (0:r0=0 /\ 1:r1=0)
>
> This is forbidden. It would remain forbidden even if the smp_mb in P1
> were replaced by a similar release/acquire pair for the same memory
> location.

Isn't this allowed on x86, mapping smp_mb() to mfence, store-release to a
plain store and load-acquire to a plain load? All we're saying is that you
can forward from a release to an acquire, which is fine for RCpc semantics.
e.g.
X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"
Generator=diyone7 (version 7.46+3)
Prefetch=0:x=F,0:y=T,1:y=F,1:x=T
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
{
}
 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;
exists
(0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
which herd says is allowed:
Test SB+mfence+po-rfi-po Allowed
States 4
0:EAX=0; 1:EAX=1; 1:EBX=0;
0:EAX=0; 1:EAX=1; 1:EBX=1;
0:EAX=1; 1:EAX=1; 1:EBX=0;
0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Observation SB+mfence+po-rfi-po Sometimes 1 3
Time SB+mfence+po-rfi-po 0.00
Hash=0f983e2d7579e5c04c332f9ac620c31f
and I can reproduce using litmus to actually run it on my x86 box:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for SB+mfence+po-rfi-po.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"
{}
 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;
exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Generated assembler
#START _litmus_P1
movl $1,(%r8,%rcx)
movl $1,(%r9,%rcx)
movl (%r9,%rcx),%eax
movl (%rdi,%rcx),%edx
#START _litmus_P0
movl $1,(%rdx,%rcx)
mfence
movl (%rdi,%rcx),%eax
Test SB+mfence+po-rfi-po Allowed
Histogram (4 states)
8 *>0:EAX=0; 1:EAX=1; 1:EBX=0;
1999851:>0:EAX=1; 1:EAX=1; 1:EBX=0;
1999549:>0:EAX=0; 1:EAX=1; 1:EBX=1;
592 :>0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok
Witnesses
Positive: 8, Negative: 3999992
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
Hash=0f983e2d7579e5c04c332f9ac620c31f
Generator=diyone7 (version 7.46+3)
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
Observation SB+mfence+po-rfi-po Sometimes 8 3999992
Time SB+mfence+po-rfi-po 0.17
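
[ To reproduce: herd7 takes the .litmus file above directly (e.g.
  "herd7 SB+mfence+po-rfi-po.litmus") and prints the model's verdict, and
  litmus7 on the same file compiles and runs it natively; the outcome
  counts will of course vary from run to run. ]
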
Will