linux-kernel - Re: [PATCH tip/core/rcu 1/9] rcu: Provide GP ordering in face of migrations and delays


Message-Id: <20171006191822.GI3521@linux.vnet.ibm.com>
Date:   Fri, 6 Oct 2017 12:18:22 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
        oleg@...hat.com
Subject: Re: [PATCH tip/core/rcu 1/9] rcu: Provide GP ordering in face of
 migrations and delays

On Fri, Oct 06, 2017 at 11:07:23AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 05, 2017 at 11:22:04AM -0700, Paul E. McKenney wrote:
> > Hmmm...  Here is what I was worried about:
> > 
> > 	C C-PaulEMcKenney-W+RWC4+2017-10-05
> > 
> > 	{
> > 	}
> > 
> > 	P0(int *a, int *x)
> > 	{
> > 		WRITE_ONCE(*a, 1);
> > 		smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> > 		WRITE_ONCE(*x, 1);
> > 	}
> > 
> > 	P1(int *x, int *y, spinlock_t *l)
> > 	{
> > 		r3 = READ_ONCE(*x);
> > 		smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> > 		spin_lock(l); /* Locking in complete(). */
> > 		WRITE_ONCE(*y, 1);
> > 		spin_unlock(l);
> > 	}
> > 
> > 	P2(int *y, int *b, spinlock_t *l)
> > 	{
> > 		spin_lock(l); /* Locking in wait_for_completion. */
> > 		r4 = READ_ONCE(*y);
> > 		spin_unlock(l);
> > 		r1 = READ_ONCE(*b);
> > 	}
> > 
> > 	P3(int *b, int *a)
> > 	{
> > 		WRITE_ONCE(*b, 1);
> > 		smp_mb();
> > 		r2 = READ_ONCE(*a);
> > 	}
> > 
> > 	exists (1:r3=1 /\ 2:r4=1 /\ 2:r1=0 /\ 3:r2=0)
> 
> /me goes and installs this herd thing again.. I'm sure I had it running
> _somewhere_.. Ah well.
> 
> 	C C-PaulEMcKenney-W+RWC4+2017-10-05
> 
> 	{
> 	}
> 
> 	P0(int *a, int *x)
> 	{
> 		WRITE_ONCE(*a, 1);
> 		smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> 		WRITE_ONCE(*x, 1);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 		r3 = READ_ONCE(*x);
> 		smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> 		smp_store_release(y, 1);
> 	}
> 
> 	P2(int *y, int *b)
> 	{
> 		r4 = smp_load_acquire(y);
> 		r1 = READ_ONCE(*b);
> 	}
> 
> 	P3(int *b, int *a)
> 	{
> 		WRITE_ONCE(*b, 1);
> 		smp_mb();
> 		r2 = READ_ONCE(*a);
> 	}
> 
> 	exists (1:r3=1 /\ 2:r4=1 /\ 2:r1=0 /\ 3:r2=0)
> 
> 
> Is what I was thinking of, I think that is the minimal ordering
> complete()/wait_for_completion() need to provide.

OK, I will bite...  What do the smp_store_release() and the
smp_load_acquire() correspond to?  I see just plain locking in
wait_for_completion() and complete().

> (also, that r# numbering confuses the hell out of me, it's related
> neither to P nor to the variables)

Yeah, it is random, sorry!!!

> Test C-PaulEMcKenney-W+RWC4+2017-10-05 Allowed
> States 15
> 1:r3=0; 2:r1=0; 2:r4=0; 3:r2=0;
> 1:r3=0; 2:r1=0; 2:r4=0; 3:r2=1;
> 1:r3=0; 2:r1=0; 2:r4=1; 3:r2=0;
> 1:r3=0; 2:r1=0; 2:r4=1; 3:r2=1;
> 1:r3=0; 2:r1=1; 2:r4=0; 3:r2=0;
> 1:r3=0; 2:r1=1; 2:r4=0; 3:r2=1;
> 1:r3=0; 2:r1=1; 2:r4=1; 3:r2=0;
> 1:r3=0; 2:r1=1; 2:r4=1; 3:r2=1;
> 1:r3=1; 2:r1=0; 2:r4=0; 3:r2=0;
> 1:r3=1; 2:r1=0; 2:r4=0; 3:r2=1;
> 1:r3=1; 2:r1=0; 2:r4=1; 3:r2=1;
> 1:r3=1; 2:r1=1; 2:r4=0; 3:r2=0;
> 1:r3=1; 2:r1=1; 2:r4=0; 3:r2=1;
> 1:r3=1; 2:r1=1; 2:r4=1; 3:r2=0;
> 1:r3=1; 2:r1=1; 2:r4=1; 3:r2=1;
> No
> Witnesses
> Positive: 0 Negative: 15
> Condition exists (1:r3=1 /\ 2:r4=1 /\ 2:r1=0 /\ 3:r2=0)
> Observation C-PaulEMcKenney-W+RWC4+2017-10-05 Never 0 15
> Time C-PaulEMcKenney-W+RWC4+2017-10-05 0.04
> Hash=f7f8ad6eab33e90718a394bcb021557d

But yes, looking closer, this corresponds to the rule of thumb about
non-rf relations and full memory barriers.  We have two non-rf relations
(P2->P3 on b and P3->P0 on a), so we need two full barriers, one between
each consecutive pair of non-rf relations around the cycle.

So I dropped that patch yesterday.  The main thing I was missing was
that there is no ordering-free fastpath in wait_for_completion() and
complete(): Each unconditionally acquires the lock.  So the smp_mb()
that I was trying to add doesn't need to be there.

							Thanx, Paul