Message-ID: <20160810191757.GA4952@linux-80c1.suse>
Date: Wed, 10 Aug 2016 12:17:57 -0700
From: Davidlohr Bueso <dave@...olabs.net>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Michael Ellerman <mpe@...erman.id.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Susanne Spraul <1vier1@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: spin_lock implicit/explicit memory barrier
On Wed, 10 Aug 2016, Manfred Spraul wrote:
>On 08/10/2016 02:05 AM, Benjamin Herrenschmidt wrote:
>>On Tue, 2016-08-09 at 20:52 +0200, Manfred Spraul wrote:
>>>Hi Benjamin, Hi Michael,
>>>
>>>regarding commit 51d7d5205d33 ("powerpc: Add smp_mb() to
>>>arch_spin_is_locked()"):
>>>
>>>For the ipc/sem code, I would like to replace the spin_is_locked() with
>>>a smp_load_acquire(), see:
>>>
>>>http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n367
>>>
>>>http://www.ozlabs.org/~akpm/mmots/broken-out/ipc-semc-fix-complex_count-vs-simple-op-race.patch
>>>
>>>To my understanding, I must now add a smp_mb(), otherwise it would be
>>>broken on PowerPC:
>>>
>>>The approach of adding the memory barrier into spin_is_locked()
>>>doesn't work because the code doesn't use spin_is_locked().
>>>
>>>Correct?
>>Right, otherwise you aren't properly ordered. The current powerpc locks provide
>>good protection between what's inside vs. what's outside the lock, but not vs.
>>the lock *value* itself, so if, like you do in the sem code, you use the lock
>>value as something that is relevant in terms of ordering, you probably need
>>an explicit full barrier.
But the problem here is that spin_unlock_wait() (for ll/sc-based spin_lock) may not
see the store that makes the lock visibly taken, so both threads end up exiting
sem_lock(); a similar scenario to the spin_is_locked commit mentioned above, i.e.
crossing of locks.

Now that spin_unlock_wait() always implies at least a load-acquire barrier (for both
ticket locks and qspinlocks, the latter still x86-only), we wait on the full critical
region.
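Just to spell out what is being assumed (a conceptual sketch only, not any
architecture's actual spin_unlock_wait() implementation; sketch_spin_unlock_wait()
is a made-up name):

	/*
	 * Conceptual sketch only: spin until the lock word is observed free,
	 * then provide ACQUIRE semantics, so that everything after the call
	 * in program order is ordered against the previous lock holder's
	 * critical section.
	 */
	static inline void sketch_spin_unlock_wait(spinlock_t *lock)
	{
		while (spin_is_locked(lock))
			cpu_relax();
		smp_acquire__after_ctrl_dep();	/* promote the ctrl dependency to ACQUIRE */
	}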
So this patch takes this locking scheme:
CPU0                            CPU1
spin_lock(l)                    spin_lock(L)
spin_unlock_wait(L)             if (spin_is_locked(l))
foo()                               foo()
... and converts it now to:
CPU0                            CPU1
complex_mode = true             spin_lock(l)
smp_mb()                           <--- do we want a smp_mb() here?
spin_unlock_wait(l)             if (!smp_load_acquire(complex_mode))
foo()                               foo()
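(Roughly, in C; this is only a simplified sketch of the converted scheme from the
patch linked above, with error handling and the slow path trimmed, and the exact
function/field names may differ:)

	/* Complex op side (CPU0), simplified: */
	static void complexmode_enter(struct sem_array *sma)
	{
		int i;

		sma->complex_mode = true;
		/* the store to complex_mode must be visible before sampling the per-sem locks */
		smp_mb();
		for (i = 0; i < sma->sem_nsems; i++)
			spin_unlock_wait(&sma->sem_base[i].lock);
	}

	/* Simple op fast path (CPU1), simplified: */
	static bool sem_lock_fastpath(struct sem_array *sma, struct sem *sem)
	{
		spin_lock(&sem->lock);
		/* <--- the questioned smp_mb() would go here */
		if (!smp_load_acquire(&sma->complex_mode))
			return true;	/* no complex op in flight; keep holding sem->lock */
		spin_unlock(&sem->lock);
		return false;		/* fall back to the global sem_perm lock */
	}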
We should not be doing an smp_mb() right after a spin_lock(); it makes no sense. The
spinlock machinery should guarantee us the barriers in unorthodox locking cases such
as this.
Thanks,
Davidlohr