Message-ID: <553FACF1.2020405@ezchip.com>
Date: Tue, 28 Apr 2015 11:53:21 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
CC: Manfred Spraul <manfred@...orfullife.com>,
Oleg Nesterov <oleg@...hat.com>,
Kirill Tkhai <ktkhai@...allels.com>,
<linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles
On 04/28/2015 10:33 AM, Peter Zijlstra wrote:
> On Sun, Apr 26, 2015 at 03:52:13AM -0700, Paul E. McKenney wrote:
>
>> And then an smp_read_barrier_depends() would be needed either here
>> or embedded in spin_unlock_wait(). But we also need to check the
>> spin_unlock_wait() implementations to see if any are potentially
>> vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(),
>> READ_ONCE(), or other sources of the required volatility:
>>
>> o tile: For 32-bit, looks like a bug. Compares ->current_ticket and
>> ->next_ticket with no obvious protection. The compiler is free to
>> load them in either order, so it is possible that the two fields
>> could compare equal despite never having actually been equal at
>> any given time. Needs something like arm, arm64, mips, or x86
>> to do single fetch, then compare fields in quantity fetched.
>>
>> Except that this appears to be using int on a 32-bit system,
>> thus might not have a 64-bit load. If that is the case, the
>> trick would be to load them in order. Except that this can be
>> defeated by overflow. Are there really 32-bit tile systems with
>> enough CPUs to overflow an unsigned short?
As you surmise, tilepro doesn't have 64-bit loads. So we are stuck with
32-bit loads on these two fields. It's true that spin_unlock_wait() can
therefore falsely claim that the lock is unlocked, but it should be only a
hint anyway, since by the time the caller tries to act on that information
the lock may have been retaken, right? If spin_unlock_wait() is
really trying to guarantee that the lock was available at some point in
the interval between when it was called and when it returned, we could use
READ_ONCE() to read the current ticket value first; is that a necessary
part of the semantics?
(Even with READ_ONCE() we'd still be exposed to a technical risk that
other cores had taken and released the lock two billion times in between
the two loads of the core running spin_unlock_wait, without ever having
the lock actually be free, so technically the only solution is for that core
to actually acquire and release the lock, but that seems a bit extreme
in practice.)
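Just to illustrate that "extreme" option, a hypothetical wraparound-proof
variant (not something I'm actually proposing) would be something like:

static inline void arch_spin_unlock_wait_strict(arch_spinlock_t *lock)
{
	/*
	 * Actually acquire and release the lock, guaranteeing it
	 * really was free at some point between call and return,
	 * at the cost of contending for a ticket ourselves.
	 */
	arch_spin_lock(lock);
	arch_spin_unlock(lock);
}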
The reason we use two 32-bit fields on tilepro is that the only available
atomic instruction is tns (test and set), which sets a 32-bit "1" value
into the target memory and returns the old 32-bit value. So we need to
be able to safely "corrupt" the next_ticket value with a "1", load the
current_ticket value, and if they don't match, rewrite next_ticket with
its old value. We can't safely do this if next_ticket and current_ticket
are 16-bit fields in one 32-bit word, since the "tns" operation would
corrupt the current_ticket value in that case, making someone
waiting on ticket 0 think they owned the lock.
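To make that concrete, a trylock under this scheme looks roughly like the
following sketch (simplified from the real code; __insn_tns() is the
compiler intrinsic for tns, and TICKET_QUANTUM is the per-ticket
increment, chosen as 2 so that a valid ticket never equals the "busy"
value of 1):

static inline int tns_trylock_sketch(arch_spinlock_t *lock)
{
	/* "Corrupt" next_ticket with a 1, remembering the old value. */
	int my_ticket = __insn_tns((void *)&lock->next_ticket);

	if (my_ticket == lock->current_ticket) {
		/* Lock was free; claim it by advancing next_ticket. */
		lock->next_ticket = my_ticket + TICKET_QUANTUM;
		return 1;
	}

	/* Held, or another core is mid-tns: restore next_ticket. */
	if (my_ticket != 1)
		lock->next_ticket = my_ticket;
	return 0;
}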
On tilegx we made the atomic instruction situation much, much better :-)
>> For 64-bit, a READ_ONCE() appears to be in order -- no obvious
>> volatility present.
>>
It depends, I guess. If you are spinning on arch_spin_is_locked(), yes,
you need to make sure to do something to ensure the value is re-read.
But arch_spin_unlock_wait() already calls delay_backoff(), which calls
relax(), which includes a barrier(), so we're OK there. But if,
stylistically, the consensus calls for a READ_ONCE() in
arch_spin_is_locked(), I can certainly add that. What do folks think?
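For reference, the wait loop in question is essentially:

void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	u32 iterations = 0;

	/*
	 * delay_backoff() bottoms out in relax(), whose barrier()
	 * forces the lock fields to be re-read on each iteration.
	 */
	while (arch_spin_is_locked(lock))
		delay_backoff(iterations++);
}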
Assuming the answer to both questions is "change the code",
how does this look?
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index c0a77b38d39a..7c7b80bd83db 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -41,8 +41,23 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
* to claim the lock is held, since it will be momentarily
* if not already. There's no need to wait for a "valid"
* lock->next_ticket to become available.
+ *
+ * We order the reads here so that if we claim the lock is
+ * unlocked, we know it actually was for at least a moment.
+ * Since current_ticket is never incremented above
+ * next_ticket, by reading current first, then next, and
+ * finding them equal, we know that during that window between
+ * the reads the lock was unlocked.
+ *
+	 * There is a technical risk that between reading current and
+	 * reading next, other cores locked and unlocked two billion
+	 * times with a waiter always queued, so the lock was never
+	 * actually free even though the two reads compare equal and
+	 * make it look like it was. But this seems highly improbable.
*/
- return lock->next_ticket != lock->current_ticket;
+	int curr = READ_ONCE(lock->current_ticket); /* "current" is a macro */
+	int next = READ_ONCE(lock->next_ticket);
+	return next != curr;
}
void arch_spin_lock(arch_spinlock_t *lock);
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 9a12b9c7e5d3..b9718fb4e74a 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -18,6 +18,8 @@
#ifndef _ASM_TILE_SPINLOCK_64_H
#define _ASM_TILE_SPINLOCK_64_H
+#include <linux/compiler.h>
+
/* Shifts and masks for the various fields in "lock". */
#define __ARCH_SPIN_CURRENT_SHIFT 17
#define __ARCH_SPIN_NEXT_MASK 0x7fff
@@ -44,7 +46,8 @@ static inline u32 arch_spin_next(u32 val)
/* The lock is locked if a task would have to wait to get it. */
static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- u32 val = lock->lock;
+ /* Use READ_ONCE() to ensure that calling this in a loop is OK. */
+ u32 val = READ_ONCE(lock->lock);
return arch_spin_current(val) != arch_spin_next(val);
}
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com