Open Source and information security mailing list archives
Date: Tue, 28 Apr 2015 11:53:21 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Peter Zijlstra <peterz@...radead.org>,
    "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
CC: Manfred Spraul <manfred@...orfullife.com>, Oleg Nesterov <oleg@...hat.com>,
    Kirill Tkhai <ktkhai@...allels.com>, <linux-kernel@...r.kernel.org>,
    Ingo Molnar <mingo@...hat.com>, Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

On 04/28/2015 10:33 AM, Peter Zijlstra wrote:
> On Sun, Apr 26, 2015 at 03:52:13AM -0700, Paul E. McKenney wrote:
>
>> And then an smp_read_barrier_depends() would be needed either here
>> or embedded in spin_unlock_wait().  But we also need to check the
>> spin_unlock_wait() implementations to see if any are potentially
>> vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(),
>> READ_ONCE(), or other sources of the required volatility:
>>
>> o    tile: For 32-bit, looks like a bug.  Compares ->current_ticket and
>>      ->next_ticket with no obvious protection.  The compiler is free to
>>      load them in either order, so it is possible that the two fields
>>      could compare equal despite never having actually been equal at
>>      any given time.  Needs something like arm, arm64, mips, or x86
>>      to do single fetch, then compare fields in quantity fetched.
>>
>>      Except that this appears to be using int on a 32-bit system,
>>      thus might not have a 64-bit load.  If that is the case, the
>>      trick would be to load them in order.  Except that this can be
>>      defeated by overflow.  Are there really 32-bit tile systems with
>>      enough CPUs to overflow an unsigned short?

As you surmise, tilepro doesn't have 64-bit loads.  So we are stuck with
32-bit loads on these two fields.  It's true that spin_unlock_wait() can
therefore falsely claim that the lock is unlocked, but it should be only a
hint anyway, since by the time the caller tries to act on that information
the lock may have been retaken anyway, right?
If spin_unlock_wait() is really trying to guarantee that the lock was
available at some point in the interval between when it was called and when
it returned, we could use READ_ONCE() to read the current ticket value
first; is that a necessary part of the semantics?  (Even with READ_ONCE()
we'd still be exposed to the technical risk that other cores had taken and
released the lock two billion times between the two loads of the core
running spin_unlock_wait(), without the lock ever actually being free, so
technically the only airtight solution is for that core to actually acquire
and release the lock itself, but that seems a bit extreme in practice.)

The reason we use two 32-bit fields on tilepro is that the only available
atomic instruction is tns (test and set), which stores a 32-bit "1" value
into the target memory and returns the old 32-bit value.  So we need to be
able to safely "corrupt" the next_ticket value with a "1", load the
current_ticket value, and if they don't match, rewrite next_ticket with its
old value.  We couldn't safely do this if next_ticket and current_ticket
were 16-bit fields in one 32-bit word, since the tns operation would
corrupt the current_ticket value in that case, making someone waiting on
ticket 0 think they owned the lock.  On tilegx we made the atomic
instruction situation much, much better :-)

>> For 64-bit, a READ_ONCE() appears to be in order -- no obvious
>> volatility present.

It depends, I guess.  If you are spinning on arch_spin_is_locked(), yes,
you need to do something to ensure the value is re-read.  But
arch_spin_unlock_wait() already calls delay_backoff(), which calls
relax(), which includes a barrier(), so we're OK there.  But if
stylistically the consensus calls for a READ_ONCE() in
arch_spin_is_locked(), I can certainly add that.  What do folks think?

Assuming the answer to both questions is "change the code", how does this
look?
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index c0a77b38d39a..7c7b80bd83db 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -41,8 +41,23 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 	 * to claim the lock is held, since it will be momentarily
 	 * if not already.  There's no need to wait for a "valid"
 	 * lock->next_ticket to become available.
+	 *
+	 * We order the reads here so that if we claim the lock is
+	 * unlocked, we know it actually was for at least a moment.
+	 * Since current_ticket is never incremented above
+	 * next_ticket, by reading current first, then next, and
+	 * finding them equal, we know that during that window between
+	 * the reads the lock was unlocked.
+	 *
+	 * There is a technical risk in this that between reading
+	 * current and reading next, other cores locked and unlocked
+	 * two billion times without the lock ever being unlocked, and
+	 * therefore it looks like the lock was at some point unlocked
+	 * but it never was.  But this seems highly improbable.
 	 */
-	return lock->next_ticket != lock->current_ticket;
+	int curr = READ_ONCE(lock->current_ticket);
+	int next = READ_ONCE(lock->next_ticket);
+	return next != curr;
 }
 
 void arch_spin_lock(arch_spinlock_t *lock);

diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 9a12b9c7e5d3..b9718fb4e74a 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -18,6 +18,8 @@
 #ifndef _ASM_TILE_SPINLOCK_64_H
 #define _ASM_TILE_SPINLOCK_64_H
 
+#include <linux/compiler.h>
+
 /* Shifts and masks for the various fields in "lock". */
 #define __ARCH_SPIN_CURRENT_SHIFT	17
 #define __ARCH_SPIN_NEXT_MASK		0x7fff
@@ -44,7 +46,8 @@ static inline u32 arch_spin_next(u32 val)
 /* The lock is locked if a task would have to wait to get it. */
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	u32 val = lock->lock;
+	/* Use READ_ONCE() to ensure that calling this in a loop is OK. */
+	u32 val = READ_ONCE(lock->lock);
 	return arch_spin_current(val) != arch_spin_next(val);
 }

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com