Message-ID: <553FACF1.2020405@ezchip.com>
Date: Tue, 28 Apr 2015 11:53:21 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
CC: Manfred Spraul <manfred@...orfullife.com>,
Oleg Nesterov <oleg@...hat.com>,
Kirill Tkhai <ktkhai@...allels.com>,
<linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles
On 04/28/2015 10:33 AM, Peter Zijlstra wrote:
> On Sun, Apr 26, 2015 at 03:52:13AM -0700, Paul E. McKenney wrote:
>
>> And then an smp_read_barrier_depends() would be needed either here
>> or embedded in spin_unlock_wait(). But we also need to check the
>> spin_unlock_wait() implementations to see if any are potentially
>> vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(),
>> READ_ONCE(), or other sources of the required volatility:
>>
>> o tile: For 32-bit, looks like a bug. Compares ->current_ticket and
>> ->next_ticket with no obvious protection. The compiler is free to
>> load them in either order, so it is possible that the two fields
>> could compare equal despite never having actually been equal at
>> any given time. Needs something like arm, arm64, mips, or x86
>> to do single fetch, then compare fields in quantity fetched.
>>
>> Except that this appears to be using int on a 32-bit system,
>> thus might not have a 64-bit load. If that is the case, the
>> trick would be to load them in order. Except that this can be
>> defeated by overflow. Are there really 32-bit tile systems with
>> enough CPUs to overflow an unsigned short?
As you surmise, tilepro doesn't have 64-bit loads. So we are stuck with
32-bit loads on these two fields. It's true that spin_unlock_wait() can
therefore falsely claim that the lock is unlocked, but it should be only a
hint anyway, since by the time the caller tries to act on that information
the lock may have been retaken, right? If spin_unlock_wait() is
really trying to guarantee that the lock was available at some point in
the interval between when it was called and when it returned, we could use
READ_ONCE() to read the current ticket value first; is that a necessary
part of the semantics?
(Even with READ_ONCE() we'd still be exposed to a technical risk that
other cores had taken and released the lock two billion times in between
the two loads of the core running spin_unlock_wait, without ever having
the lock actually be free, so technically the only solution is for that core
to actually acquire and release the lock, but that seems a bit extreme
in practice.)
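Just to illustrate that "extreme" option, a hypothetical wraparound-proof
variant (not something I'm actually proposing) would be something like:

static inline void arch_spin_unlock_wait_strict(arch_spinlock_t *lock)
{
	/*
	 * Actually acquire and release the lock, guaranteeing it
	 * really was free at some point between call and return,
	 * at the cost of contending for a ticket ourselves.
	 */
	arch_spin_lock(lock);
	arch_spin_unlock(lock);
}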
The reason we use two 32-bit fields on tilepro is that the only available
atomic instruction is tns (test and set), which sets a 32-bit "1" value
into the target memory and returns the old 32-bit value. So we need to
be able to safely "corrupt" the next_ticket value with a "1", load the
current_ticket value, and if they don't match, rewrite next_ticket with
its old value. We can't safely do this if next_ticket and current_ticket
are 16-bit fields in one 32-bit word, since the "tns" operation would
corrupt the current_ticket value in that case, making someone
waiting on ticket 0 think they owned the lock.
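To make that concrete, a trylock under this scheme looks roughly like the
following sketch (simplified from the real code; __insn_tns() is the
compiler intrinsic for tns, and TICKET_QUANTUM is the per-ticket
increment, chosen as 2 so that a valid ticket never equals the "busy"
value of 1):

static inline int tns_trylock_sketch(arch_spinlock_t *lock)
{
	/* "Corrupt" next_ticket with a 1, remembering the old value. */
	int my_ticket = __insn_tns((void *)&lock->next_ticket);

	if (my_ticket == lock->current_ticket) {
		/* Lock was free; claim it by advancing next_ticket. */
		lock->next_ticket = my_ticket + TICKET_QUANTUM;
		return 1;
	}

	/* Held, or another core is mid-tns: restore next_ticket. */
	if (my_ticket != 1)
		lock->next_ticket = my_ticket;
	return 0;
}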
On tilegx we made the atomic instruction situation much, much better :-)
>> For 64-bit, a READ_ONCE() appears to be in order -- no obvious
>> volatility present.
>>
It depends, I guess. If you are spinning on arch_spin_is_locked(), yes,
you need to make sure to do something to ensure the value is re-read.
But arch_spin_unlock_wait() already calls delay_backoff(), which calls
relax(), which includes a barrier(), so we're OK there. But if,
stylistically, the consensus calls for a READ_ONCE() in
arch_spin_is_locked(), I can certainly add that. What do folks think?
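For reference, the wait loop in question is essentially:

void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	u32 iterations = 0;

	/*
	 * delay_backoff() bottoms out in relax(), whose barrier()
	 * forces the lock fields to be re-read on each iteration.
	 */
	while (arch_spin_is_locked(lock))
		delay_backoff(iterations++);
}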
Assuming the answer to both questions is "change the code",
how does this look?
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index c0a77b38d39a..7c7b80bd83db 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -41,8 +41,23 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
* to claim the lock is held, since it will be momentarily
* if not already. There's no need to wait for a "valid"
* lock->next_ticket to become available.
+ *
+ * We order the reads here so that if we claim the lock is
+ * unlocked, we know it actually was for at least a moment.
+ * Since current_ticket is never incremented above
+ * next_ticket, by reading current first, then next, and
+ * finding them equal, we know that during that window between
+ * the reads the lock was unlocked.
+ *
+	 * There is a technical risk that between reading current and
+	 * reading next, other cores locked and unlocked two billion
+	 * times with a waiter always queued, so the lock was never
+	 * actually free even though the two reads compare equal and
+	 * make it look like it was. But this seems highly improbable.
*/
- return lock->next_ticket != lock->current_ticket;
+	int curr = READ_ONCE(lock->current_ticket); /* "current" is a macro */
+	int next = READ_ONCE(lock->next_ticket);
+	return next != curr;
}
void arch_spin_lock(arch_spinlock_t *lock);
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 9a12b9c7e5d3..b9718fb4e74a 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -18,6 +18,8 @@
#ifndef _ASM_TILE_SPINLOCK_64_H
#define _ASM_TILE_SPINLOCK_64_H
+#include <linux/compiler.h>
+
/* Shifts and masks for the various fields in "lock". */
#define __ARCH_SPIN_CURRENT_SHIFT 17
#define __ARCH_SPIN_NEXT_MASK 0x7fff
@@ -44,7 +46,8 @@ static inline u32 arch_spin_next(u32 val)
/* The lock is locked if a task would have to wait to get it. */
static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- u32 val = lock->lock;
+ /* Use READ_ONCE() to ensure that calling this in a loop is OK. */
+ u32 val = READ_ONCE(lock->lock);
return arch_spin_current(val) != arch_spin_next(val);
}
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com