[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58406458.7060606@caviumnetworks.com>
Date: Thu, 1 Dec 2016 09:56:40 -0800
From: David Daney <ddaney@...iumnetworks.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: LKML <linux-kernel@...r.kernel.org>,
David Daney <ddaney@...iumnetworks.com>,
Peter Zijlstra <peterz@...radead.org>,
"Ingo Molnar" <mingo@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
"Sebastian Siewior" <bigeasy@...utronix.de>,
Will Deacon <will.deacon@....com>,
"Mark Rutland" <mark.rutland@....com>, <stable@...r.kernel.org>
Subject: Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
On 11/30/2016 01:04 PM, Thomas Gleixner wrote:
> David reported a futex/rtmutex state corruption. It's caused by the
> following problem:
>
> CPU0 CPU1 CPU2
>
> l->owner=T1
> rt_mutex_lock(l)
> lock(l->wait_lock)
> l->owner = T1 | HAS_WAITERS;
> enqueue(T2)
> boost()
> unlock(l->wait_lock)
> schedule()
>
> rt_mutex_lock(l)
> lock(l->wait_lock)
> l->owner = T1 | HAS_WAITERS;
> enqueue(T3)
> boost()
> unlock(l->wait_lock)
> schedule()
> signal(->T2) signal(->T3)
> lock(l->wait_lock)
> dequeue(T2)
> deboost()
> unlock(l->wait_lock)
> lock(l->wait_lock)
> dequeue(T3)
> ===> wait list is now empty
> deboost()
> unlock(l->wait_lock)
> lock(l->wait_lock)
> fixup_rt_mutex_waiters()
> if (wait_list_empty(l)) {
> owner = l->owner & ~HAS_WAITERS;
> l->owner = owner
> ==> l->owner = T1
> }
>
> lock(l->wait_lock)
> rt_mutex_unlock(l) fixup_rt_mutex_waiters()
> if (wait_list_empty(l)) {
> owner = l->owner & ~HAS_WAITERS;
> cmpxchg(l->owner, T1, NULL)
> ===> Success (l->owner = NULL)
> l->owner = owner
> ==> l->owner = T1
> }
>
> That means the problem is caused by fixup_rt_mutex_waiters() which does the
> RMW to clear the waiters bit unconditionally when there are no waiters in
> the rtmutexes rbtree.
>
> This can be fatal: A concurrent unlock can release the rtmutex in the
> fastpath because the waiters bit is not set. If the cmpxchg() gets in the
> middle of the RMW operation then the previous owner, which just unlocked
> the rtmutex is set as the owner again when the write takes place after the
> successfull cmpxchg().
>
> The solution is rather trivial: Verify that the owner member of the rtmutex
> has the waiters bit set before clearing it. This does not require a
> cmpxchg() or other atomic operations because the waiters bit can only be
> set and cleared with the rtmutex wait_lock held. It's also safe against the
> fast path unlock attempt. The unlock attempt via cmpxchg() will either see
> the bit set and take the slowpath or see the bit cleared and release it
> atomically in the fastpath.
>
> It's remarkable that the test program provided by David triggers on ARM64
> and MIPS64 really quick, but it refuses to reproduce on x8664, while the
> problem exists there as well. That refusal might explain that this got not
> discovered earlier despite the bug existing from day one of the rtmutex
> implementation more than 10 years ago.
>
> Thanks to David for meticulously instrumenting the code and providing the
> information which allowed to decode this subtle problem.
>
> Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
> Reported-by: David Daney<ddaney@...iumnetworks.com>
> Signed-off-by: Thomas Gleixner<tglx@...utronix.de>
> Cc:stable@...r.kernel.org
FWIW:
Tested-by: David Daney <david.daney@...ium.com>
... on arm64 and mips64 where it fixes the failures we were seeing.
Thanks to Thomas for taking the time to work through this thing.
David Daney
> ---
> kernel/locking/rtmutex.c | 68 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 66 insertions(+), 2 deletions(-)
>
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiter
>
> static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
> {
> - if (!rt_mutex_has_waiters(lock))
> - clear_rt_mutex_waiters(lock);
> + unsigned long owner, *p = (unsigned long *) &lock->owner;
> +
> + if (rt_mutex_has_waiters(lock))
> + return;
> +
> + /*
> + * The rbtree has no waiters enqueued, now make sure that the
> + * lock->owner still has the waiters bit set, otherwise the
> + * following can happen:
> + *
> + * CPU 0 CPU 1 CPU2
> + * l->owner=T1
> + * rt_mutex_lock(l)
> + * lock(l->lock)
> + * l->owner = T1 | HAS_WAITERS;
> + * enqueue(T2)
> + * boost()
> + * unlock(l->lock)
> + * block()
> + *
> + * rt_mutex_lock(l)
> + * lock(l->lock)
> + * l->owner = T1 | HAS_WAITERS;
> + * enqueue(T3)
> + * boost()
> + * unlock(l->lock)
> + * block()
> + * signal(->T2) signal(->T3)
> + * lock(l->lock)
> + * dequeue(T2)
> + * deboost()
> + * unlock(l->lock)
> + * lock(l->lock)
> + * dequeue(T3)
> + * ==> wait list is empty
> + * deboost()
> + * unlock(l->lock)
> + * lock(l->lock)
> + * fixup_rt_mutex_waiters()
> + * if (wait_list_empty(l) {
> + * l->owner = owner
> + * owner = l->owner & ~HAS_WAITERS;
> + * ==> l->owner = T1
> + * }
> + * lock(l->lock)
> + * rt_mutex_unlock(l) fixup_rt_mutex_waiters()
> + * if (wait_list_empty(l) {
> + * owner = l->owner & ~HAS_WAITERS;
> + * cmpxchg(l->owner, T1, NULL)
> + * ===> Success (l->owner = NULL)
> + *
> + * l->owner = owner
> + * ==> l->owner = T1
> + * }
> + *
> + * With the check for the waiter bit in place T3 on CPU2 will not
> + * overwrite. All tasks fiddling with the waiters bit are
> + * serialized by l->lock, so nothing else can modify the waiters
> + * bit. If the bit is set then nothing can change l->owner either
> + * so the simple RMW is safe. The cmpxchg() will simply fail if it
> + * happens in the middle of the RMW because the waiters bit is
> + * still set.
> + */
> + owner = READ_ONCE(*p);
> + if (owner & RT_MUTEX_HAS_WAITERS)
> + WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
> }
>
> /*
>
>
Powered by blists - more mailing lists