Message-ID: <4C39DED6.10502@us.ibm.com>
Date: Sun, 11 Jul 2010 08:10:14 -0700
From: Darren Hart <dvhltc@...ibm.com>
To: Mike Galbraith <mgalbraith@...e.de>
CC: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
Eric Dumazet <eric.dumazet@...il.com>,
John Kacur <jkacur@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
linux-rt-users@...r.kernel.org
Subject: Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t
On 07/11/2010 06:33 AM, Mike Galbraith wrote:
> On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote:
>> On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
>
>>> If we can't move the unlock above before set_owner, then we may need a:
>>>
>>> retry:
>>>     cur->lock()
>>>     top_waiter = get_top_waiter()
>>>     cur->unlock()
>>>
>>>     double_lock(cur, top_waiter)
>>>     if top_waiter != get_top_waiter()
>>>         double_unlock(cur, top_waiter)
>>>         goto retry
>>>
>>> Not ideal, but I think I prefer that to making all the hb locks raw.
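In user-space terms, that retry scheme might look like the sketch below. This is only an illustration: pthread mutexes stand in for the kernel locks, and struct queue, struct waiter, and lock_top_waiter() are hypothetical names, not kernel code. double_lock() takes the two mutexes in address order to avoid ABBA deadlock, and the top waiter is re-checked once both locks are held.

```c
#include <pthread.h>

struct waiter {
	pthread_mutex_t lock;
};

struct queue {
	pthread_mutex_t lock;
	struct waiter *top_waiter;
};

/* Acquire two mutexes in a global (address) order to avoid ABBA deadlock. */
static void double_lock(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a < b) {
		pthread_mutex_lock(a);
		pthread_mutex_lock(b);
	} else {
		pthread_mutex_lock(b);
		pthread_mutex_lock(a);
	}
}

static void double_unlock(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}

/*
 * Return the top waiter with both cur->lock and the waiter's lock held.
 * Assumes the queue is non-empty.  Because both locks are dropped between
 * the peek and the double_lock(), the top waiter may have changed; if so,
 * drop everything and retry.
 */
static struct waiter *lock_top_waiter(struct queue *cur)
{
	struct waiter *top;

retry:
	pthread_mutex_lock(&cur->lock);
	top = cur->top_waiter;
	pthread_mutex_unlock(&cur->lock);

	double_lock(&cur->lock, &top->lock);
	if (top != cur->top_waiter) {
		/* Top waiter changed while we held no locks; start over. */
		double_unlock(&cur->lock, &top->lock);
		goto retry;
	}
	return top;
}
```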
>
> Another option: only scratch the itchy spot.
>
> futex: non-blocking synchronization point for futex_wait_requeue_pi() and futex_requeue().
>
> Problem analysis by Darren Hart:
> The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates
> a scenario where a task can wake up, not knowing it has been enqueued on an
> rtmutex. In order to detect this, the task would have to be able to take either
> task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately,
> without already holding one of these, the pi_blocked_on variable can change
> from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed
> to take a sleeping lock after wakeup, or it could end up trying to block on two
> locks, the second overwriting a valid pi_blocked_on value. This obviously
> breaks the PI mechanism.
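The single-slot nature of pi_blocked_on is the crux of the analysis above. The stripped-down stand-ins below (not the kernel structures) show why blocking on a second lock is fatal: recording a second blocker would overwrite, and thus lose, the first, which is exactly the state the PI chain walk depends on.

```c
#include <stddef.h>

struct rt_mutex_waiter {
	int dummy;	/* placeholder; the real struct carries PI state */
};

struct task {
	/* A task can be blocked on at most ONE rtmutex at a time. */
	struct rt_mutex_waiter *pi_blocked_on;
};

/*
 * Record that @t is now blocked through @w.  Returns 0 on success, or
 * -1 if a valid pi_blocked_on would be silently overwritten, i.e. the
 * corruption the analysis above describes.
 */
static int task_block_on(struct task *t, struct rt_mutex_waiter *w)
{
	if (t->pi_blocked_on)
		return -1;	/* already blocked: would corrupt PI state */
	t->pi_blocked_on = w;
	return 0;
}
```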
>
> Rather than convert the hb->lock to a raw spinlock, do so only in the spot where
> blocking cannot be allowed, i.e. before we know that lock handoff has completed.
I like it. I especially like that the change is only evident if you are using
the code path that introduced the problem in the first place. If you're
doing a lot of requeue_pi operations, then the waking waiters have an
advantage over new pending waiters or other tasks with futexes keyed to
the same hash bucket... but that seems acceptable to me.
I'd first like to confirm that holding the pendowner->pi_lock across the
wakeup in wakeup_next_waiter() isn't feasible. If it can work, I think
the impact would be lower. I'll have a look tomorrow.
Nice work Mike.
--
Darren
> Signed-off-by: Mike Galbraith <efault@....de>
> Cc: Darren Hart <dvhltc@...ibm.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Ingo Molnar <mingo@...e.hu>
> Cc: Eric Dumazet <eric.dumazet@...il.com>
> Cc: John Kacur <jkacur@...hat.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index a6cec32..ef489f3 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
> /* Queue the futex_q, drop the hb lock, wait for wakeup. */
> futex_wait_queue_me(hb, &q, to);
>
> - spin_lock(&hb->lock);
> + /*
> + * Non-blocking synchronization point with futex_requeue().
> + *
> + * We dare not block here because this will alter PI state, possibly
> + * before our waker finishes modifying same in wakeup_next_waiter().
> + */
> + while (!spin_trylock(&hb->lock))
> + cpu_relax();
> ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> spin_unlock(&hb->lock);
> if (ret)
>
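For readers without a kernel tree handy, a user-space analog of the trylock loop in the patch above might look like the sketch below. pthread_mutex_trylock() stands in for spin_trylock() and sched_yield() for cpu_relax(); lock_nonblocking() is a hypothetical name. The point is that the acquirer never sleeps on the lock (and so never touches pi_blocked_on), it only busy-waits.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Acquire @lock without ever blocking in the scheduler sense: spin on
 * trylock instead of performing a sleeping lock acquisition.
 */
static void lock_nonblocking(pthread_mutex_t *lock)
{
	while (pthread_mutex_trylock(lock))
		sched_yield();	/* stand-in for cpu_relax() */
}
```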
>
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/