[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20060730043605.GA2894@oleg>
Date: Sun, 30 Jul 2006 08:36:05 +0400
From: Oleg Nesterov <oleg@...sign.ru>
To: Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Esben Nielsen <nielsen.esben@...glemail.com>
Cc: linux-kernel@...r.kernel.org
Subject: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
I am trying to get some understanding of rt_mutex, but I'm afraid it's
not possible without your help...
// runs on CPU 0
rt_mutex_slowlock:
// main loop
for (;;) {
...
if (timeout && !timeout->task) {
ret = -ETIMEDOUT;
break;
}
...
schedule();
...
set_current_state(state);
}
What if timeout->timer is fired on CPU 1 right before set_current_state() ?
hrtimer_wakeup() does:
timeout->task = NULL; <----- [1]
spin_lock(runqueues->lock);
task->state = TASK_RUNNING; <----- [2]
(task->array != NULL, so try_to_wake_up() just goes to out_running)
If my understanding correct, [1] may slip into the critical section (because
spin_lock() is not a wmb), so that CPU 0 will see [2] but not [1]. In that
case rt_mutex_slowlock() can miss the timeout (set_current_state()->mb()
doesn't help).
Of course, this race (even _if_ I am right) is pure theoretical, but probably
we need smp_wmb() after [1] in hrtimer_wakeup().
Note that do_nanosleep() is ok, hrtimer_base->lock provides a necessary
serialization. In fact, I think it can use __set_current_state(), because
both hrtimer_start() and run_hrtimer_queue() do lock/unlock of base->lock.
Another question, task_blocks_on_rt_mutex() does get_task_struct() and checks
owner->pi_blocked_on != NULL under owner->pi_lock. Why ? RT_MUTEX_HAS_WAITERS
bit is set, we are holding ->wait_lock, so the 'owner' can't go away until
we drop ->wait_lock. I think we can drop owner->pi_lock right after
__rt_mutex_adjust_prio(owner), we can't miss owner->pi_blocked_on != NULL
if it was true before we take owner->pi_lock, and this is the case we should
worry about, yes?
In other words (because I myself can't parse the paragraph above :), could
you explain me why this patch is not correct:
--- rtmutex.c~ 2006-07-30 05:15:38.000000000 +0400
+++ rtmutex.c 2006-07-30 05:41:44.000000000 +0400
@@ -407,7 +407,7 @@ static int task_blocks_on_rt_mutex(struc
struct task_struct *owner = rt_mutex_owner(lock);
struct rt_mutex_waiter *top_waiter = waiter;
unsigned long flags;
- int boost = 0, res;
+ int res;
spin_lock_irqsave(¤t->pi_lock, flags);
__rt_mutex_adjust_prio(current);
@@ -431,24 +431,20 @@ static int task_blocks_on_rt_mutex(struc
plist_add(&waiter->pi_list_entry, &owner->pi_waiters);
__rt_mutex_adjust_prio(owner);
- if (owner->pi_blocked_on) {
- boost = 1;
- /* gets dropped in rt_mutex_adjust_prio_chain()! */
- get_task_struct(owner);
- }
spin_unlock_irqrestore(&owner->pi_lock, flags);
+
+ if (owner->pi_blocked_on)
+ goto boost;
}
else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
- spin_lock_irqsave(&owner->pi_lock, flags);
- if (owner->pi_blocked_on) {
- boost = 1;
- /* gets dropped in rt_mutex_adjust_prio_chain()! */
- get_task_struct(owner);
- }
- spin_unlock_irqrestore(&owner->pi_lock, flags);
+ if (owner->pi_blocked_on)
+ goto boost;
}
- if (!boost)
- return 0;
+
+ return 0;
+boost:
+ /* gets dropped in rt_mutex_adjust_prio_chain()! */
+ get_task_struct(owner);
spin_unlock(&lock->wait_lock);
----------------------------------------------------------
The same question for remove_waiter()/rt_mutex_adjust_pi().
The last (stupid) one,
wake_up_new_task:
if (unlikely(!current->array))
__activate_task(p, rq);
(offtopic) Is it really possible to have current->array == NULL here?
else {
p->prio = current->prio;
What if current was pi-boosted so that rt_prio(current->prio) == 1,
who will de-boost the child?
p->normal_prio = current->normal_prio;
Why? p->normal_prio was calculated by effective_prio() above, could you
explain why that value is not ok?
Thanks,
Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists