Message-ID: <4C98D0EB.30002@kernel.org>
Date: Tue, 21 Sep 2010 17:36:11 +0200
From: Tejun Heo <tj@...nel.org>
To: Heiko Carstens <heiko.carstens@...ibm.com>
CC: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
Rusty Russell <rusty@...tcorp.com.au>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH/RFC] timer: fix deadlock on cpu hotplug
Hello,
On 09/21/2010 04:20 PM, Heiko Carstens wrote:
> For some reason the scheduler decided to throttle RT tasks on the runqueue
> of cpu 5 (rt_throttled = 1). So as long as rt_throttled == 1 we won't see the
> migration thread coming back to execution.
> The only thing that would unthrottle the runqueue would be the rt_period_timer.
> The timer is indeed scheduled, however in the dump I have it has been expired
> for more than four hours.
> The reason is simply that the timer is pending on the offlined cpu 0 and
> therefore would never fire before it gets migrated to an online cpu. Before
> the cpu hotplug mechanisms (cpu hotplug notifier with state CPU_DEAD) would
> migrate the timer to an online cpu stop_machine() must complete ---> deadlock.
>
> The fix _seems_ to be simple: just migrate timers after __cpu_disable() has
> been called and use the CPU_DYING state. The subtle difference is of course
> that the migration code now gets executed on the cpu that actually just is
> going to disable itself instead of an arbitrary cpu that stays online.
I think this is the second time we're seeing a deadlock during cpu down
due to RT throttling and a timer problem. The rather delicate
dependency there makes me somewhat nervous. If possible, I think it
would be better to simply turn RT throttling off when cpu_stop kicks
in. cpu_stop is intended to be a mechanism to monopolize all CPU
cycles to begin with. Would that be difficult?
Thanks.
--
tejun