linux-kernel - Re: [PATCH/RFC] timer: fix deadlock on cpu hotplug

Message-ID: <1285083618.2275.884.camel@laptop>
Date:	Tue, 21 Sep 2010 17:40:18 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	Heiko Carstens <heiko.carstens@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH/RFC] timer: fix deadlock on cpu hotplug

On Tue, 2010-09-21 at 17:36 +0200, Tejun Heo wrote:
> Hello,
> 
> On 09/21/2010 04:20 PM, Heiko Carstens wrote:
> > For some reason the scheduler decided to throttle RT tasks on the runqueue
> > of cpu 5 (rt_throttled = 1). So as long as rt_throttled == 1 we won't see the
> > migration thread coming back to execution.
> > The only thing that would unthrottle the runqueue would be the rt_period_timer.
> > The timer is indeed scheduled, however in the dump I have it has been expired
> > for more than four hours.
> > The reason is simply that the timer is pending on the offlined cpu 0 and
> > therefore would never fire before it gets migrated to an online cpu. Before
> > the cpu hotplug mechanisms (cpu hotplug notifier with state CPU_DEAD) would
> > migrate the timer to an online cpu stop_machine() must complete ---> deadlock.
> > 
> > The fix _seems_ to be simple: just migrate timers after __cpu_disable() has
> > been called and use the CPU_DYING state. The subtle difference is of course
> > that the migration code now gets executed on the cpu that actually just is
> > going to disable itself instead of an arbitrary cpu that stays online.
> 
> I think this is the second time we're seeing deadlock during cpu down
> due to RT throttling and timer problem.  The rather delicate
> dependency there makes me somewhat nervous.  If possible, I think it
> would be better if we can simply turn the RT throttling off when
> cpu_stop kicks in.  It's intended to be a mechanism to monopolize all
> CPU cycles to begin with.  Would that be difficult?

I've wanted to pull the whole migration thread out from SCHED_FIFO for a
while. Doing that is probably the easiest thing.

Still would be nice to also cure this problem differently.
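
For readers following the thread, the dependency chain Heiko describes can
be sketched as a toy model. This is only an illustration under loud
assumptions, not kernel code: the Cpu class, the tick loop, the target cpu 1
and the cpu_down() helper are all hypothetical, and the real CPU_DYING
callback would run on the dying cpu itself, which the model does not try to
capture. It only shows why migrating timers after stop_machine() (CPU_DEAD)
can never make progress, while migrating them right after __cpu_disable()
(CPU_DYING) can.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical stand-in for per-cpu state; not a kernel structure."""
    online: bool = True
    rt_throttled: bool = False
    pending_timers: list = field(default_factory=list)


def run_expired_timers(cpus):
    # Timers fire only on online cpus; an offline cpu runs nothing.
    for cpu in cpus:
        if cpu.online:
            for fire in cpu.pending_timers:
                fire()
            cpu.pending_timers.clear()


def migrate_timers(cpus, src, dst):
    # Move every pending timer from cpu src to cpu dst.
    cpus[dst].pending_timers.extend(cpus[src].pending_timers)
    cpus[src].pending_timers.clear()


def cpu_down(migrate_at, max_ticks=10):
    """Return True if the simulated stop_machine() completes."""
    cpus = [Cpu() for _ in range(6)]

    # cpu 5 is RT throttled; the timer that would unthrottle it
    # (standing in for rt_period_timer) is pending on cpu 0.
    cpus[5].rt_throttled = True

    def unthrottle():
        cpus[5].rt_throttled = False

    cpus[0].pending_timers.append(unthrottle)

    # Stand-in for __cpu_disable(): cpu 0 goes offline.
    cpus[0].online = False
    if migrate_at == "CPU_DYING":
        migrate_timers(cpus, 0, 1)

    # stop_machine() needs the migration thread to run on every online
    # cpu, which a throttled RT runqueue does not allow.
    for _ in range(max_ticks):
        run_expired_timers(cpus)
        if not any(c.online and c.rt_throttled for c in cpus):
            # Only now would the CPU_DEAD notifier get to run.
            if migrate_at == "CPU_DEAD":
                migrate_timers(cpus, 0, 1)
            return True
    return False  # never completed: the deadlock from the report


if __name__ == "__main__":
    print(cpu_down("CPU_DEAD"))   # False
    print(cpu_down("CPU_DYING"))  # True
```

In the CPU_DEAD case the loop gives up after max_ticks because nothing can
ever fire the timer stranded on the offline cpu; the model says nothing
about whether turning off RT throttling for cpu_stop would be the nicer
cure.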