linux-kernel - Re: [PATCH v2] Make sure timers have migrated before killing migration

Message-ID: <1274856235.5882.4423.camel@twins>
Date:	Wed, 26 May 2010 08:43:55 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	"Amit K. Arora" <aarora@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Gautham R Shenoy <ego@...ibm.com>,
	Darren Hart <dvhltc@...ibm.com>,
	Brian King <brking@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] Make sure timers have migrated before killing
 migration_thread

On Tue, 2010-05-25 at 22:19 +0200, Thomas Gleixner wrote:
> On Thu, 20 May 2010, Peter Zijlstra wrote:
> 
> > On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > > Alternate Solution considered : Another option considered was to
> > > increase the priority of the hrtimer cpu offline notifier, such that it
> > > gets to run before scheduler's migration cpu offline notifier. In this
> > > way we are sure that the timers will get migrated before migration_call
> > > tries to kill migration_thread. But, this can have some non-obvious
> > > implications, suggested Srivatsa.
> > 
> > 
> > > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > > the hotplug lock and thus the above becomes a race where we could
> > > > > possibly kill off the migration thread of a newly brought up cpu:
> > > > 
> > > >  cpu0 - down 2
> > > >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > >  cpu0 - post_down 2 - frees the migration thread -- oops!
> > > 
> > > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > > The cpuset_lock will synchronize, and thus avoid race between killing of
> > > migration_thread in up_prepare and post_dead events. 
> > > 
> > > Here is the updated patch. If you don't like this one too, do you mind
> > > suggesting an alternate approach to tackle the problem ? Thanks !
> > 
> > Right, so this isn't pretty at all..
> > 
> > Ingo, the comment near the migration_notifier says that migration_call
> > should happen before all else, but can you see anything that would break
> > if we let the timer migration happen first?
> > 
> > Thomas?
> 
> That should work, though what is killing the scheduler per rq hrtimers
> _before_ we migrate stuff ? We don't want to migrate them, right ?

They're not rq timers, they're the 'cgroup' bandwidth timers and those
are free to migrate.

What I think happens is that the timer ends up being on the cpu that
goes down, then we disable IRQs on it and run out of bandwidth and get
stuck.

Anyway, we solved it with a one-liner in a different way.

Eventually I'll take the whole migration thread thingy out of SCHED_FIFO,
which I think should solve the issue too.
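
For readers following the quoted discussion about letting the timer
migration run before migration_call: the sketch below is a toy Python model
of a priority-ordered notifier chain, not kernel code. It assumes that
callbacks with a higher priority run first and that equal priorities keep
registration order. The class and function names, the log strings, and the
priority values (10, 0, 20) are all hypothetical and chosen only for
illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Notifier:
    name: str
    priority: int
    call: Callable[[str, int], None]


class NotifierChain:
    """Toy notifier chain: higher priority is called earlier."""

    def __init__(self) -> None:
        self._blocks: List[Notifier] = []

    def register(self, nb: Notifier) -> None:
        # Insert before the first block with a strictly lower priority,
        # so equal-priority blocks keep their registration order.
        idx = len(self._blocks)
        for i, other in enumerate(self._blocks):
            if nb.priority > other.priority:
                idx = i
                break
        self._blocks.insert(idx, nb)

    def call_chain(self, event: str, cpu: int) -> None:
        for nb in self._blocks:
            nb.call(event, cpu)


def run_offline(migration_prio: int, hrtimer_prio: int) -> List[str]:
    """Return the order in which the two hypothetical callbacks ran."""
    log: List[str] = []
    chain = NotifierChain()
    chain.register(Notifier("migration", migration_prio,
                            lambda ev, cpu: log.append("stop migration thread")))
    chain.register(Notifier("hrtimer", hrtimer_prio,
                            lambda ev, cpu: log.append("migrate timers")))
    chain.call_chain("CPU_DEAD", 2)
    return log


if __name__ == "__main__":
    # Migration notifier has the higher priority: it runs first.
    print(run_offline(migration_prio=10, hrtimer_prio=0))
    # Raising the hrtimer priority above it flips the order.
    print(run_offline(migration_prio=10, hrtimer_prio=20))
```

In this model, raising the hrtimer notifier's priority above the migration
notifier's is enough to make the timers migrate before the migration thread
is stopped. It says nothing about the "non-obvious implications" mentioned
in the quoted text, and it is not the one-liner referred to above.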

