[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070530165557.GB1626@in.ibm.com>
Date: Wed, 30 May 2007 22:25:57 +0530
From: Srivatsa Vaddagiri <vatsa@...ibm.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Satoru Takeuchi <takeuchi_satoru@...fujitsu.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Zwane Mwaikambo <zwane@....linux.org.uk>,
Nathan Lynch <nathanl@...tin.ibm.com>,
Joel Schopp <jschopp@...tin.ibm.com>,
Ashok Raj <ashok.raj@...el.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
Gautham R Shenoy <ego@...ibm.com>, akpm@...ux-foundation.org,
Dipankar <dipankar@...ibm.com>
Subject: Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
On Tue, May 29, 2007 at 01:56:24PM -0700, Linus Torvalds wrote:
> As far as I'm concerned, we should
> - use "preempt_disable()" to protect against CPU's coming and going
> - use "stop_machine()" or similar that already honors preemption, and
> which I trust a whole lot more than freezer.
> - .. especially since this is already how we are supposed to be protected
> against CPU's going away, and we've already started doing that (for an
> example of this, see things like e18f3ffb9c from Andrew)
>
> It really does seem fairly straightforward to make "__cpu_up()" be called
> through stop_machine too. Looking at _cpu_down:
>
> mutex_lock(&cpu_bitmask_lock);
> p = __stop_machine_run(take_cpu_down, NULL, cpu);
> mutex_unlock(&cpu_bitmask_lock);
>
> and then looking at _cpu_up:
>
> mutex_lock(&cpu_bitmask_lock);
> ret = __cpu_up(cpu);
> mutex_unlock(&cpu_bitmask_lock);
>
> I just go "Aww, wouldn't it be nice to just make that "__cpu_up()" call be
> done through __stop_machine_run() too?"
>
> Hmm?
>
> Then, you could get the "cpu_bitmask_lock" if you need to sleep,
and that's where all the problems started - sleepers needing to take that mutex
recursively (which we did/do not support).
foo() takes cpu_bitmask_lock and calls
foo_bar() which also needs cpu_bitmask_lock
What is a solution to that?
- Forget (hide?) this whole locking mess by using freezer, which
is what Andrew wanted us to shoot for :) I am somewhat biased
with Andrew here in that I think it will lead to more stabler cpu
hotplug code over time. Again I know some people will beg to differ
on this view.
- extend mutexes to support recursion (which I gather Linux has
religiously avoided so far)
- invent a special lock for cpu hotplug which supports recursion.
This is what Gautham tried doing with [1], with the bonus that it
made the lock extremely scalable for readers by using per-cpu
reference counters and RCU. He is preparing to resend those patches
against latest kernel atm
- Anything else you can think of?
[1] http://lkml.org/lkml/2006/10/26/73
> but if you don't want to do that (and quite often you don't), just doing a
> "preempt_disable()" or taking a spinlock will *also* guarantee that no new
> CPU's suddenly show up, so it's safe to look at the CPU online bitmasks.
>
> Do we really need anything else?
see above
> As mentioned, it's actually fairly easy to add verification calls to make
> sure that certain accesses are done with preemption disabled, so..
--
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists