Message-ID: <alpine.LFD.1.10.0807131041240.2959@woody.linux-foundation.org>
Date: Sun, 13 Jul 2008 10:46:59 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Dmitry Adamushko <dmitry.adamushko@...il.com>
cc: Vegard Nossum <vegard.nossum@...il.com>,
Paul Menage <menage@...gle.com>,
Max Krasnyansky <maxk@...lcomm.com>, Paul Jackson <pj@....com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, miaox@...fujitsu.com,
rostedt@...dmis.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken
On Sun, 13 Jul 2008, Linus Torvalds wrote:
>
> The thing is, we should fix the top level code to never even _consider_ an
> invalid CPU as a target, and that in turn should mean that all the other
> code should be able to just totally ignore CPU hotplug events.
IOW, I think we should totally remove the whole "update_sched_domains()"
thing too. Any logic that needs it is broken. We shouldn't detach the
scheduler domains in DOWN_PREPARE (much less UP_PREPARE), we should just
leave them damn well alone.
As the comment says, "The domains and groups cannot be updated in place
without racing with the balancing code". The thing is, we shouldn't even
try. The correct way to handle all this is to make the balancing code use
the domains regardless, but protect against CPUs going down with
_another_ data structure that is much easier to update.
Namely something like 'cpu_active_map'.
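A minimal userspace model of that split follows. It assumes a single-threaded simulation with at most 64 CPUs packed into a `uint64_t`. Every name here (`sched_domain_span`, `cpu_active_mask`, `pick_target_cpu()`, `cpu_going_down()`, `cpu_came_up()`) is hypothetical and is not the kernel's cpumask or scheduler API. The domain span is left untouched while a CPU goes away, and the balancer simply refuses to pick any CPU whose bit is clear in the active map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs.
 * Illustrative names only, not the kernel's cpumask API. */
#define NR_CPUS_MODEL 64

typedef uint64_t cpumask_model_t;

/* Scheduler domain span. In this model it would only be rebuilt after a
 * hotplug operation has completed (not shown); the balancer reads it
 * freely and never has to cope with it being torn down underneath it. */
static cpumask_model_t sched_domain_span;

/* The "much easier to update" structure: one bit flip per hotplug event. */
static cpumask_model_t cpu_active_mask;

static bool cpu_is_active(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return (cpu_active_mask >> cpu) & 1u;
}

/* The balancer still walks the (possibly stale) domain span, but never
 * considers a CPU that is not in the active map. Returns -1 if no CPU
 * other than this_cpu qualifies. */
static int pick_target_cpu(int this_cpu)
{
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
        if (cpu == this_cpu)
            continue;
        if (!((sched_domain_span >> cpu) & 1u))
            continue; /* not in this domain */
        if (cpu_is_active(cpu))
            return cpu;
    }
    return -1;
}

/* DOWN_PREPARE-time work: clear the active bit, leave the domains alone. */
static void cpu_going_down(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    cpu_active_mask &= ~((cpumask_model_t)1 << cpu);
}

/* Bringing a CPU (back) up just sets its active bit again. */
static void cpu_came_up(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    cpu_active_mask |= (cpumask_model_t)1 << cpu;
}
```

The point of the design is that taking a CPU out of consideration costs one bit flip rather than a domain teardown. A real kernel version would also need appropriate locking or memory ordering between the hotplug path and the balancer, which this single-threaded sketch does not attempt to model.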
Then we just get rid of all the crap in update_sched_domains() entirely,
and then we can make the cpusets code do the *sane* thing, which is to
rebuild the scheduler domains only when the CPU up/down has completed.
So instead of this illogical and crazy mess:
+ switch (phase) {
+ case CPU_UP_CANCELED:
+ case CPU_UP_CANCELED_FROZEN:
+ case CPU_DOWN_FAILED:
+ case CPU_DOWN_FAILED_FROZEN:
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ common_cpu_mem_hotplug_unplug(1);
it should just say
+ switch (phase) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ common_cpu_mem_hotplug_unplug(1);
because it only makes sense to rebuild the scheduler domains when the
thing SUCCEEDS.
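The same rule can be written as a self-contained sketch. It uses hypothetical `PHASE_*` constants standing in for the notifier phases above. Their values are arbitrary, and the `_FROZEN` variants are omitted because they would share the same case labels. `rebuild_sched_domains_model()` is a hypothetical counter standing in for the real rebuild. Cancel and failure phases fall through to the default case and leave the domains as they were.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the hotplug notifier phases named above;
 * the numeric values are arbitrary, not the kernel's. */
enum hotplug_phase {
    PHASE_UP_PREPARE,
    PHASE_UP_CANCELED,
    PHASE_ONLINE,
    PHASE_DOWN_PREPARE,
    PHASE_DOWN_FAILED,
    PHASE_DEAD,
};

/* Counts how many times the (hypothetical) domain rebuild ran. */
static int domain_rebuilds;

static void rebuild_sched_domains_model(void)
{
    domain_rebuilds++;
}

/* Only a completed transition changes the set of usable CPUs, so only
 * then is a rebuild worth doing. Returns true if a rebuild happened. */
static bool hotplug_event(enum hotplug_phase phase)
{
    assert((int)phase >= PHASE_UP_PREPARE && (int)phase <= PHASE_DEAD);

    switch (phase) {
    case PHASE_ONLINE:
    case PHASE_DEAD:
        rebuild_sched_domains_model();
        return true;
    default:
        /* PREPARE, CANCELED and FAILED: nothing has changed yet. */
        return false;
    }
}
```

In this model, a DOWN_PREPARE followed by DOWN_FAILED leaves the rebuild count at zero, while a DOWN_PREPARE followed by DEAD rebuilds exactly once.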
See? By having a sane design, the code is not just more robust and easier to
follow, you can also simplify it and make it more logical.
The current design is not sane.
Linus