linux-kernel - Re: current linux-2.6.git: cpusets completely broken

Date:	Sun, 13 Jul 2008 10:46:59 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dmitry Adamushko <dmitry.adamushko@...il.com>
cc:	Vegard Nossum <vegard.nossum@...il.com>,
	Paul Menage <menage@...gle.com>,
	Max Krasnyansky <maxk@...lcomm.com>, Paul Jackson <pj@....com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, miaox@...fujitsu.com,
	rostedt@...dmis.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken

On Sun, 13 Jul 2008, Linus Torvalds wrote:
> 
> The thing is, we should fix the top level code to never even _consider_ an 
> invalid CPU as a target, and that in turn should mean that all the other 
> code should be able to just totally ignore CPU hotplug events.

IOW, I think we should totally remove the whole "update_sched_domains()" 
thing too. Any logic that needs it is broken. We shouldn't detach the 
scheduler domains in DOWN_PREPARE (much less UP_PREPARE), we should just 
leave them damn well alone.

As the comment says, "The domains and groups cannot be updated in place 
without racing with the balancing code". The thing is, we shouldn't even 
try. The correct way to handle all this is to make the balancing code use 
the domains regardless, but protect against CPUs going down with 
_another_ data structure that is much easier to update.

Namely something like 'cpu_active_map'.
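
To make the idea concrete, here is a minimal user-space sketch, not kernel code. It assumes a 64-bit word is enough to model a CPU mask, and every name ending in `_model` is hypothetical, invented for illustration. The point it shows is that the balancing side never edits the domain span; it only intersects the span with a separate "active" mask that is cheap to update when a CPU starts going down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Stand-in for something like 'cpu_active_map' (assumed name and shape). */
static cpumask_model_t cpu_active_model;

/* Mark a CPU as active or inactive; this is the only thing hotplug touches. */
static void set_cpu_active_model(int cpu, bool active)
{
	assert(cpu >= 0 && cpu < 64);
	if (active)
		cpu_active_model |= (UINT64_C(1) << cpu);
	else
		cpu_active_model &= ~(UINT64_C(1) << cpu);
}

/*
 * Pick the lowest-numbered usable CPU from a domain span.
 * The span itself is left alone; inactive CPUs are simply masked out.
 * Returns -1 if no CPU in the span is currently active.
 */
static int pick_target_model(cpumask_model_t domain_span)
{
	cpumask_model_t candidates = domain_span & cpu_active_model;

	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With CPUs 1 and 2 in the span, clearing CPU 1 in the active mask makes the picker fall through to CPU 2 without anyone having rebuilt or detached the span.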

Then we just get rid of all the crap in update_sched_domains() entirely, 
and then we can make the cpusets code do the *sane* thing, which is to 
rebuild the scheduler domains only when the CPU up/down has completed.

So instead of this illogical and crazy mess:

	+       switch (phase) {
	+       case CPU_UP_CANCELED:
	+       case CPU_UP_CANCELED_FROZEN:
	+       case CPU_DOWN_FAILED:
	+       case CPU_DOWN_FAILED_FROZEN:
	+       case CPU_ONLINE:
	+       case CPU_ONLINE_FROZEN:
	+       case CPU_DEAD:
	+       case CPU_DEAD_FROZEN:
	+               common_cpu_mem_hotplug_unplug(1);

it should just say

	+       switch (phase) {
	+       case CPU_ONLINE:
	+       case CPU_ONLINE_FROZEN:
	+       case CPU_DEAD:
	+       case CPU_DEAD_FROZEN:
	+               common_cpu_mem_hotplug_unplug(1);

because it only makes sense to rebuild the scheduler domains when the 
thing SUCCEEDS. 
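
As a rough illustration of that rule, here is a small user-space sketch rather than the real notifier. The phase names, the counter, and the helper functions are all hypothetical, and the _FROZEN variants are left out for brevity (they would be handled the same way as their non-frozen counterparts). It only models the decision: rebuild when a CPU up or down has completed, and do nothing for prepare, cancel, or failure phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase names mirroring the notifier events quoted above. */
enum phase_model {
	MODEL_UP_PREPARE,
	MODEL_UP_CANCELED,
	MODEL_ONLINE,
	MODEL_DOWN_PREPARE,
	MODEL_DOWN_FAILED,
	MODEL_DEAD,
};

/* Counts how many times the (modelled) domain rebuild ran. */
static int rebuild_count_model;

static void rebuild_domains_model(void)
{
	rebuild_count_model++;
}

/* Rebuild only when the hotplug operation actually succeeded. */
static bool hotplug_notify_model(enum phase_model phase)
{
	switch (phase) {
	case MODEL_ONLINE:
	case MODEL_DEAD:
		rebuild_domains_model();
		return true;
	default:
		/* Prepare, canceled and failed phases change nothing. */
		return false;
	}
}

/* Feed a sequence of phases through the notifier; return rebuilds triggered. */
static int replay_phases_model(const enum phase_model *phases, size_t n)
{
	int before = rebuild_count_model;

	assert(phases != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		hotplug_notify_model(phases[i]);
	return rebuild_count_model - before;
}
```

Replaying a failed bring-up (prepare, then canceled) triggers no rebuild in this model, while a completed take-down (prepare, then dead) triggers exactly one.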

See? By having a sane design, the code is not just more robust and easy to 
follow, you can also simplify it and make it more logical.

The current design is not sane.

			Linus