linux-kernel - Re: current linux-2.6.git: cpusets completely broken


Message-ID: <alpine.LFD.1.10.0807130943530.2959@woody.linux-foundation.org>
Date:	Sun, 13 Jul 2008 10:10:58 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dmitry Adamushko <dmitry.adamushko@...il.com>
cc:	Vegard Nossum <vegard.nossum@...il.com>,
	Paul Menage <menage@...gle.com>,
	Max Krasnyansky <maxk@...lcomm.com>, Paul Jackson <pj@....com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, miaox@...fujitsu.com,
	rostedt@...dmis.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken

On Sun, 13 Jul 2008, Dmitry Adamushko wrote:

> And let me explain one last time why I opposed your 'cpu_active_map' approach.

And let me explain why you are totally off base.

> I do agree that there are likely ways to optimize the hotplug
> machinery [ .. deleted rambling .. ]

This has *NOTHING* to do with optimizing any hotplug machinery.

> The current way to synchronize with the load-balancer is to attach
> NULL domains [ .. deleted more ramblings .. ]

This has *NOTHING* to do even with cpusets and scheduler domains!

Until you can understand that, all your arguments are total and utter 
CRAP.

So Dmitry - please follow along, and think this through.

This is a *fundamental* scheduler issue. It has nothing what-so-ever to do 
with optimization, and it has nothing to do with cpusets. It's about the 
fact that we migrate threads from one CPU to another - and we do that 
whether cpusets are even enabled or not!

And anything that uses "cpu_online_map" to decide if the migration target 
is alive is simply _buggy_.

See? Not "un-optimized". Not "cpusets". Just pure scheduling and hotplug 
issues with taking a CPU down.

As long as you continue to only look at wake_idle() and scheduler domains, 
you are missing all the *other* cases of migration. Like the one we do at 
execve() time, or in balance_task.

The thing is, we should fix the top level code to never even _consider_ an 
invalid CPU as a target, and that in turn should mean that all the other 
code should be able to just totally ignore CPU hotplug events.

In other words, it very fundamentally SHOULD NOT MATTER that somebody 
happened to call "try_to_wake_up()" during the cpu unplug sequence. We 
should fix the fundamental scheduler routines to simply make it impossible 
for that to ever balance something back to a CPU that is going down.

And we shouldn't _care_ about what crazy things the cpusets code does.

See?

THAT is the reason for my patch. I think the cpusets callbacks are totally 
insane, but I don't care. What I care about is that the scheduler got 
confused just because those insane callbacks happened to make timing be 
just subtle enough that (and I quote):

  "try_to_wake_up() is called for one of these tasks from another CPU ->
   the load-balancer (wake_idle()) picks up a "dead" CPU and places the 
   task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later 
   -> oops."

IOW, we should never have had code that was that fragile in the first 
place! It's totally INSANE to depend on complex and fragile code, when 
we'd be much better off with simple code that always says: "I will not 
migrate a task to a CPU that is going down".
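
[ Illustration, not part of the original message. A minimal userspace sketch 
of the "never migrate to a CPU that is going down" rule, assuming a separate 
"active" mask that is cleared before the CPU actually goes offline. The names 
online_mask, active_mask, cpu_begin_offline() and select_target_cpu() are 
hypothetical stand-ins chosen for this sketch; this is not the patch under 
discussion and not kernel code. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical stand-ins for the kernel's CPU masks (one bit per CPU). */
static uint32_t online_mask;  /* CPU is up and may still be running tasks  */
static uint32_t active_mask;  /* CPU may be chosen as a migration target   */

static void cpu_bring_up(unsigned cpu)
{
    assert(cpu < NR_CPUS);
    online_mask |= 1u << cpu;
    active_mask |= 1u << cpu;
}

/* First step of unplug: stop accepting new tasks. The CPU stays online
 * so that the tasks already on it can still be moved away. */
static void cpu_begin_offline(unsigned cpu)
{
    assert(cpu < NR_CPUS);
    active_mask &= ~(1u << cpu);
}

/* Last step of unplug: the CPU is really gone. */
static void cpu_finish_offline(unsigned cpu)
{
    assert(cpu < NR_CPUS);
    online_mask &= ~(1u << cpu);
}

static bool cpu_is_active(unsigned cpu)
{
    return (cpu < NR_CPUS) && ((active_mask >> cpu) & 1u);
}

/* The single top-level check: every migration path (wakeup, exec, fork,
 * periodic balancing) would go through this, so none of them has to know
 * anything about hotplug. Returns -1 if no allowed CPU is active. */
static int select_target_cpu(uint32_t allowed, unsigned preferred)
{
    uint32_t candidates = allowed & active_mask;

    if (cpu_is_active(preferred) && ((allowed >> preferred) & 1u))
        return (int)preferred;

    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++)
        if ((candidates >> cpu) & 1u)
            return (int)cpu;

    return -1;
}
```

[ The point of the sketch is the ordering: the active bit is cleared first, 
so between cpu_begin_offline() and cpu_finish_offline() the CPU is still 
online but can no longer be picked as a target, no matter which caller asks 
or when. ]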

Depending on complex (and conditional) scheduler domains data structures 
is a *bug*. It's fragile, and it's a horrible design mistake.

			Linus
