linux-kernel - Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

Message-ID: <alpine.LFD.0.98.0704181223190.2828@woody.linux-foundation.org>
Date:	Wed, 18 Apr 2007 12:40:17 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Matt Mackall <mpm@...enic.com>, Nick Piggin <npiggin@...e.de>,
	William Lee Irwin III <wli@...omorphy.com>,
	Peter Williams <pwil3058@...pond.net.au>,
	Mike Galbraith <efault@....de>,
	Con Kolivas <kernel@...ivas.org>, ck list <ck@....kolivas.org>,
	Bill Huey <billh@...ppy.monkey.org>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair
 Scheduler [CFS]

On Wed, 18 Apr 2007, Ingo Molnar wrote:
> 
> perhaps a more fitting term would be 'precise group-scheduling'. Within 
> the lowest level task group entity (be that thread group or uid group, 
> etc.) 'precise scheduling' is equivalent to 'fairness'.

Yes. Absolutely. Except I think that at least if you're going to name 
something "complete" (or "perfect" or "precise"), you should also admit 
that groups can be hierarchical.

The "threads in a process" thing is a great example of a hierarchical 
group. Imagine if X was running as a collection of threads - then each 
server thread would no longer be more important than the clients! But if 
you have a mix of "bags of threads" and "single process" kind 
applications, then very arguably the single thread in a single traditional 
process should get as much time as the "bag of threads" process gets 
total.

So it really should be a hierarchical notion, where each thread is owned 
by one "process", and each process is owned by one "user", and each user 
is in one "virtual machine" - there's at least three different levels to 
this, and you'd want to schedule this thing top-down: virtual machines 
should be given CPU time "fairly" (which doesn't need to mean "equally", 
of course - nice-values could very well work at that level too), and then 
within each virtual machine users or "scheduling groups" should be 
scheduled fairly, and then within each scheduling group the processes 
should be scheduled, and within each process threads should equally get 
their fair share at _that_ level.
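
The top-down split described above can be sketched as a walk over a tree of scheduling entities. This is only an illustration of the arithmetic, not scheduler code: `struct sched_node`, its fields and `assign_shares()` are hypothetical names made up for this example, and "weight" stands in for whatever nice-like value applies at each level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduling entity: a VM, a user, a process or a thread. */
struct sched_node {
	const char *name;
	unsigned int weight;		/* relative weight among its siblings */
	struct sched_node **children;	/* NULL for a leaf (a thread) */
	size_t nr_children;
	double share;			/* computed fraction of total CPU time */
};

/*
 * Give 'node' the fraction 'share' of the CPU, then split that fraction
 * among its children in proportion to their weights, recursing top-down.
 */
static void assign_shares(struct sched_node *node, double share)
{
	unsigned long total = 0;
	size_t i;

	node->share = share;
	for (i = 0; i < node->nr_children; i++)
		total += node->children[i]->weight;

	for (i = 0; i < node->nr_children; i++) {
		struct sched_node *child = node->children[i];

		/* At least one child must have a non-zero weight. */
		assert(total > 0);
		assign_shares(child, share * child->weight / total);
	}
}
```

Calling `assign_shares(&root, 1.0)` on a user that owns one single-threaded process and one two-thread "bag of threads" process, all with weight 1, gives each process half of the CPU and each thread inside the bag a quarter.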

And no, I don't think we necessarily need to do something quite that 
elaborate. But I think that's the kind of "obviously good goal" to keep in 
mind. Can we perhaps _approximate_ something like that by other means? 

For example, maybe we can approximate it by spreading out the statistics: 
right now you have things like

 - last_ran, wait_runtime, sum_wait_runtime..

be per-thread things. Maybe some of those can be spread out, so that you 
put a part of them in the "struct vm_struct" thing (to approximate 
processes), part of them in the "struct user" struct (to approximate the 
user-level thing), and part of it in a per-container thing for when/if we 
support that kind of thing?

IOW, I don't think the scheduling "groups" have to be explicit boxes or 
anything like that. I suspect you can make do with just heuristics that 
penalize the same "struct user" and "struct vm_struct" for getting overly 
much scheduling time, and you'll get the same _effect_. 

And I don't think it's wrong to look at the "one hundred processes by the 
same user" case as being an important case. But it should not be the 
*only* case or even necessarily the *main* case that matters. I think a 
benchmark that literally does

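	#include <stdlib.h>
	#include <sys/types.h>
	#include <unistd.h>

	/* Must be started as root so that the setuid() calls can succeed */
	int main(void)
	{
		pid_t pid = fork();
		if (pid < 0)
			exit(1);
		if (pid) {
			/* Parent: become user 500 and spin in one process */
			if (setuid(500) < 0)
				exit(2);
			for (;;)
				/* Do nothing */;
		}
		/* Child: become user 501, then split into two spinners */
		if (setuid(501) < 0)
			exit(3);
		if (fork() < 0)
			exit(4);
		for (;;)
			/* Do nothing in two processes */;
	}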

is a really valid benchmark: if the scheduler gives 25% 
of time to each of the two processes of user 501, and 50% to user 500, 
then THAT is a good scheduler.

If somebody wants to actually write and test the above as a test-script, 
and add it to a collection of scheduler tests, I think that could be a 
good thing.
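
One piece of such a test would be the pass/fail check on the measured numbers. The following is a minimal sketch under stated assumptions: the harness has already collected the CPU time each spinner accumulated (how it does that is left open), every user has equal weight, and every process of a user has equal weight. `struct spin_sample`, `procs_of()`, `nr_users()` and `shares_look_fair()` are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* One measured spinner: its uid and the CPU time it accumulated
 * (any unit, as long as it is the same for all samples). */
struct spin_sample {
	uid_t uid;
	double cpu_time;
};

/* Number of samples that belong to the given uid. */
static size_t procs_of(const struct spin_sample *s, size_t n, uid_t uid)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (s[i].uid == uid)
			count++;
	return count;
}

/* Number of distinct uids among the samples. */
static size_t nr_users(const struct spin_sample *s, size_t n)
{
	size_t i, j, users = 0;

	for (i = 0; i < n; i++) {
		for (j = 0; j < i; j++)
			if (s[j].uid == s[i].uid)
				break;
		if (j == i)	/* first time this uid is seen */
			users++;
	}
	return users;
}

/*
 * Return 1 if every process received (1 / users) / (processes of its
 * user) of the total CPU time, within an absolute tolerance 'tol'.
 */
static int shares_look_fair(const struct spin_sample *s, size_t n, double tol)
{
	double total = 0.0;
	size_t i, users = nr_users(s, n);

	assert(tol >= 0.0);
	for (i = 0; i < n; i++)
		total += s[i].cpu_time;
	if (n == 0 || total <= 0.0)
		return 0;

	for (i = 0; i < n; i++) {
		double want = 1.0 / users / procs_of(s, n, s[i].uid);
		double diff = s[i].cpu_time / total - want;

		if (diff < -tol || diff > tol)
			return 0;
	}
	return 1;
}
```

With samples of 50, 25 and 25 time units for one process of user 500 and two of user 501, the check passes; a scheduler that hands each of the three processes a third of the time fails it.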

		Linus