Message-ID: <alpine.LFD.0.98.0704181223190.2828@woody.linux-foundation.org>
Date: Wed, 18 Apr 2007 12:40:17 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...e.hu>
cc: Matt Mackall <mpm@...enic.com>, Nick Piggin <npiggin@...e.de>,
William Lee Irwin III <wli@...omorphy.com>,
Peter Williams <pwil3058@...pond.net.au>,
Mike Galbraith <efault@....de>,
Con Kolivas <kernel@...ivas.org>, ck list <ck@....kolivas.org>,
Bill Huey <billh@...ppy.monkey.org>,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair
Scheduler [CFS]
On Wed, 18 Apr 2007, Ingo Molnar wrote:
>
> perhaps a more fitting term would be 'precise group-scheduling'. Within
> the lowest level task group entity (be that thread group or uid group,
> etc.) 'precise scheduling' is equivalent to 'fairness'.
Yes. Absolutely. Except I think that at least if you're going to name
something "complete" (or "perfect" or "precise"), you should also admit
that groups can be hierarchical.
The "threads in a process" thing is a great example of a hierarchical
group. Imagine if X was running as a collection of threads - then each
server thread would no longer be more important than the clients! But if
you have a mix of "bags of threads" and "single process" kind
applications, then very arguably the single thread in a single traditional
process should get as much time as the "bag of threads" process gets
total.
So it really should be a hierarchical notion, where each thread is owned
by one "process", and each process is owned by one "user", and each user
is in one "virtual machine" - there's at least three different levels to
this, and you'd want to schedule this thing top-down: virtual machines
should be given CPU time "fairly" (which doesn't need to mean "equally",
of course - nice-values could very well work at that level too), and then
within each virtual machine users or "scheduling groups" should be
scheduled fairly, and then within each scheduling group the processes
should be scheduled, and within each process threads should equally get
their fair share at _that_ level.
And no, I don't think we necessarily need to do something quite that
elaborate. But I think that's the kind of "obviously good goal" to keep in
mind. Can we perhaps _approximate_ something like that by other means?
For example, maybe we can approximate it by spreading out the statistics:
right now you have things like
- last_ran, wait_runtime, sum_wait_runtime..
be per-thread things. Maybe some of those can be spread out, so that you
put a part of them in the "struct vm_struct" thing (to approximate
processes), part of them in the "struct user" struct (to approximate the
user-level thing), and part of it in a per-container thing for when/if we
support that kind of thing?
IOW, I don't think the scheduling "groups" have to be explicit boxes or
anything like that. I suspect you can make do with just heuristics that
penalize the same "struct user" and "struct vm_struct" for getting overly
much scheduling time, and you'll get the same _effect_.
And I don't think it's wrong to look at the "one hundred processes by the
same user" case as being an important case. But it should not be the
*only* case or even necessarily the *main* case that matters. I think a
benchmark that literally does
	pid_t pid = fork();

	if (pid < 0)
		exit(1);
	if (pid) {
		if (setuid(500) < 0)
			exit(2);
		for (;;)
			/* Do nothing */;
	}
	if (setuid(501) < 0)
		exit(3);
	fork();
	for (;;)
		/* Do nothing in two processes */;

and I think that it's a really valid benchmark: if the scheduler gives 25%
of time to each of the two processes of user 501, and 50% to user 500,
then THAT is a good scheduler.
If somebody wants to actually write and test the above as a test-script,
and add it to a collection of scheduler tests, I think that could be a
good thing.
Linus