[<prev] [next>] [day] [month] [year] [list]
Message-Id: <200704190811.17290.philipp.marek@bmlv.gv.at>
Date: Thu, 19 Apr 2007 08:11:16 +0200
From: "Ph. Marek" <philipp.marek@...v.gv.at>
To: linux-kernel@...r.kernel.org
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair
Pine.LNX.4.64.0704181515290.25880 () alien ! or ! mcafeemobile ! com
Davide Libenzi wrote:
> On Wed, 18 Apr 2007, Ingo Molnar wrote:
> > That's one reason why i dont think it's necessarily a good idea to
> > group-schedule threads, we dont really want to do a per thread group
> > percpu_alloc().
>
> I still do not have clear how much overhead this will bring into the
> table, but I think (like Linus was pointing out) the hierarchy should look
> like:
...
> The "run_queue" concept (and data) that now is bound to a CPU, need to be
> replicated in:
>
> ROOT <- VCPUs add themselves here
> VCPU <- USERs add themselves here
> USER <- PROCs add themselves here
> PROC <- THREADs add themselves here
> THREAD (ultimate fine grained scheduling unit)
>
> So ROOT, VCPU, USER and PROC will have their own "run_queue".
...
I can't comment on the internals about run_queues, overhead and so on, but
these discussion leads me to the idea about a dynamic *tree* of scheduler
queues.
With dynamic I mean that they are configured in user-space - be it with
something like CLONE_NEW_SCHEDULER_CLASS, or possibly better some other
interface to allow an *arbitrary* tree that is not coupled on the
user/process/thread borders. New threads and processes are per default
created in the parents queue, just like now.
So user-space could build an tree like this (eg with a pam module):
Default queue - init
+- kernel-thread queue (to avoid having kernel threads being blocked by
| user-space)
+- cron, atd, sshd, .... unless they change their "class"
+- user1
| +- X
| +- kde
| | + konsole
| | \ kmail
| | + mail fetch thread
| | + mail filter thread
| | + GUI thread
| | \- mplayer
\- user2
+.....
Whether the queues are handled with some staircase behaviour, or CFS, or just
get CPU time distributed by nice level, is another question - but they have
to be "fair" only locally.
Of course, that's simply some sort of moving the problem into user-space - but
I think (and read that often enough) that the needs vary so much that a
single, hardcoded system won't suffice. And we can try to get the "right"
behaviour in each queue, just like now.
Walking the tree might make the scheduler not fully O(1) - but per default
only one queue is defined (or possibly two queues, one for kernel threads),
and everything else can be done by user-space.
The mentioned case of a web-server with gzip started would be done with having
each httpd being in a queue just below init, and having everything else in
another - or by nicing the webserver, as it's defined as "important".
(I believe that's called "moving policy into userspace" :-)
Regards,
Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists