Message-ID: <20070418130813.GA31925@holomorphy.com>
Date: Wed, 18 Apr 2007 06:08:13 -0700
From: William Lee Irwin III <wli@...omorphy.com>
To: Matt Mackall <mpm@...enic.com>
Cc: Nick Piggin <npiggin@...e.de>,
Peter Williams <pwil3058@...pond.net.au>,
Mike Galbraith <efault@....de>,
Con Kolivas <kernel@...ivas.org>, Ingo Molnar <mingo@...e.hu>,
ck list <ck@....kolivas.org>,
Bill Huey <billh@...ppy.monkey.org>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, Apr 18, 2007 at 12:55:25AM -0500, Matt Mackall wrote:
> Why are processes special? Should user A be able to get more CPU time
> for his job than user B by splitting it into N parallel jobs? Should
> we be fair per process, per user, per thread group, per session, per
> controlling terminal? Some weighted combination of the preceding?[2]
On a side note, I think a combination of all of the above is a very
good idea, plus process groups (pgrp's). All the tasks of a make -j
load come up in one pgrp of one session for one user, and hence are
automatically kept isolated in their own corner by such policies.
Thread bombs, forkbombs, and so on get handled too, which is good on
e.g. a compile server when someone rudely spawns too many tasks.
Thinking of the scheduler as a CPU bandwidth allocator, this means
handing out shares of CPU bandwidth to all users on the system, which
in turn hand out shares of bandwidth to all sessions, which in turn
hand out shares of bandwidth to all process groups, which in turn hand
out shares of bandwidth to all thread groups, which in turn hand out
shares of bandwidth to threads. The event handlers for the scheduler
need not deal with this apart from task creation and exit and various
sorts of process ID changes (e.g. setsid(), setpgrp(), setuid(), etc.).
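For concreteness, here is a bare-bones sketch of what such a share
hierarchy might look like; all structure and field names below are
invented for illustration, not taken from any existing scheduler code:

	/*
	 * Hypothetical share hierarchy: each aggregate owns a share of
	 * its parent's bandwidth; a task's per-level shares hang off
	 * these.
	 */
	struct sched_aggregate {
		struct sched_aggregate	*parent;	/* NULL at the per-user root */
		unsigned long		shares;		/* share of the parent's bandwidth */
		unsigned long		child_shares;	/* sum of all children's shares */
	};

	struct task_shares {
		struct sched_aggregate	*user;		/* per-user aggregate */
		struct sched_aggregate	*session;	/* per-session aggregate */
		struct sched_aggregate	*pgrp;		/* per-process-group aggregate */
		struct sched_aggregate	*tgrp;		/* per-thread-group aggregate */
		unsigned long		shares;		/* this thread's share of the tgrp */
	};

fork(), exit, and the ID-changing calls above would then just move the
task between aggregates and adjust the child_shares sums at the
affected levels.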
These per-level shares just determine what the scheduler sees as
->load_weight or some analogue of ->static_prio, though it is possible
to do this by means of data structure organization instead of numerical
prioritization. The effective weight would probably have to be
calculated on the fly by, say, doing fixpoint arithmetic something like

	user_share(p)*session_share(p)*pgrp_share(p)*tgrp_share(p)*task_share(p)

so that readjusting the shares of aggregates doesn't have to traverse
task lists and remains O(1). Each of the share computations can then
just do some analogue of the calculation
p->load_weight/rq->raw_weighted_load in fixpoint, though the precision
issues with this make me queasy. There is one slightly nasty point in
that the ->raw_weighted_load analogue for users, or whatever the
highest level chosen turns out to be, ends up being global. If any
levels are to be omitted, one might as well keep users and drop the
intermediate levels, so that the truly global state stays as read-only
as possible.
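A minimal sketch of that product in 32.32 fixpoint, assuming a u64
representation; fix_mul(), FIX_ONE, and the share parameters are all
made-up names for illustration:

	#include <stdint.h>

	typedef uint64_t fixpoint_t;		/* 32.32 fixpoint */
	#define FIX_ONE	((fixpoint_t)1 << 32)	/* 1.0 */

	/*
	 * Multiply two 32.32 values within 64 bits by discarding the
	 * low 16 fractional bits of each operand; this truncation is
	 * exactly the source of the precision worries.
	 */
	static inline fixpoint_t fix_mul(fixpoint_t a, fixpoint_t b)
	{
		return (a >> 16) * (b >> 16);
	}

	/* Each argument is that level's fraction of its parent, <= FIX_ONE. */
	static fixpoint_t effective_share(fixpoint_t user, fixpoint_t session,
					  fixpoint_t pgrp, fixpoint_t tgrp,
					  fixpoint_t task)
	{
		fixpoint_t w = fix_mul(user, session);

		w = fix_mul(w, pgrp);
		w = fix_mul(w, tgrp);
		return fix_mul(w, task);
	}

The >> 16 trick avoids needing a 64x64->128 multiply, at the cost of
throwing away half the fractional bits of each operand at every step.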
I suppose the fixpoint precision could be jacked up to 128 or 256 bits,
all below the radix point (our maximum is 1.0, after all), until the
precision issues vanish, but the idea of that much number crunching in
the scheduler makes me rather uncomfortable. I hope u64 or u32 [2] can
be gotten away with as far as fixpoint goes.
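To put a rough number on that worry: suppose each of the five levels
holds a share of about 1/1024 (2^-10, i.e. one of ~1000 siblings). The
true product is 2^-50, which a format with only 32 fractional bits
flushes to zero outright, while 64 fractional bits still represent it
with 14 bits of headroom; chained truncating multiplies like the sketch
above lose even more. That's the sense in which u32 is probably too
tight and u64 merely uncomfortable.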
-- wli