[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20061019171439.af1e6a07.pj@sgi.com>
Date: Thu, 19 Oct 2006 17:14:39 -0700
From: Paul Jackson <pj@....com>
To: Martin Bligh <mbligh@...gle.com>
Cc: nickpiggin@...oo.com.au, akpm@...l.org, menage@...gle.com,
Simon.Derr@...l.net, linux-kernel@...r.kernel.org, dino@...ibm.com,
rohitseth@...gle.com, holt@....com, dipankar@...ibm.com,
suresh.b.siddha@...el.com, clameter@....com
Subject: Re: [RFC] cpuset: remove sched domain hooks from cpusets
Martin wrote:
> We (Google) are planning to use it to do some partitioning, albeit on
> much smaller machines. I'd really like to NOT use cpus_allowed from
> previous experience - if we can get it to to partition using separated
> sched domains, that would be much better.
Are you saying that you wished that cpusets was not implemented using
cpus_allowed, but -instead- implemented using sched domain partitioning?
Well, as you likely can guess by now, that's unlikely.
Cpusets provides hierarchically nested sets of CPU and Memory Nodes,
especially useful for managing nested allocation of processor and
memory resources on large systems. The essential mechanism at the core
of cpusets is manipulating the cpus_allowed and mems_allowed masks in
each task.
Cpusets have also been dabbling in the business of driving the sched
domain partitioning, but I am getting more inclined as time goes on to
think that was a mistake.
> From my dim recollections of previous discussions when cpusets was
> added in the first place, we asked for exactly the same thing then.
What are you asking for again? ;).
Are you asking for a decent interface to sched domain partitioning?
Perhaps cpusets are not the best way to get that.
I hear tell from my colleague Christoph Lameter that he is considering
trying to make some improvements, that would benefit us all, to the
sched domain partitioning code - smaller, faster, simpler, better and
all that good stuff. Perhaps you guys at Google should join in that
effort, and see to it that your needs are met as well. I would
recommend providing whatever kernel-user API's you need for this, if
any, separately from cpusets.
So far, the requirements that I am aware of on such an effort:
1) Somehow support isolated CPUs (no load balancing to or from them).
For example, at least one real-time project needs these.
2) Whatever you were talking about above that Google is planning, some
sort of partitioning.
3) Somehow, whether by magic or by implicit or explicit partitioning
of the systems CPUs, ensure that its load balancing scales to cover
my employers (SGI) big CPU count systems.
4) Hopefully smaller, less #ifdef'y and easier to understand than the
current code.
5) Avoid poor fit interactions with cpusets, which have a different
shape (naturally hierarchical), internal mechanism (allowed bitmasks
rather than scheduler balancing domains), scope (combined processor
plus memory) and natural API style (a full fledged file system to
name these sets, rather than a few bitmasks and flags.)
Good luck.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@....com> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists