Message-Id: <479F4631.BA47.005A.0@novell.com>
Date: Tue, 29 Jan 2008 13:28:49 -0700
From: "Gregory Haskins" <ghaskins@...ell.com>
To: "Paul Jackson" <pj@....com>
Cc: <a.p.zijlstra@...llo.nl>, <mingo@...e.hu>,
<dmitry.adamushko@...il.com>, <rostedt@...dmis.org>,
<menage@...gle.com>, <rientjes@...gle.com>, <tong.n.li@...el.com>,
<tglx@...utronix.de>, <akpm@...ux-foundation.org>,
<dhaval@...ux.vnet.ibm.com>, <vatsa@...ux.vnet.ibm.com>,
<sgrubb@...hat.com>, <linux-kernel@...r.kernel.org>,
<ebiederm@...ssion.com>, <nickpiggin@...oo.com.au>
Subject: Re: scheduler scalability - cgroups, cpusets and
load-balancing
>>> On Tue, Jan 29, 2008 at 2:37 PM, in message
<20080129133700.7f1ab444.pj@....com>, Paul Jackson <pj@....com> wrote:
> Gregory wrote:
>> > 1) What are 'per-domain' variables?
>>
>> s/per-domain/per-root-domain
>
> Oh dear - now I've got more questions, not fewer.
>
> 1) "variables" ... what variables?
Well, anything that is declared in "struct root_domain" in kernel/sched.c.
For instance, today in mainline we have:
struct root_domain {
	atomic_t refcount;
	cpumask_t span;
	cpumask_t online;

	/*
	 * The "RT overload" flag: it gets set if a CPU has more than
	 * one runnable RT task.
	 */
	cpumask_t rto_mask;
	atomic_t rto_count;
};
The first three are just related to general root-domain infrastructure code. The last two are specific to the rt-overload feature. In earlier versions of rt-balance, the rt-overload bitmap was a global variable. By moving it into the root_domain structure, there is now one instance per (for lack of a better, more up-to-date word) "exclusive" cpuset. That way, disparate cpusets will not bother each other with overload notifications, etc.
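Roughly, the marking path looks something like this (a simplified sketch of the rt-balance code; the real functions also deal with memory barriers, which I've omitted here):

static inline void rt_set_overload(struct rq *rq)
{
	struct root_domain *rd = rq->rd;	/* this CPU's root domain */

	cpu_set(rq->cpu, rd->rto_mask);		/* advertise this CPU as overloaded */
	atomic_inc(&rd->rto_count);		/* cheap "anyone overloaded?" test */
}

static inline void rt_clear_overload(struct rq *rq)
{
	struct root_domain *rd = rq->rd;

	atomic_dec(&rd->rto_count);
	cpu_clear(rq->cpu, rd->rto_mask);
}

Since rto_mask lives in the root_domain rather than in a global, a pull from one cpuset never sees overload bits set by CPUs in a different one.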
Note that in -rt we have more variables in this structure (RQ priority info), but that patch hasn't been pulled into sched-devel/linux-2.6 yet.
>
> 2) Is a 'root-domain' just the RT specific portion
> of a sched_domain, or is it something else?
It's meant to be general, but the only current client is the RT sched_class. Reading back through the links you guys have been sending, it's very similar in concept to the "rt-domain" stuff that you, Peter, and Steven were discussing a while back.
When I was originally putting this stuff together, I wanted to piggyback this data on the sched-domain code. But I soon realized that the sched-domain trees are per-cpu structures. What I needed was an "umbrella" structure that would allow CPUs in a common cpuset to share arbitrary state data, yet remain associated with the sched-domains that the cpuset code set up. The first pass had the structures associated with the sched-domain hierarchy, but I soon realized that it was really a per-rq association, so I could simplify the design. I.e., rather than have the code walk the sched-domain tree to find the common "root", I just hung the root directly on the rq itself.
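In rough terms it comes down to this (simplified; the real rq_attach_root() in sched.c also detaches the old root domain, takes the rq lock, and handles the online mask):

struct rq {
	/* ... the usual per-cpu runqueue fields ... */
	struct root_domain *rd;		/* shared by every CPU in the cpuset */
};

static void rq_attach_root(struct rq *rq, struct root_domain *rd)
{
	atomic_inc(&rd->refcount);	/* this rq now holds a reference */
	cpu_set(rq->cpu, rd->span);	/* this CPU is part of the domain */
	rq->rd = rd;			/* O(1) lookup, no sched-domain walk */
}

So any CPU can get at the shared state with a single dereference off its own runqueue.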
But anyway, to answer the question: the concept is meant to be generic. For instance, if it made sense for Peter's cgroup work to sit here as well, we could just add new fields to struct root_domain, and Peter could access them via rq->rd.
I realize it could have been designed to abstract away the type of objects that the root-domain manages, but I want to keep the initial code as simple as possible. We can always complicate^h^h^h^h^hcleanup the code later ;)
Regards,
-Greg