Message-Id: <1201608457.28547.130.camel@lappy>
Date: Tue, 29 Jan 2008 13:07:37 +0100
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Paul Jackson <pj@....com>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu,
vatsa@...ux.vnet.ibm.com, dhaval@...ux.vnet.ibm.com,
nickpiggin@...oo.com.au, ebiederm@...ssion.com,
akpm@...ux-foundation.org, sgrubb@...hat.com, rostedt@...dmis.org,
ghaskins@...ell.com, dmitry.adamushko@...il.com,
tong.n.li@...el.com, tglx@...utronix.de, menage@...gle.com,
rientjes@...gle.com
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

On Tue, 2008-01-29 at 05:53 -0600, Paul Jackson wrote:
> Peter wrote:
> > So, I don't think we need that, I think we can do with the single flag,
> > we just need to find these disjoint sets and stick our rt-domain there.
>
> Ah - perhaps you don't need that flag - but my other cpuset users do ;).
>
> You see, there are two very different ways that 'sched_load_balance' is
> used in practice.
>
> The other way is by big batch schedulers. They may be placed in charge
> of managing a few hundred CPUs on a system, and might be running a mix
> of many small jobs each covering only a few CPUs. They routinely set up
> one cpuset for each job, to contain that job to the CPUs and memory
> nodes assigned to it. This is actually the original motivating use for
> cpusets.
>
> As a bit of optimization, batch schedulers desire to tell the normal
> kernel scheduler -not- to bother load balancing across the big set of
> CPUs controlled by the batch scheduler, but only to load balance within
> each of the smaller per-job cpusets. Load balancing across hundreds
> of CPUs when the batch scheduler knows such efforts would be fruitless
> is a waste of good CPU cycles in kernel/sched.c.
>
> I really doubt we'd want to have such systems triggering the hard RT
> scheduler on whatever CPUs were in the batch scheduler's big cpuset
> that didn't happen to have an active job currently assigned to them.

My turn to be confused...

If SD_LOAD_BALANCE is only set on the smaller, per-job, sets, how will
the RT balancer trigger on the large set?
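
To make the question concrete, here is a toy model of the setup Paul
describes. It is only a sketch, not kernel code: the Cpuset class, the
balance_domains() helper and the CPU numbers are all made up for
illustration, and the only rule it assumes is that CPUs end up in a common
balancing domain when some cpuset with sched_load_balance set covers them
(overlapping ones being merged).

```python
from dataclasses import dataclass, field


@dataclass
class Cpuset:
    # Hypothetical stand-in for a cpuset: a name, its CPUs, the
    # sched_load_balance flag, and its child cpusets.
    name: str
    cpus: frozenset
    load_balance: bool = True
    children: list = field(default_factory=list)


def walk(cs):
    # Visit a cpuset and all of its descendants.
    yield cs
    for child in cs.children:
        yield from walk(child)


def balance_domains(root):
    # Collect the CPUs of every flagged cpuset and merge overlapping
    # sets, so the result is a list of pairwise disjoint domains.
    domains = []
    for cs in walk(root):
        if not cs.load_balance or not cs.cpus:
            continue
        merged = set(cs.cpus)
        rest = []
        for d in domains:
            if d & merged:
                merged |= d
            else:
                rest.append(d)
        domains = rest + [merged]
    return sorted((frozenset(d) for d in domains), key=min)


def unbalanced_cpus(root, domains):
    # CPUs of the hierarchy that no balancing domain covers.
    covered = set().union(*domains) if domains else set()
    return set(root.cpus) - covered


# Assumed layout: 8 CPUs under a batch scheduler, two small jobs,
# flag cleared on the big sets and set only on the per-job sets.
job_a = Cpuset("job_a", frozenset({0, 1}))
job_b = Cpuset("job_b", frozenset({2, 3}))
batch = Cpuset("batch", frozenset(range(8)), load_balance=False,
               children=[job_a, job_b])
root = Cpuset("root", frozenset(range(8)), load_balance=False,
              children=[batch])

domains = balance_domains(root)
print([sorted(d) for d in domains])
print(sorted(unbalanced_cpus(root, domains)))
```

In this model the per-job sets become the only balancing domains, and CPUs
4-7 (no job assigned) fall into no domain at all, which is the situation the
question above is asking about.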