Message-ID: <1337468148.573.139.camel@twins>
Date: Sun, 20 May 2012 00:55:48 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
paulmck@...ux.vnet.ibm.com, smuckle@...cinc.com, khilman@...com,
Robin.Randhawa@....com, suresh.b.siddha@...el.com,
thebigcorporation@...il.com, venki@...gle.com,
panto@...oniou-consulting.com, mingo@...e.hu, paul.brett@...el.com,
pdeschrijver@...dia.com, pjt@...gle.com, efault@....de,
fweisbec@...il.com, geoff@...radead.org, rostedt@...dmis.org,
tglx@...utronix.de, amit.kucheria@...aro.org,
linux-kernel <linux-kernel@...r.kernel.org>,
linaro-sched-sig@...ts.linaro.org,
Morten Rasmussen <Morten.Rasmussen@....com>,
Juri Lelli <juri.lelli@...il.com>
Subject: Re: Plumbers: Tweaking scheduler policy micro-conf RFP
On Sat, 2012-05-19 at 10:08 -0700, Linus Torvalds wrote:
> Ingo, please don't take any of these patches if they are starting to
> make NUMA scheduling be some arch-specific crap.
I think there's a big misunderstanding here. I fully 100% agree with
you on that. And this thread in particular isn't about NUMA at all.
This thread is about modifying the arch interface of describing the
chip.
The current interface is that we have 4 fixed topology domains:
	SMT
	MC
	BOOK
	CPU
(and the NUMA stuff comes on top of that and I just removed arch bits
from that, so let's leave that for now).
The first 3 domains depend on CONFIG_SCHED_{SMT,MC,BOOK} respectively,
and if an architecture selects one of those it will have to provide a
function cpu_{smt,coregroup,book}_mask and optionally put a struct
sched_domain initializer in its asm/topology.h.
Now I've had quite a few complaints from arch maintainers that the
sched_domain initializer is a far too unwieldy interface to fill out and
I quite agree with them.
All I meant to propose in this thread is to replace all of the above
with a simpler interface.
Instead of the above, all I'm asking for is that an arch provide
something along the lines of:
	struct sched_topology arch_topology[] = {
		{ cpu_smt_mask, ST_SMT },
		{ cpu_llc_mask, ST_CACHE },
		{ cpu_socket_mask, ST_SOCKET },
		{ NULL, },
	};
and that's just about all an arch would need to do.
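
To make the shape of such a table concrete, here is a minimal,
self-contained sketch in plain userspace C. Only the names
arch_topology, cpu_smt_mask, cpu_llc_mask, cpu_socket_mask and the
ST_* tags come from the proposal above. The struct layout, the
toy_cpumask_t type, the 8-CPU toy machine and both helper functions
are assumptions made up for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's cpumask: one bit per CPU. */
typedef unsigned long toy_cpumask_t;

/* Level tags named in the proposal; the numeric values are assumptions. */
enum sched_topology_type { ST_SMT = 1, ST_CACHE, ST_SOCKET };

/* Assumed layout: a mask function plus a level tag. */
struct sched_topology {
	toy_cpumask_t (*mask)(int cpu);	/* CPUs sharing this level with 'cpu' */
	int type;			/* ST_*; 0 in the terminator entry */
};

/* Toy machine: 8 CPUs, 2 threads per core, 4 CPUs per LLC, 1 socket. */
static toy_cpumask_t cpu_smt_mask(int cpu)    { return 0x3UL << (cpu & ~1); }
static toy_cpumask_t cpu_llc_mask(int cpu)    { return 0xfUL << (cpu & ~3); }
static toy_cpumask_t cpu_socket_mask(int cpu) { (void)cpu; return 0xffUL; }

static struct sched_topology arch_topology[] = {
	{ cpu_smt_mask,    ST_SMT },
	{ cpu_llc_mask,    ST_CACHE },
	{ cpu_socket_mask, ST_SOCKET },
	{ NULL, },
};

/* Count the levels by walking to the NULL-mask terminator. */
static int topology_levels(const struct sched_topology *tl)
{
	int n = 0;

	assert(tl != NULL);
	while (tl[n].mask)
		n++;
	return n;
}

/* Span of 'cpu' at the level tagged 'type', or 0 if there is no such level. */
static toy_cpumask_t topology_span(const struct sched_topology *tl,
				   int cpu, int type)
{
	assert(tl != NULL);
	for (; tl->mask; tl++)
		if (tl->type == type)
			return tl->mask(cpu);
	return 0;
}
```

With this toy layout, generic code can ask for the span of any CPU at
any level, e.g. topology_span(arch_topology, 5, ST_CACHE), without
the arch having to fill in anything beyond the table itself.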
That said, there are a few new things in ARM land, like the big-little
stuff, that have no direct relation to anything on the x86 side. And they
would very much like to have a means of describing their chip topology as
well.
About power-aware scheduling: yes, it's all a big mess and the current
stuff is horrid and broken.
That said, I do believe we can do better than nothing about it, and I'm
really not asking for anything perfect -- in fact I'm asking for pretty
much the same thing you are, something simple and understandable.
Simply packing tasks onto a minimum number of power-gated units, instead
of spreading them out, should get some benefit. For this we'd need to
know at what granularity a chip can power-gate.
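
As a rough illustration of pack versus spread, here is a small
userspace C sketch. It is not scheduler code: NR_UNITS, CPUS_PER_UNIT
and every function name are hypothetical, and a "unit" simply stands
for whatever granularity the chip can power-gate at.

```c
#define NR_UNITS	4	/* hypothetical: number of power-gated units */
#define CPUS_PER_UNIT	2	/* hypothetical: CPUs inside each unit */

/* Spread: pick the least-loaded unit (lowest index wins ties). */
static int pick_unit_spread(const int nr_running[NR_UNITS])
{
	int best = 0;

	for (int u = 1; u < NR_UNITS; u++)
		if (nr_running[u] < nr_running[best])
			best = u;
	return best;
}

/*
 * Pack: pick the busiest unit that still has a free CPU, so that idle
 * units can stay gated. Falls back to spreading once every unit is full.
 */
static int pick_unit_pack(const int nr_running[NR_UNITS])
{
	int best = -1;

	for (int u = 0; u < NR_UNITS; u++) {
		if (nr_running[u] >= CPUS_PER_UNIT)
			continue;
		if (best < 0 || nr_running[u] > nr_running[best])
			best = u;
	}
	return best < 0 ? pick_unit_spread(nr_running) : best;
}

/* Number of units that have at least one task and so must stay powered. */
static int units_powered(const int nr_running[NR_UNITS])
{
	int n = 0;

	for (int u = 0; u < NR_UNITS; u++)
		if (nr_running[u] > 0)
			n++;
	return n;
}

/* Place 'ntasks' tasks one by one; return how many units end up powered. */
static int place_tasks(int ntasks, int pack, int nr_running[NR_UNITS])
{
	for (int u = 0; u < NR_UNITS; u++)
		nr_running[u] = 0;
	for (int t = 0; t < ntasks; t++) {
		int u = pack ? pick_unit_pack(nr_running)
			     : pick_unit_spread(nr_running);
		nr_running[u]++;
	}
	return units_powered(nr_running);
}
```

In this toy model, three tasks keep two units powered when packed and
three when spread. Whether that saves power on real hardware depends
on the chip, which is why the gating granularity has to be known.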
> I'm very very serious about this. Try to make the scheduler have a
> *simple* model that people can actually understand. For example, maybe
> it can literally be a multi-level balancing thing, where the per-cpu
> runqueues are grouped into a "shared core resources" balancer that
> balances within the SMT or shared-L2 domain. And then there's an
> upper-level balancer (that runs much more seldom) that is written to
> balances within the socket. And then one that balances within the
> node/board. And finally one that balances across boards.
That is basically how the scheduler is set up. These are the
sched_domains.
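
The multi-level idea can be sketched in a few lines of userspace C.
This is only a toy model of "higher levels balance less often": the
level names follow the description quoted above, while the intervals,
the toy_level struct and levels_due() are assumptions invented for
illustration and are not the actual sched_domain code.

```c
#include <stddef.h>

/* One balancing level; 'interval' is in ticks and is a made-up number. */
struct toy_level {
	const char *name;
	unsigned long interval;
};

/* Lowest level first; each higher level runs less often. */
static const struct toy_level toy_levels[] = {
	{ "core (SMT / shared L2)",  1 },
	{ "socket",                  4 },
	{ "node/board",             16 },
	{ "across boards",          64 },
};

#define NR_LEVELS (sizeof(toy_levels) / sizeof(toy_levels[0]))

/* Bitmask of levels whose balancer is due at 'tick' (bit i = level i). */
static unsigned int levels_due(unsigned long tick)
{
	unsigned int due = 0;

	for (size_t i = 0; i < NR_LEVELS; i++)
		if (tick % toy_levels[i].interval == 0)
			due |= 1u << i;
	return due;
}
```

On most ticks only the lowest level is due, and the cross-board level
comes up once every 64 ticks in this toy setup.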
There is an awful lot of complexity in that code though, and I've been
trying to clean some of that up, but it's very slow going.
The purpose of this thread is to both simplify and allow people to more
easily express what they really care about. For this we need to explore
the problem space.
I know I haven't replied to all your points, and I suspect many are
related to annoyances you might have from other threads and I shall
attempt to answer them later.
I do feel bad that I've managed to annoy you to such a degree though. I
really would rather have a much simpler load-balancer too.