Message-ID: <0c92071a002c16120b424c4057056a05.squirrel@webmail.greenhost.nl>
Date: Tue, 10 Jan 2012 03:12:29 +0100
From: "Indan Zupancic" <indan@....nu>
To: "Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Cc: "Vincent Guittot" <vincent.guittot@...aro.org>,
"Youquan Song" <youquan.song@...el.com>,
linux-kernel@...r.kernel.org, mingo@...e.hu, tglx@...utronix.de,
hpa@...or.com, akpm@...ux-foundation.org, stable@...r.kernel.org,
suresh.b.siddha@...el.com, arjan@...ux.intel.com,
len.brown@...el.com, anhua.xu@...el.com
Subject: Re: [PATCH] x86,sched: Fix sched_smt_power_savings totally broken

Hello,

On Mon, January 9, 2012 15:46, Peter Zijlstra wrote:
> On Mon, 2012-01-09 at 15:29 +0100, Vincent Guittot wrote:
>
>> I'm also using sched_mc level for doing powersaving load balance on
>> ARM platform and we have real benefits.
>
> Right, I've never said that power aware balancing was without merit, I
> know it matters (quite a lot for some).
>
>> We might modify the way we choose between power or performance mode
>> because it's not always a matter of gathering or spreading tasks on
>> cpus but until we found a best interface it's the way to enable
>> powersaving mode
>
> Sure, it was the only interface available.
>
> But I really want to get rid of the topology based knobs we have now,
> preferably I even want to get rid of the multi-value thing.
>
> pjt still needs to post his linsched rework *poke* *poke*, that should
> give a good basis to rework most of this without regressing the world
> and then some.
>
> But even without that I think we can do better than we do currently.

Perhaps it should be controlled by the CPU governor instead of being
a separate knob? In the end it is about power saving, and that is the
CPU governor's role. So to me it seems that at least the policy should
come from/via the CPU governor. It seems right that there is one place
that controls both frequencies and active core count.

I guess what happens now is that power-saving-aware scheduling causes a
core to become idle, with the CPU governor detecting that and putting
the core to sleep. Maybe it should be the other way round, with the
governor detecting that not all cores are needed and putting one to
sleep?

This way the scheduler can schedule optimally over the available cores
for maximum performance. For some hardware it may also be much better
to have multiple active cores at low voltage and frequency instead of
fewer cores at high frequency, especially if more cache is available.
To pick the right policy automatically this kind of information has to
be known, and currently it seems to come from the cpuidle/cpufreq
drivers.
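
To make the "more cores at low voltage" point concrete, here is a rough
sketch using the usual approximation that dynamic power scales as
C * V^2 * f. The function names, the capacitance, the leakage figures and
both operating points are made up for illustration; nothing here comes
from a real driver or real hardware.

```python
# Illustrative model only: dynamic CMOS power is roughly C * V^2 * f.
# Every number below is invented for the example, not measured.

def core_power(voltage, freq_hz, capacitance=1e-9, static_watts=0.1):
    """Approximate power of one awake core: dynamic part plus fixed leakage."""
    return capacitance * voltage ** 2 * freq_hz + static_watts

def package_power(active_cores, voltage, freq_hz, static_watts=0.1):
    """Power of `active_cores` identical cores at one operating point."""
    return active_cores * core_power(voltage, freq_hz, static_watts=static_watts)

# Same aggregate cycle rate (2 GHz total), assuming perfectly parallel work.
one_fast = package_power(1, voltage=1.2, freq_hz=2.0e9)   # about 2.98 W
two_slow = package_power(2, voltage=0.9, freq_hz=1.0e9)   # about 1.82 W

# With much higher leakage per awake core the ordering flips.
one_fast_leaky = package_power(1, 1.2, 2.0e9, static_watts=1.5)  # about 4.38 W
two_slow_leaky = package_power(2, 0.9, 1.0e9, static_watts=1.5)  # about 4.62 W
```

With these invented numbers, spreading wins when leakage is low and
gathering wins when it is high, which is why the choice would have to
depend on per-hardware information.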

Having just one knob which chooses between gathering or spreading tasks
is a good start. But in the end it seems core idling should be controlled
automatically, depending on the hardware and circumstances, by something
like a CPU governor. It probably should also be aware of IRQ balancing
and make sure idle cores don't get any interrupts while there are active
cores that could handle them.
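
Roughly what I have in mind, as a toy sketch and not a proposal for an
actual interface: the governor picks how many cores to keep awake from
the current load, and interrupts are steered only to those cores.
`cores_needed`, `plan_cores`, the `headroom` parameter and the load units
are all hypothetical names and values chosen for this example.

```python
import math

def cores_needed(total_load, per_core_capacity, max_cores, headroom=0.8):
    """Smallest core count keeping each awake core below `headroom`
    utilisation. At least one core always stays awake."""
    if total_load <= 0:
        return 1
    wanted = math.ceil(total_load / (per_core_capacity * headroom))
    return max(1, min(max_cores, wanted))

def plan_cores(total_load, per_core_capacity, all_cores):
    """Split cores into awake and idle sets and keep IRQs on awake ones."""
    n = cores_needed(total_load, per_core_capacity, len(all_cores))
    active = list(all_cores[:n])
    idle = list(all_cores[n:])
    # Hypothetical policy: the IRQ mask is simply the set of awake cores,
    # so idle cores are not woken while an active core could take the IRQ.
    return {"active": active, "idle": idle, "irq_mask": active}

# Load worth one full core on a four-core system: two cores stay awake.
plan = plan_cores(1.0, 1.0, [0, 1, 2, 3])
```

The scheduler would then balance for performance over `plan["active"]`
only, without needing a power-saving mode of its own.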

I know people weren't too happy with the current implementation of the
CPU governors either. Didn't someone implement a proof of concept version
of an improved governor a while ago? (Arjan?)

Greetings,

Indan