Message-ID: <1345041802.31459.94.camel@twins>
Date: Wed, 15 Aug 2012 16:43:22 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Borislav Petkov <bp@...en8.de>
Cc: Alex Shi <alex.shi@...el.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
vincent.guittot@...aro.org, svaidy@...ux.vnet.ibm.com,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Paul Turner <pjt@...gle.com>
Subject: Re: [discussion]sched: a rough proposal to enable power saving in
scheduler
On Wed, 2012-08-15 at 15:15 +0200, Borislav Petkov wrote:
> On Wed, Aug 15, 2012 at 01:05:38PM +0200, Peter Zijlstra wrote:
> > On Mon, 2012-08-13 at 20:21 +0800, Alex Shi wrote:
> > > Since there is no power-saving consideration in the CFS scheduler, I
> > > have a very rough idea for enabling a new power-saving scheme in CFS.
> >
> > Adding Thomas, he always delights in poking holes in power schemes.
> >
> > > It is based on the following assumptions:
> > > 1. If many tasks crowd the system, letting only the CPUs of a few
> > > domains run while the other CPUs idle cannot save power. Letting all
> > > CPUs take the load, finish the tasks early, and then go idle will
> > > save more power and give a better user experience.
> >
> > I'm not sure this is a valid assumption. I've had it explained to me by
> > various people that race-to-idle isn't always the best thing. It has to
> > do with the cost of switching power states, the duration of execution,
> > and other such things.
>
> I think what he means here is that we might want to let all cores on
> the node (i.e., domain) finish and then power down the whole node which
> should bring much more power savings than letting a subset of the cores
> idle. Alex?
Sure, we can do that.
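
To make that trade-off concrete, here is a back-of-the-envelope toy;
every number in it is invented, real values are platform specific:

	/*
	 * Back-of-the-envelope race-to-idle model; all numbers below are
	 * invented for illustration, and idle energy after the work
	 * completes is ignored for simplicity.
	 */
	#include <stdio.h>

	int main(void)
	{
		double work = 8.0;		/* total CPU-seconds of work */
		double p_core = 2.0;		/* W per busy core */
		double p_idle = 0.1;		/* W per core in deep idle */
		double e_transition = 0.05;	/* J to enter + exit deep idle */
		int ncores = 8;

		/* Race to idle: all eight cores busy, finish early. */
		double t_race = work / ncores;
		double e_race = ncores * (p_core * t_race + e_transition);

		/* Consolidate: two cores busy, six stay in deep idle. */
		double t_cons = work / 2;
		double e_cons = 2 * (p_core * t_cons + e_transition)
			      + 6 * p_idle * t_cons;

		printf("race=%.2fJ consolidate=%.2fJ\n", e_race, e_cons);
		return 0;
	}

With these numbers racing wins, but crank up the transition cost or
drop the idle power and it flips; hence the "not always" above.
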
> > So I'd leave the currently implemented scheme as performance, and I
> > don't think the above describes the current state.
> >
> > > } else if (schedule policy == power)
> > > move tasks from busiest group to
> > > idlest group until busiest is just full
> > > of capacity.
> > > //the busiest group can balance
> > > //internally after next time LB,
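
A userspace toy of that balance pass, just to pin down the semantics;
the structs and numbers are invented, this is not the actual patch:

	/*
	 * Toy of the "power" balance pass quoted above: pull tasks from
	 * the busiest group into the idlest one until the busiest group
	 * fits its capacity again; whatever is left rebalances internally
	 * on the next load-balance pass.
	 */
	#include <stdio.h>

	struct group {
		const char *name;
		int load;	/* runnable tasks */
		int capacity;	/* tasks the group can take before overflowing */
	};

	static void power_balance(struct group *busiest, struct group *idlest)
	{
		while (busiest->load > busiest->capacity &&
		       idlest->load < idlest->capacity) {
			busiest->load--;
			idlest->load++;
		}
	}

	int main(void)
	{
		struct group a = { "pkg0", 10, 8 };
		struct group b = { "pkg1", 1, 8 };

		power_balance(&a, &b);
		printf("%s=%d %s=%d\n", a.name, a.load, b.name, b.load);
		return 0;
	}
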
> >
> > There's another thing we need to do, and that is collect tasks in a
> > minimal number of power domains.
>
> Yep.
>
> Btw, what heuristic would tell us here when a domain overflows and
> another needs to get woken? The combined load of the whole domain?
>
> And if I absolutely, positively don't want a node to wake up, do I
> hotplug its cores off, or are we going to have a way to tell the
> scheduler to overcommit the non-idle domains and spread the tasks only
> among them?
>
> I'm thinking of short bursts here, where it would probably be
> beneficial to let the tasks wait runnable for a while rather than wake
> up the next node and waste power...
I was thinking of a utilization measure made of per-task weighted
runnable averages. This should indeed cover that case, and we'll
overflow when, on average, there is no (significant) idle time over a
period longer than the averaging period.
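
Something in the spirit of PJT's per-entity load-tracking patches; a
userspace toy of such a decayed runnable average (the y^32 == 1/2
half-life is from his series, everything else here is invented):

	/*
	 * Toy decayed runnable average: each ~1ms period contributes to
	 * the sum, with older periods decayed by y per period so that
	 * 32 periods ago counts half as much as now.
	 */
	#include <stdio.h>

	#define DECAY_Y	0.9785720621	/* y chosen so that y^32 == 0.5 */

	struct util_avg {
		double runnable_sum;	/* decayed ms the task was runnable */
		double period_sum;	/* decayed ms observed in total */
	};

	/* Fold one ~1ms period into the average, decaying older periods. */
	static void util_update(struct util_avg *ua, int was_runnable)
	{
		ua->runnable_sum = ua->runnable_sum * DECAY_Y +
				   (was_runnable ? 1.0 : 0.0);
		ua->period_sum = ua->period_sum * DECAY_Y + 1.0;
	}

	/* Fraction of the (decayed) window the task was runnable. */
	static double util_frac(const struct util_avg *ua)
	{
		return ua->period_sum ? ua->runnable_sum / ua->period_sum : 0.0;
	}

	int main(void)
	{
		struct util_avg ua = { 0.0, 0.0 };
		int i;

		/* 2/3 duty cycle: runnable two periods out of every three. */
		for (i = 0; i < 1000; i++)
			util_update(&ua, i % 3 != 2);

		printf("utilization ~ %.2f\n", util_frac(&ua));
		return 0;
	}

A domain would then overflow when the summed utilization of its tasks
leaves no significant idle headroom against the domain's capacity.
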
Anyway, I'm not too set on this, and I'm quite sure we can tweak it ad
infinitum, so starting with something relatively simple that works for
most cases is preferred.
As already stated, I think some of the Linaro people actually played
around with something like this based on PJT's patches.