Message-ID: <alpine.LFD.2.11.1402030938440.2312@knanqh.ubzr>
Date:	Mon, 3 Feb 2014 09:58:12 -0500 (EST)
From:	Nicolas Pitre <nicolas.pitre@...aro.org>
To:	Morten Rasmussen <morten.rasmussen@....com>
cc:	Arjan van de Ven <arjan@...ux.intel.com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Len Brown <len.brown@...el.com>,
	Preeti Murthy <preeti.lkml@...il.com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	Lists linaro-kernel <linaro-kernel@...ts.linaro.org>
Subject: Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

On Mon, 3 Feb 2014, Morten Rasmussen wrote:

> On Fri, Jan 31, 2014 at 06:19:26PM +0000, Nicolas Pitre wrote:
> > A cluster should map naturally to a scheduling domain.  If we need to 
> > wake up a CPU, it is quite obvious that we should prefer an idle CPU 
> > from a scheduling domain which load is not zero.  If the load is not 
> > zero then this means that any idle CPU in that domain, even if it 
> > indicated it was ready for a cluster power down, will not require the 
> > cluster power-up latency as some other CPUs must still be running.  But 
> > we already know that of course even if the recorded latency might not 
> > say so.
> > 
> > In other words, the hardware latency information is dynamic of course.  
> > But we might not _need_ to have it reflected at the scheduler domain all 
> > the time as in this case it can be inferred by the scheduling domain 
> > load.
> 
> I agree that the existing sched domain hierarchy should be used to
> represent the power topology. But, it is not clear to me how much we can say
> about the C-state of cpu without checking the load of the entire cluster
> every time?
> 
> We would need to know which C-states (index) that are per cpu and per
> cluster and ignore the cluster states when the cluster load is non-zero.

In any case, i.e. whether the cluster load is zero or not, we want to 
wake up the idle CPU which is in the shallowest C-state.  That should 
already correspond to the actual cluster C-state, without having to 
track it explicitly.
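
To illustrate the selection described above, here is a minimal sketch in 
plain user-space C, not kernel code.  The idle_state[] array, the NOT_IDLE 
marker and the function name are assumptions made up for this example; 
the array merely stands in for the per-CPU idle state index the patch 
would store in struct rq.

```c
#include <stddef.h>

#define NOT_IDLE (-1)	/* hypothetical marker: this CPU is running a task */

/*
 * Return the idle CPU with the smallest (shallowest) idle state index,
 * or -1 if no CPU is idle.  idle_state[cpu] is a hypothetical per-CPU
 * copy of the idle state index; a larger index means a deeper state.
 * On a tie the lowest-numbered CPU wins.
 */
int shallowest_idle_cpu(const int *idle_state, size_t ncpus)
{
	int best = -1;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (idle_state[cpu] == NOT_IDLE)
			continue;
		if (best < 0 || idle_state[cpu] < idle_state[best])
			best = (int)cpu;
	}
	return best;
}
```

No cluster state is tracked here: the choice relies only on the per-CPU 
index.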

> Current sched domain load is not maintained in the scheduler, it is only
> produced when needed. But I guess you could derive the necessary
> information from the idle cpu masks.

Even better.
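
As a rough sketch of what deriving this from the idle cpu masks could 
look like, again in plain user-space C and not kernel code: a domain has 
non-zero load if its span contains at least one CPU that is not in the 
idle mask.  The 64-bit masks and all function names below are assumptions 
for illustration only; the kernel would use struct cpumask and its own 
helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lowest-numbered CPU set in mask, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* A domain is busy if at least one CPU in its span is not idle. */
bool domain_is_busy(uint64_t span, uint64_t idle_mask)
{
	return (span & ~idle_mask) != 0;
}

/*
 * Prefer an idle CPU from a busy domain, since its cluster is powered up
 * anyway.  Fall back to an idle CPU from a fully idle domain only when no
 * busy domain has an idle CPU.  Returns -1 if nothing is idle.
 */
int pick_idle_cpu(const uint64_t *spans, size_t ndomains, uint64_t idle_mask)
{
	int fallback = -1;

	for (size_t i = 0; i < ndomains; i++) {
		int cpu = first_cpu(spans[i] & idle_mask);

		if (cpu < 0)
			continue;	/* no idle CPU in this domain */
		if (domain_is_busy(spans[i], idle_mask))
			return cpu;
		if (fallback < 0)
			fallback = cpu;
	}
	return fallback;
}
```

With two 4-CPU clusters (spans 0x0f and 0xf0) and an idle mask of 0xcf, 
the first cluster is fully idle and is left alone; CPU 6 from the second, 
partly busy cluster is picked instead.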

> > Within a scheduling domain it is OK to pick up the best idle CPU by 
> > looking at the index as it is best to leave those CPUs ready for a 
> > cluster power down set to that state and prefer one which is not.  And a 
> > scheduling domain with a load of zero should be left alone if idle CPUs 
> > are found in another domain which load is not zero, irrespective of 
> > absolute latency information. So all the existing heuristics already in 
> > place to optimize cache utilization and so on will make things just work 
> > for idle as well.
> 
> IIUC, you propose to only use the index when picking an idle cpu inside
> an already busy sched domain and leave idle sched domains alone if
> possible. It may work for homogeneous SMP systems, but I don't think it
> will work for heterogeneous systems like big.LITTLE.

Hence the "everything else being equal" caveat I mentioned previously.

> If the little cluster has zero load and the big has stuff running, it
> doesn't mean that it is a good idea to wake up another big cpu. It may
> be more power efficient to wake up the little cluster. Comparing idle
> state index of a big and little cpu won't help us in making that choice
> as the clusters may have different idle states and the costs associated
> with each state are different.

Agreed.  But let's evolve this in manageable steps.

> I'm therefore not convinced that idle state index is the right thing to
> give the scheduler. Using a cost metric would be better in my
> opinion.

It won't be difficult to move from the idle state index to some other 
cost metric once we've proven that the simple index has benefits on 
homogeneous systems.
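
A sketch of why the switch should be cheap, in plain user-space C: if the 
selection code compares CPUs through a key function, replacing the index 
with a cost metric only changes the key.  The struct, the wakeup_cost 
field and the function names are hypothetical; nothing here is defined by 
the patch series.

```c
#include <stddef.h>

struct idle_cpu {
	int cpu;
	unsigned int state_index;	/* idle state index, as in this RFC */
	unsigned int wakeup_cost;	/* hypothetical cost metric, arbitrary units */
};

typedef unsigned int (*idle_key_fn)(const struct idle_cpu *);

unsigned int key_state_index(const struct idle_cpu *c)
{
	return c->state_index;
}

unsigned int key_wakeup_cost(const struct idle_cpu *c)
{
	return c->wakeup_cost;
}

/* Return the idle CPU with the smallest key, or -1 if the list is empty. */
int cheapest_idle_cpu(const struct idle_cpu *cpus, size_t n, idle_key_fn key)
{
	const struct idle_cpu *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || key(&cpus[i]) < key(best))
			best = &cpus[i];
	return best ? best->cpu : -1;
}
```

On a homogeneous system both keys should order the CPUs the same way.  
They can differ on something like big.LITTLE, where a deeper state on a 
little CPU may still be cheaper to leave than a shallower one on a big 
CPU.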


Nicolas