linux-kernel - Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

Message-ID: <alpine.LFD.2.11.1401311236470.2312@knanqh.ubzr>
Date:	Fri, 31 Jan 2014 13:19:26 -0500 (EST)
From:	Nicolas Pitre <nicolas.pitre@...aro.org>
To:	Arjan van de Ven <arjan@...ux.intel.com>
cc:	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Len Brown <len.brown@...el.com>,
	Preeti Murthy <preeti.lkml@...il.com>, mingo@...hat.com,
	Thomas Gleixner <tglx@...utronix.de>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	Lists linaro-kernel <linaro-kernel@...ts.linaro.org>
Subject: Re: [RFC PATCH 3/3] idle: store the idle state index in the struct
 rq

On Fri, 31 Jan 2014, Arjan van de Ven wrote:

> On 1/31/2014 7:37 AM, Daniel Lezcano wrote:
> > On 01/31/2014 04:07 PM, Arjan van de Ven wrote:
> > > > > >
> > > > > > Hence I think this patch would make sense only with additional
> > > > > > information like exit_latency or target_residency is present for
> > > > > > the scheduler. The idle state index alone will not be sufficient.
> > > > >
> > > > > Alternatively, can we enforce sanity on the cpuidle infrastructure to
> > > > > make the index naturally ordered? If not, please explain why :-)
> > > >
> > > > The commit id 71abbbf856a0e70 says that there are SOCs which could have
> > > > their target_residency and exit_latency values change at runtime. This
> > > > commit thus removed the ordering of the idle states according to their
> > > > target_residency/exit_latency. Adding Len and Arjan to the CC.
> > >
> > > the ARM folks wanted a dynamic exit latency, so.... it makes much more
> > > sense to me to store the thing you want to use (exit latency) than the
> > > number of the state.
> > >
> > > more than that, you can order either by target residency OR by exit
> > > latency, if you sort by one, there is no guarantee that you're also
> > > sorted by the other
> >
> > IMO, it would be preferable to store the index for the moment as we are
> > integrating cpuidle with the scheduler. The index allows to access more
> > informations. Then when everything is fully integrated we can improve
> > the result, no ?
> 
> more information, yes. but if the information isn't actually accurate
> (because it keeps changing in the datastructure away from what it was for
> the cpu)... are you really achieving what you want?

Right now (on ARM at least, but I imagine this is pretty universal), the
accuracy of the latency information for a CPU depends mostly on what the
other CPUs are doing.  The most obvious example is cluster power down.
For a cluster to be powered down, all the CPUs sharing this cluster must
also be powered down, and all those CPUs must have agreed to a possible
cluster power down in advance.  But an idle CPU having agreed to the
extra latency imposed by a cluster power down does not mean that the
cluster has actually powered down: another CPU in that cluster might
still be running, in which case the latency recorded for that idle CPU
is higher than what it would actually be at that moment.

A cluster should map naturally to a scheduling domain.  If we need to
wake up a CPU, it is quite obvious that we should prefer an idle CPU
from a scheduling domain whose load is not zero.  If the load is not
zero, then any idle CPU in that domain, even if it indicated it was
ready for a cluster power down, will not require the cluster power-up
latency, as some other CPU must still be running.  We already know
that, of course, even if the recorded latency might not say so.

In other words, the hardware latency information is dynamic, of course.
But we might not _need_ to have it reflected in the scheduler all the
time, since in this case it can be inferred from the scheduling domain
load.
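
To make the inference above concrete, here is a rough sketch in plain C.
It is not kernel code: struct cpu_info, struct cluster and both helper
functions are made up for illustration, and "load" is simplified to
"at least one CPU in the cluster is not idle".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record for illustration; not the kernel's struct rq. */
struct cpu_info {
	bool idle;
	/* Latency the CPU agreed to when going idle (may include cluster power-up). */
	unsigned int recorded_exit_latency_us;
	/* Latency to wake this CPU alone when the cluster is still powered. */
	unsigned int cpu_only_exit_latency_us;
};

/* Hypothetical cluster, assumed to map onto one scheduling domain. */
struct cluster {
	const struct cpu_info *cpus;
	size_t nr_cpus;
};

/* Simplified stand-in for "domain load is not zero": some CPU is running. */
static bool cluster_has_load(const struct cluster *c)
{
	for (size_t i = 0; i < c->nr_cpus; i++)
		if (!c->cpus[i].idle)
			return true;
	return false;
}

/*
 * Infer the wake-up latency of an idle CPU. If the cluster still has load,
 * it cannot have powered down, so the cluster power-up cost does not apply
 * even though the recorded latency says otherwise.
 */
static unsigned int inferred_exit_latency_us(const struct cluster *c, size_t cpu)
{
	const struct cpu_info *ci = &c->cpus[cpu];

	if (cluster_has_load(c))
		return ci->cpu_only_exit_latency_us;
	return ci->recorded_exit_latency_us;
}
```

The point of the sketch is only that the recorded value never has to be
updated when a sibling CPU starts or stops running; the domain load
already carries that information.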

Within a scheduling domain it is OK to pick the best idle CPU by
looking at the index: it is best to leave CPUs that are ready for a
cluster power down in that state and prefer one that is not.  And a
scheduling domain with a load of zero should be left alone if idle CPUs
are found in another domain whose load is not zero, irrespective of
absolute latency information.  So all the existing heuristics already in
place to optimize cache utilization and so on will make things just work
for idle as well.
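
As a rough sketch of the selection described above (plain C, not the
actual scheduler code): struct cpu_state, struct sched_dom, NO_CPU and
both functions are hypothetical names.  The sketch also assumes that a
lower index means a shallower idle state, which the cpuidle core does
not guarantee, as noted earlier in this thread.

```c
#include <stddef.h>

#define NO_CPU (-1)	/* hypothetical "nothing found" marker */

/* Hypothetical per-CPU state: idle_index < 0 means the CPU is running. */
struct cpu_state {
	int id;
	int idle_index;
};

/* Hypothetical scheduling domain, assumed to map onto one cluster. */
struct sched_dom {
	const struct cpu_state *cpus;
	size_t nr_cpus;
	unsigned long load;	/* zero when every CPU in the domain is idle */
};

/* Within one domain, prefer the idle CPU with the lowest idle state index. */
static int pick_idle_in_domain(const struct sched_dom *sd)
{
	int best = NO_CPU;
	int best_index = 0;

	for (size_t i = 0; i < sd->nr_cpus; i++) {
		const struct cpu_state *c = &sd->cpus[i];

		if (c->idle_index < 0)
			continue;	/* running, not a candidate */
		if (best == NO_CPU || c->idle_index < best_index) {
			best = c->id;
			best_index = c->idle_index;
		}
	}
	return best;
}

/*
 * Across domains, take an idle CPU from a domain whose load is not zero
 * if there is one, and fall back to a fully idle domain only otherwise.
 */
static int pick_idle_cpu(const struct sched_dom *doms, size_t nr_doms)
{
	int fallback = NO_CPU;

	for (size_t i = 0; i < nr_doms; i++) {
		int cpu = pick_idle_in_domain(&doms[i]);

		if (cpu == NO_CPU)
			continue;
		if (doms[i].load != 0)
			return cpu;	/* cluster is up anyway */
		if (fallback == NO_CPU)
			fallback = cpu;
	}
	return fallback;
}
```

Note that no absolute latency value is consulted anywhere: the domain
load decides between domains and the index is only a tie-breaker inside
one domain.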

All this to say that it is not justified at the moment to worry about
how to convey the full details to the scheduler, and the complexity that
goes with it, since in practice we might be able to achieve our goal just
as well using simpler hints like some arbitrary index.  Once this is in
place, we could look at the actual benefits of having more detailed
information and weigh that against the complexity that comes with it.

Nicolas