Message-ID: <1338830167.28282.115.camel@twins>
Date: Mon, 04 Jun 2012 19:16:07 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Arjan van de Ven <arjan@...ux.intel.com>
Cc: Vladimir Davydov <vdavydov@...allels.com>,
Ingo Molnar <mingo@...e.hu>, Len Brown <lenb@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpuidle: menu: use nr_running instead of cpuload for
calculating perf mult
On Mon, 2012-06-04 at 09:53 -0700, Arjan van de Ven wrote:
> >
> > False, you can have 0 idle time and still have low load.
>
> 1 is not low in this context fwiw.
I think you're misunderstanding the load number you're using. I suspect
you're expecting something like the load-avg that top/uptime provide.
You're very much not using anything similar.
Nor do we compute anything like that, and I want to avoid having to
compute anything like that because it's expensive.
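For reference, the number top/uptime report is an exponentially damped
moving average of the runnable task count, sampled on a coarse
multi-second tick. A minimal userspace sketch of the 1-minute variant
(constants and names are illustrative, not the kernel's):

#include <math.h>
#include <stdio.h>

#define SAMPLE_PERIOD   5.0     /* seconds between samples   */
#define DECAY_WINDOW    60.0    /* 1-minute averaging window */

/* Exponentially damped average of nr_running, top/uptime style. */
static double update_loadavg(double load, unsigned int nr_running)
{
        double e = exp(-SAMPLE_PERIOD / DECAY_WINDOW);

        return load * e + (double)nr_running * (1.0 - e);
}

int main(void)
{
        double load = 0.0;
        unsigned int nr_running[] = { 4, 4, 4, 0, 0, 0, 4, 4, 4, 0, 0, 0 };
        unsigned int i;

        for (i = 0; i < sizeof(nr_running) / sizeof(nr_running[0]); i++) {
                load = update_loadavg(load, nr_running[i]);
                printf("t=%3.0fs nr_running=%u loadavg=%.2f\n",
                       (i + 1) * SAMPLE_PERIOD, nr_running[i], load);
        }
        return 0;
}

A figure like that only moves over tens of seconds and is system-wide,
which is a very different thing from the per-CPU number the governor
actually reads.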
> >> but because idle
> >> time tends to be bursty, we can still be idle for, say, a millisecond
> >> every 10 milliseconds. In this scenario, the load average is used to
> >> ensure that the 200 usecond cost of exiting idle is acceptable.
> >
> > So what you're saying is that if you have 1ms idle in 10ms, it might not
> > be a continuous 1ms. And you're using load as a measure of how many
> > fragments it comes apart in?
>
> no
>
> what I'm saying is that if you have a workload where you have 10 msec of
> work, then 1 msec of idle, then 10 msec of work, 1 msec of idle etc etc,
> it is very different from 100 msec of work, 10 msec of idle, 100 msec of
> work, even though utilization is the same.
Sure..
> what the logic is trying to do, on a 10 km level, is to limit the damage
> of accumulated C state exit time.
> (I'll avoid the word "latency" here, since the real time people will
> then immediately think this is about controlling latency response, which
> it isn't)
But why? There's a natural limit to this: say the wakeup costs 0.2ms, then
you can only do 5k of those a second. Once you need to actually do some
work as well, that number comes down.
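Spelling out that arithmetic (the 0.2ms figure is from the sentence above;
the 50% busy split is just an assumed example):

#include <stdio.h>

int main(void)
{
        double exit_cost_s   = 0.0002;  /* 0.2 ms paid per C-state exit         */
        double busy_fraction = 0.5;     /* assumed: half of each second is work */

        /* If the CPU did nothing but pay exit costs, the ceiling is: */
        printf("idle-only ceiling: %.0f wakeups/s\n", 1.0 / exit_cost_s);

        /* Once real work takes part of the second, only the idle
           remainder can be spent on exits, so the ceiling drops: */
        printf("with 50%% work    : %.0f wakeups/s\n",
               (1.0 - busy_fraction) / exit_cost_s);
        return 0;
}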
But it's all idle time; you cannot be idle for longer than there is a lack
of work. So if you're idle too long (because of a long exit latency) your
work shifts and the future idle time reduces, eventually causing a lower
C state to be used.
Also, when you notice you're waking up too soon, you can quickly ramp
down on the C state levels.
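A minimal sketch of that feedback, assuming a menu-style governor that
predicts the next idle interval from recently observed idle periods and
picks the deepest state whose break-even residency fits; the state table
and all numbers are made up for illustration:

#include <stdio.h>

/* Hypothetical C-state table: exit latency and break-even residency (us). */
struct cstate {
        const char *name;
        unsigned int exit_latency_us;
        unsigned int target_residency_us;
};

static const struct cstate states[] = {
        { "C1",   2,   4 },
        { "C3",  50, 100 },
        { "C6", 200, 400 },
};

/* Deepest state whose break-even residency fits the predicted idle time. */
static const struct cstate *select_state(unsigned int predicted_idle_us)
{
        const struct cstate *best = &states[0];
        unsigned int i;

        for (i = 0; i < sizeof(states) / sizeof(states[0]); i++)
                if (states[i].target_residency_us <= predicted_idle_us)
                        best = &states[i];
        return best;
}

int main(void)
{
        /*
         * Crude prediction: running average of observed idle periods.
         * When long exit latencies eat into the idle time, the observed
         * idle shrinks, the prediction follows, and a shallower state is
         * picked next time around, which is the self-limiting effect
         * described above.
         */
        unsigned int observed_idle_us[] = { 1000, 1000, 300, 250, 220, 210 };
        unsigned int predicted_us = 1000;
        unsigned int i;

        for (i = 0; i < sizeof(observed_idle_us) / sizeof(observed_idle_us[0]); i++) {
                const struct cstate *s = select_state(predicted_us);

                printf("predicted=%4uus -> %s  observed=%uus\n",
                       predicted_us, s->name, observed_idle_us[i]);
                predicted_us = (predicted_us + observed_idle_us[i]) / 2;
        }
        return 0;
}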
> Now, if you're very idle for a sustained duration (e.g. low load),
> you're assumed not sensitive to a bit of performance cost.
> but if you're actually busy (over a longer period, not just "right
> now"), you're assumed to be sensitive to the performance cost,
> and what the algorithm does is make it less easy to go into the
> expensive states.
My brain still sparks and fizzles when I read that.. it just doesn't
compute.
What performance? Performance isn't a well-defined word.
> the closest metric we have right now to "sensitive to performance cost"
> that I know of is "load average". If the scheduler has a better metric,
> I'd be more than happy to switch the idle selection code over to it...
I can't suggest anything better for something I've still no clue about.
You're completely failing to explain this thing to me.
> note that the idle selection code has 3 metrics, this is only one of them:
> 1. PM_QOS latency tolerance
> 2. Energy break even
> 3. Performance tolerance
That 3rd, I'm completely failing to understand.
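For what it's worth, a rough sketch of how those three metrics can combine
in a menu-style selection loop; it reuses the hypothetical struct cstate
from the sketch further up, and the load-derived multiplier (the knob the
quoted patch is about) is shown schematically, not as the actual menu
governor code:

/*
 * 1. PM_QOS latency tolerance : exit latency must stay under the QoS limit
 * 2. Energy break-even        : predicted idle must cover the residency
 * 3. Performance tolerance    : a load-derived multiplier inflates the
 *                               exit latency, so deep states get harder
 *                               to pick the busier the system looks.
 */
static int pick_state(const struct cstate *s, int nr_states,
                      unsigned int latency_req_us,
                      unsigned int predicted_idle_us,
                      unsigned int perf_multiplier)
{
        int i, best = 0;

        for (i = 1; i < nr_states; i++) {
                if (s[i].exit_latency_us > latency_req_us)
                        continue;                       /* 1: QoS */
                if (s[i].target_residency_us > predicted_idle_us)
                        continue;                       /* 2: break-even */
                if (s[i].exit_latency_us * perf_multiplier > predicted_idle_us)
                        continue;                       /* 3: perf tolerance */
                best = i;
        }
        return best;
}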