linux-kernel - Re: [5/11] issue 5: Frequency and uarch invariant task load

Message-ID: <20140108123118.GS30183@twins.programming.kicks-ass.net>
Date:	Wed, 8 Jan 2014 13:31:18 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Morten Rasmussen <morten.rasmussen@....com>
Cc:	mingo@...nel.org, rjw@...ysocki.net, markgross@...gnar.org,
	vincent.guittot@...aro.org, catalin.marinas@....com,
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [5/11] issue 5: Frequency and uarch invariant task load

On Tue, Jan 07, 2014 at 04:19:41PM +0000, Morten Rasmussen wrote:
> Potential solution: Frequency invariance has been proposed before [1]
> where the task load is scaled by the cur/max freq ratio. Another
> possibility is to use hardware counters if such are available on the
> platform.
> 
> [1] https://lkml.org/lkml/2013/4/16/289
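
As a rough illustration of the cur/max scaling mentioned in the quote
above, here is a minimal sketch in C. It is not the code from the
patches in [1]; freq_scale(), scale_load() and the 1024 fixed-point
unit are hypothetical names and choices made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed-point unit where 1024 stands for a ratio of 1.0. */
#define CAPACITY_SCALE 1024u

/* Hypothetical helper: cur/max frequency ratio in fixed point. */
static uint32_t freq_scale(uint32_t cur_khz, uint32_t max_khz)
{
	assert(max_khz > 0 && cur_khz <= max_khz);
	return (uint32_t)(((uint64_t)cur_khz * CAPACITY_SCALE) / max_khz);
}

/*
 * Hypothetical helper: scale a load contribution observed while running
 * at cur_khz so that it is comparable to load observed at max_khz.
 */
static uint64_t scale_load(uint64_t load, uint32_t cur_khz, uint32_t max_khz)
{
	return (load * freq_scale(cur_khz, max_khz)) / CAPACITY_SCALE;
}
```

With this sketch a task that looks 100% busy at half the maximum
frequency contributes only half the load it would at full speed.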

Right, I just had a look at those patches.. they're not horrible but I
think they're missing a few opportunities.

My main objection to them is that I think the newly introduced
max_capacity is exactly what the current cpu_power thing is -- then
again, I still haven't let the entire thing sink in well enough.

Not to mention we need to fix some of the cpu_power abuse -- like the
correlation to capacity, which as stated in previous emails should be
sorted using utilization.

So DVFS certainly makes sense, and would indeed be required in order to
make sensible decisions in the face of P states. Even in the face of
funny hardware like Intel which pretty much ignores whatever you tell it
and does its own merry thing.

A few random thoughts:

 - I think for SMP-nice we want to migrate from /max_capacity to
   /curr_capacity; because SMP-nice cares about 100% utilization
   regardless of the actual P state. If we're somehow forced into a
   lower P state (thermal or otherwise) fairness is best served by
   normalizing at the rate we're actually running at, not the potential
   maximal.

 - We need to re-think SMT and turbo-bins in general; I think we can
   think of those two as the same effective thing. This does mean Intel
   chips will have a dual layer of this goo, and we can currently barely
   deal with the 1 SMT layer, let alone do something sensible with 2.

   To clarify, a single SMT thread will generally go 'faster' on its own
   since it doesn't need to compete with the other thread(s) for core
   resources, but together they might better utilize the core resources
   giving an over-all throughput win.

   Similar for turbo bins, a single core can go faster on its own since
   it doesn't have competition for energy and thermal constraints, but
   together cores can probably achieve greater throughput.

   So we need a better way to describe this capacity dependency and
   variability.

   I'm fairly sure ARM doesn't do SMT, but they certainly suffer from
   thermal caps and can thus have effective turbo bins, even though
   they're not explicit and magic like with Intel.

   And of course the honorary mention goes to Power7 which has
   asymmetric bins -- let's hope they fix it and nobody else thinks them
   a great idea.

 - For hardware without P state controls, or hardware that pretty much
   ignores them, we need means of obtaining the max and curr capacity.

   Intel has the APERF, MPERF registers which resp. count at actual
   frequency and fixed frequency. Using them is a bit tricky since
   APERF doesn't count when idle, but when filtering out the idle time
   they do provide a current performance ratio.

   From that we could obtain a max performance ratio by using a wide
   window max on the current value or somesuch.

   Again, SMT and turbo-bins will complicate matters..

   Other CPUs that have magic P state control might not provide such
   registers, in which case we'd need PMU resources, which would
   completely blow :/
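
The APERF/MPERF idea above could be sketched roughly as follows. This
is only an illustration under stated assumptions, not kernel code:
struct perf_sample, struct perf_tracker, tracker_update() and
tracker_max() are hypothetical names, the window length is arbitrary,
and the counter snapshots are assumed to cover non-idle time only
(the idle filtering is left to the caller).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PERF_SCALE 1024u  /* assumption: 1024 == running at the fixed frequency */
#define WINDOW_LEN 8u     /* assumption: arbitrary window length */

/* Hypothetical snapshot of the two counters, idle time already excluded. */
struct perf_sample {
	uint64_t aperf;  /* counts at the actual frequency */
	uint64_t mperf;  /* counts at the fixed frequency */
};

struct perf_tracker {
	struct perf_sample prev;
	uint32_t window[WINDOW_LEN];  /* recent current-performance ratios */
	size_t next;                  /* slot to overwrite next */
};

/* Current performance ratio: delta APERF over delta MPERF, fixed point. */
static uint32_t perf_ratio(struct perf_sample prev, struct perf_sample now)
{
	uint64_t da = now.aperf - prev.aperf;
	uint64_t dm = now.mperf - prev.mperf;

	if (dm == 0)
		return 0;  /* no non-idle time in this interval */
	/* A sketch: ignores overflow of da * PERF_SCALE for huge deltas. */
	return (uint32_t)((da * PERF_SCALE) / dm);
}

/* Record a new snapshot and return the current ratio. */
static uint32_t tracker_update(struct perf_tracker *t, struct perf_sample now)
{
	uint32_t cur;

	assert(t != NULL);
	cur = perf_ratio(t->prev, now);
	t->prev = now;
	t->window[t->next] = cur;
	t->next = (t->next + 1) % WINDOW_LEN;
	return cur;
}

/* Max ratio over the window, used as an estimate of max performance. */
static uint32_t tracker_max(const struct perf_tracker *t)
{
	uint32_t max = 0;

	assert(t != NULL);
	for (size_t i = 0; i < WINDOW_LEN; i++)
		if (t->window[i] > max)
			max = t->window[i];
	return max;
}
```

The value from tracker_update() plays the role of curr capacity and
tracker_max() that of max capacity; how wide the window should be, and
how SMT and turbo bins distort both numbers, is left open here.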
