linux-kernel - Re: Questions about transition latency and LATENCY_MULTIPLIER

Message-ID: <20240529070947.4zxcdnu32d2u7cny@vireshk-i7>
Date: Wed, 29 May 2024 12:39:47 +0530
From: Viresh Kumar <viresh.kumar@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: Questions about transition latency and LATENCY_MULTIPLIER

Hi Qais,

On 28-05-24, 02:21, Qais Yousef wrote:
> Hi
> 
> I am trying to understand the reason behind the usage of LATENCY_MULTIPLIER
> to create transition_delay_us. It is set to 1000 by default and when I tried to
> dig into the history I couldn't reach the original commit as the code has gone
> through many transformations and I gave up finding the first commit that
> introduced it.

The changes came along with the initial commits for conservative and ondemand
governors, i.e. before 2005.

> Generally I am seeing that rate_limit_us in schedutil (which is largely
> influenced by this multiplier on most/all systems I am working on) is too high
> compared to the cpuinfo_transition_latency reported by the driver
> 
> For example on my M1 mac mini I get 50 and 56us. rate_limit_us is 10ms (on 6.8
> kernel, should become 2ms after my fix)
> 
> 	$ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
> 	/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:50000
> 	/sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:56000
> 
> On AMD Ryzen it reads 0, and we end up with LATENCY_MULTIPLIER (1000 = 1ms)
> as the rate_limit_us.
> 
> On Intel I5 I get 20us but rate_limit is 5ms which is requested explicitly by
> intel_pstate driver
> 
> 	$ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
> 	/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy1/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy2/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy3/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy5/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy6/cpuinfo_transition_latency:20000
> 	/sys/devices/system/cpu/cpufreq/policy7/cpuinfo_transition_latency:20000
> 
> The question I have is that why so high? If hardware got so good, why can't we
> leverage the hardware's fast ability to change frequencies more often?

From my understanding, this is only about not changing the frequency too often.
It is historical, and we probably didn't get better numbers when this was
reduced to a lower value later on either.
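
As a rough illustration of the numbers quoted above, the following Python
sketch models how the default rate limit appears to be derived from
cpuinfo_transition_latency. It is a simplified model, not the kernel code: the
function name and the cap_us parameter are hypothetical, and the 10ms and 2ms
caps are taken from the figures in the quoted message.

```python
NSEC_PER_USEC = 1000
LATENCY_MULTIPLIER = 1000  # default multiplier discussed in this thread


def default_transition_delay_us(transition_latency_ns, driver_delay_us=0,
                                cap_us=10_000):
    """Hypothetical model of the default rate limit, in microseconds.

    transition_latency_ns: value of cpuinfo_transition_latency (nanoseconds).
    driver_delay_us: explicit delay requested by the driver, 0 if none.
    cap_us: upper bound on the result (assumed 10ms before the fix, 2ms after).
    """
    # A delay requested explicitly by the driver takes precedence.
    if driver_delay_us:
        return driver_delay_us

    latency_us = transition_latency_ns // NSEC_PER_USEC
    if latency_us:
        # Scale the hardware latency by the multiplier, then apply the cap.
        return min(latency_us * LATENCY_MULTIPLIER, cap_us)

    # The driver reported no latency: fall back to the multiplier itself.
    return LATENCY_MULTIPLIER


# Cases taken from the quoted message: (label, latency in ns, driver delay in us)
cases = [
    ("M1 policy0", 50_000, 0),
    ("M1 policy4", 56_000, 0),
    ("AMD Ryzen", 0, 0),
    ("Intel i5 (intel_pstate)", 20_000, 5_000),
]

for label, latency_ns, driver_delay_us in cases:
    old = default_transition_delay_us(latency_ns, driver_delay_us)
    new = default_transition_delay_us(latency_ns, driver_delay_us, cap_us=2_000)
    print(f"{label}: 10ms cap -> {old}us, 2ms cap -> {new}us")
```

Under these assumptions the multiplier, not the hardware latency, dominates the
result whenever the driver reports a latency of more than a few microseconds.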

> This is important because due to uclamp usage, we can end up with less gradual
> transition between frequencies and we can jump up and down more often. And the
> smaller this value is, this means the better we can handle fast transition to
> boost or cap frequencies based on task's requirements when it context switches.
> But the rate limit generally is too high for the hardware and wanted to
> understand if this is pure historical or we still have reasons to worry about?

Maybe Rafael knows other reasons, but this is all I remember.

> From what I've seen so far, it seems to me this higher rate limit is helping
> performance as bursty tasks are more likely to find the CPU running at higher
> frequencies due to this behavior. I think this is something I can help these
> bursty tasks with without relying accidentally on this being higher.
> 
> Is there any worry on using cpuinfo_transition_latency as is if the driver
> doesn't provide transition_delay_us?

Won't we keep changing the frequency continuously in that case? Or am I
misunderstanding something?

> And does the kernel/driver contract need to cater for errors in driver's
> ability to serve the request? Can our request silently be ignored by the
> hardware?

The cpufreq core maintains its own state machine, and failures are used to
inform the user and/or stop DVFS. That keeps the approach clean; I am not sure
what we would gain or lose by ignoring the errors.

> Not necessarily due to rate limit being ignored, but for any other
> reason? It is important for Linux to know what frequency we're actually running
> at.

One reason is that we report two frequencies to userspace:
- scaling_cur_freq: The frequency that the software thinks the hardware runs at
  (i.e. the last requested frequency).

- cpuinfo_cur_freq: The real frequency the hardware is running at. Can be
  calculated using counters, etc.

There will be tools that use them, so both are required.
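
To make the distinction concrete, here is a minimal Python sketch that reads
both files for every policy so the two values can be compared side by side. The
helper names are hypothetical. It assumes the usual sysfs layout shown earlier
in this thread and that both files report kHz; cpuinfo_cur_freq may be absent
or readable only by root, in which case None is returned for it.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def read_khz(path):
    """Return the integer kHz value stored in a sysfs file, or None."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, insufficient permissions, or unexpected content.
        return None


def compare_frequencies(root=CPUFREQ_ROOT):
    """Map each policy name to (scaling_cur_freq, cpuinfo_cur_freq) in kHz."""
    result = {}
    for policy in sorted(Path(root).glob("policy*")):
        result[policy.name] = (
            read_khz(policy / "scaling_cur_freq"),   # last requested frequency
            read_khz(policy / "cpuinfo_cur_freq"),   # frequency read from hardware
        )
    return result


if __name__ == "__main__":
    for name, (requested, measured) in compare_frequencies().items():
        print(f"{name}: requested={requested} kHz, measured={measured} kHz")
```

A persistent gap between the two values on a given policy would suggest that
requests are not being honoured as issued.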

> Some hardware gives the ability to read a counter to discover that. But
> a lot of systems rely on the fact that the request we sent is actually
> honoured. But failures can mean things like EAS will misbehave.

-- 
viresh