Message-ID: <20240529070947.4zxcdnu32d2u7cny@vireshk-i7>
Date: Wed, 29 May 2024 12:39:47 +0530
From: Viresh Kumar <viresh.kumar@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: Questions about transition latency and LATENCY_MULTIPLIER
Hi Qais,
On 28-05-24, 02:21, Qais Yousef wrote:
> Hi
>
> I am trying to understand the reason behind the usage of LATENCY_MULTIPLIER
> to create transition_delay_us. It is set to 1000 by default, and when I tried
> to dig into its history I couldn't reach the original commit, as the code has
> gone through many transformations and I gave up finding the first commit that
> introduced it.
The changes came in with the initial commits for the conservative and ondemand
governors, i.e. before 2005.
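For reference, the logic that turns this into the default delay lives in
cpufreq_policy_transition_delay_us() in drivers/cpufreq/cpufreq.c. A
simplified sketch of it as of v6.8 (paraphrased from memory, not verbatim
kernel code):

	unsigned int latency;

	/* A driver-provided value always wins. */
	if (policy->transition_delay_us)
		return policy->transition_delay_us;

	/* cpuinfo.transition_latency is in ns; convert to us. */
	latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
	if (latency)
		/* Scale by LATENCY_MULTIPLIER (1000), capped at 10 ms so
		 * slow platforms don't end up with huge delays. */
		return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000);

	/* Latency unknown (driver reports 0): fall back to 1000 us = 1 ms. */
	return LATENCY_MULTIPLIER;

This matches the numbers you quote below: 50us * 1000 capped to 10ms on the
M1, and the bare 1ms fallback on the Ryzen that reports 0.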
> Generally I am seeing that rate_limit_us in schedutil (which is largely
> influenced by this multiplier on most/all systems I am working on) is too
> high compared to the cpuinfo_transition_latency reported by the driver.
>
> For example, on my M1 Mac mini I get 50 and 56us, while rate_limit_us is
> 10ms (on the 6.8 kernel; it should become 2ms after my fix):
>
> $ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:50000
> /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:56000
>
> On an AMD Ryzen it reads 0, and we end up with LATENCY_MULTIPLIER (1000us =
> 1ms) as the rate_limit_us.
>
> On an Intel i5 I get 20us, but rate_limit_us is 5ms, which is requested
> explicitly by the intel_pstate driver:
>
> $ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy1/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy2/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy3/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy5/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy6/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy7/cpuinfo_transition_latency:20000
>
> The question I have is: why so high? If the hardware has gotten so good,
> why can't we leverage its ability to change frequencies quickly more often?
From my understanding, this is about not changing the frequency too often.
That's all. It is historical, and we probably never got better numbers when
this was reduced to a lower value later on either.
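Concretely, schedutil just drops any update that arrives sooner than
rate_limit_us after the previous one. Paraphrasing the check from
kernel/sched/cpufreq_schedutil.c (a sketch, not verbatim):

	/* freq_update_delay_ns = tunables->rate_limit_us * NSEC_PER_USEC */
	s64 delta_ns = time - sg_policy->last_freq_update_time;

	if (delta_ns < sg_policy->freq_update_delay_ns)
		return false;	/* too soon since the last update, skip it */

So a large rate_limit_us doesn't make an individual switch slower, it only
spaces out how often we are willing to switch at all.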
> This is important because, due to uclamp usage, we can end up with a less
> gradual transition between frequencies and can jump up and down more often.
> The smaller this value is, the better we can handle a fast transition to
> boost or cap frequencies based on a task's requirements when it context
> switches. But the rate limit is generally too high for the hardware, and I
> wanted to understand whether this is purely historical or whether we still
> have reasons to worry.
Maybe Rafael knows other reasons, but this is all I remember.
> From what I've seen so far, it seems to me this higher rate limit is
> helping performance, as bursty tasks are more likely to find the CPU
> running at a higher frequency due to this behavior. I think I can help
> these bursty tasks without accidentally relying on this value being high.
>
> Is there any worry about using cpuinfo_transition_latency as-is if the
> driver doesn't provide transition_delay_us?
Won't we keep changing the frequency continuously in that case? Or am I
misunderstanding something?
> And does the kernel/driver contract need to cater for errors in the
> driver's ability to serve the request? Can our request be silently ignored
> by the hardware?
The cpufreq core maintains its own state machine, and failures are used to
inform the user and/or stop DVFS. That is useful for a clean approach; I am
not sure what we would gain or lose by ignoring the errors.
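For drivers without fast switching, the pattern looks roughly like this (a
hedged sketch; program_hardware() is a made-up placeholder for the
driver-specific part):

	struct cpufreq_freqs freqs = {
		.old = policy->cur,
		.new = target_freq,
	};
	int ret;

	/* Tell the core and notifier listeners a transition is starting. */
	cpufreq_freq_transition_begin(policy, &freqs);

	ret = program_hardware(policy, target_freq);	/* hypothetical */

	/* Passing transition_failed != 0 here is what lets the core keep
	 * its state machine honest instead of silently assuming success. */
	cpufreq_freq_transition_end(policy, &freqs, ret != 0);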
> Not necessarily due to rate limit being ignored, but for any other
> reason? It is important for Linux to know what frequency we're actually running
> at.
One reason is that we report two frequencies to userspace:
- scaling_cur_freq: the frequency the software thinks the hardware is running
  at (i.e. the last requested frequency).
- cpuinfo_cur_freq: the real frequency the hardware is running at; can be
  calculated using counters, etc.
And there are tools that use them, so both are required.
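For example, in the same style as your greps above (note cpuinfo_cur_freq is
usually readable by root only):

$ grep . /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
$ sudo grep . /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq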
> Some hardware gives the ability to read a counter to discover that. But a
> lot of systems rely on the fact that the request we sent was actually
> honoured. And failures can mean that things like EAS will misbehave.
--
viresh