Message-ID: <c55339cd-85d6-4777-beec-41c4d9931b9a@arm.com>
Date: Tue, 17 Sep 2024 00:22:15 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate
response time
On 20/08/2024 18:35, Qais Yousef wrote:
> The new tunable, response_time_ms, allows us to speed up or slow down
> the response time of the policy to meet the perf, power and thermal
> characteristics desired by the user/sysadmin. There's no single universal
> trade-off that we can apply for all systems even if they use the same
> SoC. The form factor of the system, the dominant use case, and in case
> of battery powered systems, the size of the battery and presence or
> absence of active cooling can play a big role on what would be best to
> use.
>
> The new tunable provides sensible defaults, but yet gives the power to
> control the response time to the user/sysadmin, if they wish to.
>
> This tunable is applied before we apply the DVFS headroom.
>
> The default behavior of applying 1.25 headroom can be re-instated easily
> now. But we continue to keep the min required headroom to overcome
> hardware limitation in its speed to change DVFS. And any additional
> headroom to speed things up must be applied by userspace to match their
> expectation for best perf/watt as it dictates a type of policy that will
> be better for some systems, but worse for others.
>
> There's a whitespace clean up included in sugov_start().
>
> Signed-off-by: Qais Yousef <qyousef@...alina.io>
> ---
> Documentation/admin-guide/pm/cpufreq.rst | 17 +++-
> drivers/cpufreq/cpufreq.c | 4 +-
> include/linux/cpufreq.h | 3 +
> kernel/sched/cpufreq_schedutil.c | 115 ++++++++++++++++++++++-
> 4 files changed, 132 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index 6adb7988e0eb..fa0d602a920e 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
> to go up to the allowed maximum immediately and then draw back to the value
> returned by the above formula over time.
>
> -This governor exposes only one tunable:
> +This governor exposes two tunables:
>
> ``rate_limit_us``
> Minimum time (in microseconds) that has to pass between two consecutive
> @@ -427,6 +427,21 @@ This governor exposes only one tunable:
> The purpose of this tunable is to reduce the scheduler context overhead
> of the governor which might be excessive without it.
>
> +``response_time_ms``
> + Amount of time (in milliseconds) required to ramp the policy from
> + lowest to highest frequency. Can be decreased to speed up the
^^^^^^^^^^^^^^^^^
IMHO this has changed. It should now be the time to ramp from the lowest
(or, better, 0) to the second highest frequency:
https://lkml.kernel.org/r/20230827233203.1315953-6-qyousef@layalina.io
[...]
> @@ -59,6 +63,70 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
>
> /************************ Governor internals ***********************/
>
> +static inline u64 sugov_calc_freq_response_ms(struct sugov_policy *sg_policy)
> +{
> + int cpu = cpumask_first(sg_policy->policy->cpus);
> + unsigned long cap = arch_scale_cpu_capacity(cpu);
> + unsigned int max_freq, sec_max_freq;
> +
> + max_freq = sg_policy->policy->cpuinfo.max_freq;
> + sec_max_freq = __resolve_freq(sg_policy->policy,
> + max_freq - 1,
> + CPUFREQ_RELATION_H);
> +
> + /*
> + * We will request max_freq as soon as util crosses the capacity at
> + * second highest frequency. So effectively our response time is the
> + * util at which we cross the cap@2nd_highest_freq.
> + */
> + cap = sec_max_freq * cap / max_freq;
> +
> + return approximate_runtime(cap + 1);
> +}
This still uses the CPU capacity value taken from the dt entry
  capacity-dmips-mhz = <578> (CPU0 on juno-r0)
                        ^^^
i.e. frequency invariance is not considered:
[ 1.943356] CPU0 max_freq=850000 sec_max_freq=775000 cap=578 cap_at_sec_max_opp=527 runtime=34
                                                     ^^^^^^^
[ 1.957593] CPU1 max_freq=1100000 sec_max_freq=950000 cap=1024 cap_at_sec_max_opp=884 runtime=92
# cat /sys/devices/system/cpu/cpu*/cpu_capacity
446
^^^
1024
1024
446
446
446
[...]