Message-ID: <CAJZ5v0jK0Y2wcgW9uqqAACkHNu5RB7RWky=TJC-RnBrdd8yO5A@mail.gmail.com>
Date: Fri, 2 Aug 2024 15:58:51 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Viresh Kumar <viresh.kumar@...aro.org>,
Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpufreq: sched/schedutil: Remove LATENCY_MULTIPLIER
On Sun, Jul 28, 2024 at 9:27 PM Qais Yousef <qyousef@...alina.io> wrote:
>
> The current LATENCY_MULTIPLIER, which has been around for nearly 20
> years, causes rate_limit_us to always end up in the millisecond range.
>
> On an M1 mac mini I get 50 and 56us transition latencies, but due to
> the 1000x multiplier we end up setting rate_limit_us to 50 and 56ms,
> which gets capped to 2ms (and was 10ms before e13aa799c2a6 ("cpufreq:
> Change default transition delay to 2ms")).
>
> On an Intel i5 system the transition latency is 20us, but due to the
> multiplier we end up with 20ms, which is again capped to 2ms.
>
> Given how fast modern hardware is, and how modern workloads require
> systems to be more responsive to sudden changes in load (tasks
> sleeping/waking/migrating, uclamp causing a sudden boost or cap), and
> given that 2ms is a quarter of a frame on a 120Hz display, drop the
> old logic in favour of providing 50% headroom:
>
> rate_limit_us = 1.5 * latency.
>
> I considered not adding any headroom, but that could result in
> back-to-back frequency requests with no gap at all between them.
>
> I also considered providing a constant headroom (e.g. 100us), on the
> assumption that any h/w or f/w dealing with the request shouldn't
> require a large headroom when transition_latency is actually high.
>
> But in both cases I wasn't sure whether h/w or f/w could end up being
> overwhelmed by frequency requests on a potentially busy system, so I
> opted for providing 50% breathing room.
>
> This is expected to impact schedutil only, as the other user,
> dbs_governor, takes max(2 * tick, transition_delay_us); the former
> term was at least 2ms with a 1ms tick, which matches the max_delay_us
> used before this patch. On systems with a 4ms tick, this value would
> almost always have ended up as an 8ms sampling rate.
>
> For systems that report 0 transition latency, we still default to
> returning 1ms as the transition delay.
>
> This helps eliminate a source of latency in applying frequency
> requests, as mentioned in [1]. For example, with a 1ms tick, most
> systems will miss sending an update at the tick when updating util_avg
> for a task/CPU, since rate_limit_us will be 2ms for most systems.
>
> [1] https://lore.kernel.org/lkml/20240724212255.mfr2ybiv2j2uqek7@airbuntu/
>
> Signed-off-by: Qais Yousef <qyousef@...alina.io>
> ---
> drivers/cpufreq/cpufreq.c | 27 ++++-----------------------
> include/linux/cpufreq.h | 6 ------
> 2 files changed, 4 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 04fc786dd2c0..f98c9438760c 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -575,30 +575,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
> return policy->transition_delay_us;
>
> latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
> - if (latency) {
> - unsigned int max_delay_us = 2 * MSEC_PER_SEC;
> + if (latency)
> + /* Give a 50% breathing room between updates */
> + return latency + (latency >> 1);
>
> - /*
> - * If the platform already has high transition_latency, use it
> - * as-is.
> - */
> - if (latency > max_delay_us)
> - return latency;
> -
> - /*
> - * For platforms that can change the frequency very fast (< 2
> - * us), the above formula gives a decent transition delay. But
> - * for platforms where transition_latency is in milliseconds, it
> - * ends up giving unrealistic values.
> - *
> - * Cap the default transition delay to 2 ms, which seems to be
> - * a reasonable amount of time after which we should reevaluate
> - * the frequency.
> - */
> - return min(latency * LATENCY_MULTIPLIER, max_delay_us);
> - }
> -
> - return LATENCY_MULTIPLIER;
> + return USEC_PER_MSEC;
> }
> EXPORT_SYMBOL_GPL(cpufreq_policy_transition_delay_us);
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index d4d2f4d1d7cb..e0e19d9c1323 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -577,12 +577,6 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div,
> #define CPUFREQ_POLICY_POWERSAVE (1)
> #define CPUFREQ_POLICY_PERFORMANCE (2)
>
> -/*
> - * The polling frequency depends on the capability of the processor. Default
> - * polling frequency is 1000 times the transition latency of the processor.
> - */
> -#define LATENCY_MULTIPLIER (1000)
> -
> struct cpufreq_governor {
> char name[CPUFREQ_NAME_LEN];
> int (*init)(struct cpufreq_policy *policy);
> --
Applied as 6.12 material, thanks!