Message-ID: <e89b250a-7e9b-45fa-9e81-fc071487078b@arm.com>
Date: Mon, 14 Jul 2025 14:16:25 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: "Rafael J . Wysocki" <rafael@...nel.org>,
 Viresh Kumar <viresh.kumar@...aro.org>, Sudeep Holla <sudeep.holla@....com>,
 Christian Loehle <christian.loehle@....com>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
 Robin Murphy <robin.murphy@....com>,
 Beata Michalska <beata.michalska@....com>, zhenglifeng1@...wei.com,
 "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
 Ionela Voinescu <ionela.voinescu@....com>
Subject: Re: [RFC PATCH] cpufreq,base/arch_topology: Calculate cpu_capacity
 according to boost

+cc Vincent Guittot <vincent.guittot@...aro.org>
+cc Ionela Voinescu <ionela.voinescu@....com>

On 26/06/2025 11:30, Dietmar Eggemann wrote:
> I noticed on my Arm64 big.LITTLE platform (Juno-r0, scmi-cpufreq) that
> the cpu_scale values (/sys/devices/system/cpu/cpu*/cpu_capacity) of the
> little CPU changed in v6.14 from 446 to 505. I bisected and found that
> commit dd016f379ebc ("cpufreq: Introduce a more generic way to set
> default per-policy boost flag") (1) introduced this change.
> Juno's scmi FW marks the 2 topmost OPPs of each CPUfreq policy (policy0:
> 775000 850000, policy1: 950000 1100000) as boost OPPs.
> 
> The reason is that the 'policy->boost_enabled = true' is now done after
> 'cpufreq_table_validate_and_sort() -> cpufreq_frequency_table_cpuinfo()'
> in cpufreq_online() so that 'policy->cpuinfo.max_freq' is set to the
> 'highest non-boost' instead of the 'highest boost' frequency.
> 
> This is before the CPUFREQ_CREATE_POLICY notifier is fired in
> cpufreq_online() to which the cpu_capacity setup code in
> [drivers/base/arch_topology.c] has registered.
> 
> Its notifier_call init_cpu_capacity_callback() uses
> 'policy->cpuinfo.max_freq' to set the per-cpu
> capacity_freq_ref so that the cpu_capacity can be calculated as:
> 
> cpu_capacity = raw_cpu_capacity (2) * capacity_freq_ref /
> 				      'max system-wide cpu frequency'
> 
> (2) Juno's little CPU has 'capacity-dmips-mhz = <578>'.
> 
> So before (1) for a little CPU:
> 
> cpu_capacity = 578 * 850000 / 1100000 = 446
> 
> and after:
> 
> cpu_capacity = 578 * 700000 / 800000 = 505
> 
> This issue can also be seen on Arm64 boards with cpufreq-dt drivers
> using the 'turbo-mode' dt property for boosted OPPs.
> 
> What's actually needed IMHO is to calculate cpu_capacity according to
> the boost value. I.e.:
> 
> (a) The infrastructure to adjust cpu_capacity in arch_topology.c has to
>     be kept alive after boot.
> 
> (b) There has to be some kind of notification from cpufreq.c to
>     arch_topology.c about the toggling of boost. I'm abusing
>     CPUFREQ_CREATE_POLICY for this right now. Could we perhaps add a
>     CPUFREQ_MOD_POLICY for this?
> 
> (c) Allow policy->cpuinfo.max_freq to be set unconditionally in
>     cpufreq_frequency_table_cpuinfo() when boost is set to 0.
>     This currently clashes with the behavior described in a code
>     comment there: if the driver has set a higher value, it should
>     stay untouched.
> 
> Tested on Arm64 Juno (scmi-cpufreq) and Hikey 960 (cpufreq-dt +
> added 'turbo-mode' to the topmost OPPs in dts file).
> 
> This is probably related to what Christian Loehle tried to address in
> https://lkml.kernel.org/r/3cc5b83b-f81c-4bd7-b7ff-4d02db4e25d8@arm.com .

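For reference, the two little-CPU capacity values quoted above can be
reproduced with a few lines of Python. This is only a sketch of the
arithmetic: the helper name cpu_capacity() is made up for illustration
and is not the code in drivers/base/arch_topology.c. Truncating integer
division is an assumption; it matches the 446 and 505 reported above.

```python
def cpu_capacity(raw_capacity, capacity_freq_ref, max_system_freq):
    """raw_cpu_capacity * capacity_freq_ref / max system-wide cpu frequency."""
    # Assumption: the result is truncated, not rounded.
    return raw_capacity * capacity_freq_ref // max_system_freq


RAW_LITTLE = 578  # capacity-dmips-mhz of Juno's little CPU

# Before dd016f379ebc: cpuinfo.max_freq is the highest boost OPP.
before = cpu_capacity(RAW_LITTLE, 850000, 1100000)

# After dd016f379ebc: cpuinfo.max_freq is the highest non-boost OPP.
after = cpu_capacity(RAW_LITTLE, 700000, 800000)

print(before, after)
```
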
Christian L. reminded me that since commit dd016f379ebc we also have a
performance regression on systems with boosted OPPs that use the
schedutil CPUfreq governor.

The reason is that the per-CPU 'capacity_freq_ref' is so far set in
drivers/base/arch_topology.c only during system boot, based on the
highest non-boosted OPP, since boost is disabled by default.

Schedutil uses capacity_freq_ref (*) in get_next_freq() to calculate the
next frequency request:

   next_freq = max_freq * util / max
               ^^^^^^^^
                 (*)

If the boost OPPs are then enabled:

   echo 1 > /sys/devices/system/cpu/cpufreq/boost

'capacity_freq_ref' stays at the highest non-boosted OPP, so schedutil
won't request any boosted OPP for util values >
('highest non-boosted OPP' / 'highest boosted OPP') * max. Schedutil
will go no higher than the 'highest non-boosted OPP' instead.
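
To illustrate the effect, here is a simplified Python sketch of the
formula above, using the policy1 frequencies from earlier in this
thread (800000 as highest non-boosted, 1100000 as highest boosted OPP).
It is not the kernel's get_next_freq(): the function name next_freq(),
the value max = 1024 and the omission of any headroom or OPP rounding
are assumptions made for this example.

```python
def next_freq(capacity_freq_ref, util, max_cap):
    """next_freq = max_freq * util / max, with max_freq = capacity_freq_ref."""
    return capacity_freq_ref * util // max_cap


NON_BOOST_MAX = 800000   # highest non-boosted OPP of policy1 (kHz)
BOOST_MAX = 1100000      # highest boosted OPP of policy1 (kHz)
MAX_CAP = 1024           # assumed value of 'max' for this CPU

# Smallest util that would map above the non-boosted maximum if
# capacity_freq_ref followed the boosted OPP.
threshold = NON_BOOST_MAX * MAX_CAP // BOOST_MAX + 1

utils = range(threshold, MAX_CAP + 1)
# capacity_freq_ref left at the boot-time (non-boosted) value.
stale = [next_freq(NON_BOOST_MAX, u, MAX_CAP) for u in utils]
# capacity_freq_ref updated to the boosted value.
fresh = [next_freq(BOOST_MAX, u, MAX_CAP) for u in utils]

print(threshold, max(stale), min(fresh), max(fresh))
```

With the stale reference, no util value yields a request above 800000,
while with the updated reference every util at or above the threshold
does.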

This performance regression will go away with the proposed patch as well.

Calling drivers/base/arch_topology.c's init_cpu_capacity_callback() in
the event that boost is toggled makes sure that 'capacity_freq_ref' will
be set to the highest boosted (0->1) or highest non-boosted (1->0) OPP.
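
A toy Python model of that selection, assuming a per-policy OPP list in
which each entry carries a boost flag. The function name
pick_capacity_freq_ref() and the list layout are hypothetical and do not
correspond to kernel data structures. Only the Juno frequencies
mentioned earlier in this thread are listed; lower OPPs are omitted.

```python
# (frequency in kHz, is_boost) pairs; only the OPPs named in this thread.
POLICY0_OPPS = [(700000, False), (775000, True), (850000, True)]
POLICY1_OPPS = [(800000, False), (950000, True), (1100000, True)]


def pick_capacity_freq_ref(opps, boost_enabled):
    """Return the highest OPP usable under the current boost setting."""
    return max(freq for freq, is_boost in opps
               if boost_enabled or not is_boost)


# Walk through boost = 0 and boost = 1 for both policies.
for boost in (False, True):
    print(int(boost),
          pick_capacity_freq_ref(POLICY0_OPPS, boost),
          pick_capacity_freq_ref(POLICY1_OPPS, boost))
```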

[...]





