Message-Id: <20250626093018.106265-1-dietmar.eggemann@arm.com>
Date: Thu, 26 Jun 2025 11:30:18 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: "Rafael J . Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Sudeep Holla <sudeep.holla@....com>,
Christian Loehle <christian.loehle@....com>
Cc: linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org,
Robin Murphy <robin.murphy@....com>,
Beata Michalska <beata.michalska@....com>,
zhenglifeng1@...wei.com
Subject: [RFC PATCH] cpufreq,base/arch_topology: Calculate cpu_capacity according to boost
I noticed on my Arm64 big.LITTLE platform (Juno-r0, scmi-cpufreq) that
the cpu_scale values (/sys/devices/system/cpu/cpu*/cpu_capacity) of the
little CPUs changed in v6.14 from 446 to 505. I bisected and found that
commit dd016f379ebc ("cpufreq: Introduce a more generic way to set
default per-policy boost flag") (1) introduced this change.
Juno's scmi FW marks the 2 topmost OPPs of each CPUfreq policy (policy0:
775000 850000, policy1: 950000 1100000) as boost OPPs.
The reason is that 'policy->boost_enabled = true' is now set after
'cpufreq_table_validate_and_sort() -> cpufreq_frequency_table_cpuinfo()'
has run in cpufreq_online(), so 'policy->cpuinfo.max_freq' is set to the
'highest non-boost' instead of the 'highest boost' frequency.
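The effect of that ordering can be illustrated with a small standalone
sketch. It is not the kernel code: 'struct opp', table_max_freq() and the
three-entry table are hypothetical names made up for illustration, and
700000 is assumed to be the highest non-boost little-CPU OPP, as used in
the capacity calculation in this changelog.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpufreq frequency table entry. */
struct opp {
	unsigned int freq_khz;
	bool boost;
};

/*
 * Return the highest frequency in the table, skipping boost OPPs
 * unless boost_enabled is set. Returns 0 if no entry qualifies.
 */
static unsigned int table_max_freq(const struct opp *table, size_t n,
				   bool boost_enabled)
{
	unsigned int max_freq = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].boost && !boost_enabled)
			continue;
		if (table[i].freq_khz > max_freq)
			max_freq = table[i].freq_khz;
	}

	return max_freq;
}

/* Top of Juno's policy0 table only; lower OPPs are omitted here. */
static const struct opp juno_little[] = {
	{ 700000, false },	/* assumed highest non-boost OPP */
	{ 775000, true },	/* boost OPP */
	{ 850000, true },	/* boost OPP */
};
```

Evaluating the table while boost_enabled is still false gives 700000,
evaluating it once boost_enabled is true gives 850000.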
'policy->cpuinfo.max_freq' is set before the CPUFREQ_CREATE_POLICY
notifier is fired in cpufreq_online(). The cpu_capacity setup code in
[drivers/base/arch_topology.c] has registered for this notifier.
Its notifier_call init_cpu_capacity_callback() uses
'policy->cpuinfo.max_freq' to set the per-cpu
capacity_freq_ref so that the cpu_capacity can be calculated as:
cpu_capacity = raw_cpu_capacity (2) * capacity_freq_ref /
'max system-wide cpu frequency'
(2) Juno's little CPU has 'capacity-dmips-mhz = <578>'.
So before (1) for a little CPU:
cpu_capacity = 578 * 850000 / 1100000 = 446
and after:
cpu_capacity = 578 * 700000 / 800000 = 505
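Both numbers can be reproduced with the simplified formula above. The
helper below is only a sketch of that formula with truncating integer
division; cpu_capacity() and its parameter names are hypothetical and
this is not the normalization code in arch_topology.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified formula from this changelog:
 *   cpu_capacity = raw_cpu_capacity * capacity_freq_ref /
 *                  'max system-wide cpu frequency'
 * Integer division truncates, as in the worked numbers above.
 */
static uint64_t cpu_capacity(uint64_t raw_cpu_capacity,
			     uint64_t capacity_freq_ref_khz,
			     uint64_t max_sys_freq_khz)
{
	assert(max_sys_freq_khz != 0);

	return raw_cpu_capacity * capacity_freq_ref_khz / max_sys_freq_khz;
}
```

With boost frequencies, cpu_capacity(578, 850000, 1100000) yields 446;
with the highest non-boost frequencies, cpu_capacity(578, 700000, 800000)
yields 505.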
This issue can also be seen on Arm64 boards with cpufreq-dt drivers
using the 'turbo-mode' dt property for boosted OPPs.
What's actually needed IMHO is to calculate cpu_capacity according to
the boost value. I.e.:
(a) The infrastructure to adjust cpu_capacity in arch_topology.c has to
be kept alive after boot.
(b) There has to be some kind of notification from cpufreq.c to
arch_topology.c about the toggling of boost. I'm abusing
CPUFREQ_CREATE_POLICY for this right now. Could we perhaps add a
CPUFREQ_MOD_POLICY for this?
(c) Allow unconditional setting of policy->cpuinfo.max_freq in
cpufreq_frequency_table_cpuinfo() in case boost is set to 0.
This currently clashes with the behaviour, documented in a code
comment there, that a higher value set by the driver stays untouched.
Tested on Arm64 Juno (scmi-cpufreq) and Hikey 960 (cpufreq-dt +
added 'turbo-mode' to the topmost OPPs in dts file).
This is probably related to what Christian Loehle tried to address in
https://lkml.kernel.org/r/3cc5b83b-f81c-4bd7-b7ff-4d02db4e25d8@arm.com .
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@....com>
---
drivers/base/arch_topology.c | 11 -----------
drivers/cpufreq/cpufreq.c | 3 +++
drivers/cpufreq/freq_table.c | 8 +-------
3 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1c3221ff1d1f..0a3916dc9644 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -378,8 +378,6 @@ void acpi_processor_init_invariance_cppc(void)
#ifdef CONFIG_CPU_FREQ
static cpumask_var_t cpus_to_visit;
-static void parsing_done_workfn(struct work_struct *work);
-static DECLARE_WORK(parsing_done_work, parsing_done_workfn);
static int
init_cpu_capacity_callback(struct notifier_block *nb,
@@ -408,10 +406,8 @@ init_cpu_capacity_callback(struct notifier_block *nb,
if (raw_capacity) {
topology_normalize_cpu_scale();
schedule_work(&update_topology_flags_work);
- free_raw_capacity();
}
pr_debug("cpu_capacity: parsing done\n");
- schedule_work(&parsing_done_work);
}
return 0;
@@ -447,13 +443,6 @@ static int __init register_cpufreq_notifier(void)
}
core_initcall(register_cpufreq_notifier);
-static void parsing_done_workfn(struct work_struct *work)
-{
- cpufreq_unregister_notifier(&init_cpu_capacity_notifier,
- CPUFREQ_POLICY_NOTIFIER);
- free_cpumask_var(cpus_to_visit);
-}
-
#else
core_initcall(free_raw_capacity);
#endif
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index d7426e1d8bdd..6fdfcb6815d7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2798,6 +2798,9 @@ int cpufreq_boost_set_sw(struct cpufreq_policy *policy, int state)
return ret;
}
+ blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
+ CPUFREQ_CREATE_POLICY, policy);
+
ret = freq_qos_update_request(policy->max_freq_req, policy->max);
if (ret < 0)
return ret;
diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 35de513af6c9..06068a12ec53 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -51,13 +51,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
}
policy->min = policy->cpuinfo.min_freq = min_freq;
- policy->max = max_freq;
- /*
- * If the driver has set its own cpuinfo.max_freq above max_freq, leave
- * it as is.
- */
- if (policy->cpuinfo.max_freq < max_freq)
- policy->max = policy->cpuinfo.max_freq = max_freq;
+ policy->max = policy->cpuinfo.max_freq = max_freq;
if (policy->min == ~0)
return -EINVAL;
--
2.34.1