Message-ID: <dd2e0cdd-ca95-4c83-9397-0606f3899799@arm.com>
Date: Mon, 1 Sep 2025 18:33:04 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: lukasz.luba@....com, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
kenneth.crudup@...il.com, stable@...r.kernel.org
Subject: Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology
On 9/1/25 17:58, Rafael J. Wysocki wrote:
> On Sun, Aug 31, 2025 at 11:44 PM Christian Loehle
> <christian.loehle@....com> wrote:
>>
>> commit e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity
>> adjustment") added a mechanism to handle CPUs that come up late by
>> retrying when any of the `cpufreq_cpu_get()` calls fails.
>>
>> However, if there are holes in the CPU topology (offline CPUs, e.g.
>> nosmt), the first missing CPU causes the loop to break, preventing
>> subsequent online CPUs from being updated.
>> Instead of aborting on the first missing CPU policy, loop through all
>> and retry if any were missing.
>>
>> Fixes: e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity adjustment")
>> Suggested-by: Kenneth Crudup <kenneth.crudup@...il.com>
>> Reported-by: Kenneth Crudup <kenneth.crudup@...il.com>
>> Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Christian Loehle <christian.loehle@....com>
>> ---
>> kernel/power/energy_model.c | 13 ++++++++-----
>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index ea7995a25780..b63c2afc1379 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -778,7 +778,7 @@ void em_adjust_cpu_capacity(unsigned int cpu)
>>  static void em_check_capacity_update(void)
>>  {
>>  	cpumask_var_t cpu_done_mask;
>> -	int cpu;
>> +	int cpu, failed_cpus = 0;
>>
>>  	if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
>>  		pr_warn("no free memory\n");
>> @@ -796,10 +796,8 @@ static void em_check_capacity_update(void)
>>
>>  		policy = cpufreq_cpu_get(cpu);
>>  		if (!policy) {
>> -			pr_debug("Accessing cpu%d policy failed\n", cpu);
>
> I'm still quite unsure why you want to stop printing this message. It
> is kind of useful to know which policies have had to be retried, while
> printing the number of them really isn't particularly useful. And
> this is pr_debug(), so user selectable anyway.
>
> So I'm inclined to retain the line above and drop the new pr_debug() below.
>
> Please let me know if this is a problem.
With nosmt this leads to a lot of prints every second, that's all.
I can resend with the pr_debug() for every failure, or alternatively
print a cpumask.
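
To illustrate the difference between aborting on the first missing policy
and walking all CPUs, here is a minimal user-space sketch. It is not the
kernel code: policy_available() is a hypothetical stand-in for
cpufreq_cpu_get(), the 64-bit mask stands in for a cpumask, and the
"every second CPU has no policy" layout is only an assumed nosmt-like
example. It also shows the "print a cpumask" idea, i.e. collecting the
failed CPUs and emitting them once instead of once per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS 64	/* assumption: mask fits in one 64-bit word */

/* Hypothetical stand-in for cpufreq_cpu_get(): does this CPU have a policy? */
static bool policy_available(const bool *has_policy, int cpu)
{
	return has_policy[cpu];
}

/* Old behaviour: stop at the first CPU without a policy. */
static int update_break_on_first(const bool *has_policy, int nr_cpus)
{
	int cpu, updated = 0;

	assert(nr_cpus <= MAX_CPUS);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (!policy_available(has_policy, cpu))
			break;		/* later CPUs are never reached */
		updated++;
	}
	return updated;
}

/* Patched behaviour: skip missing CPUs, remember them, keep going. */
static int update_skip_missing(const bool *has_policy, int nr_cpus,
			       unsigned long long *failed_mask)
{
	int cpu, updated = 0;

	assert(nr_cpus <= MAX_CPUS);
	*failed_mask = 0;
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (!policy_available(has_policy, cpu)) {
			*failed_mask |= 1ULL << cpu;	/* retry candidates */
			continue;
		}
		updated++;
	}
	return updated;
}

/* Format the set bits of mask as a comma-separated CPU list. */
static void format_cpu_list(unsigned long long mask, char *buf, size_t len)
{
	size_t off = 0;
	int cpu, n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (cpu = 0; cpu < MAX_CPUS && off < len; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;
		n = snprintf(buf + off, len - off, "%s%d", off ? "," : "", cpu);
		if (n < 0)
			break;
		off += (size_t)n;
	}
}
```

A caller would run update_skip_missing(), schedule a retry if the
returned mask is non-zero, and log the formatted list once per pass. In
the kernel itself the mask would more likely be printed with the
existing "%*pbl" format and cpumask_pr_args() than with a hand-rolled
formatter like the one above.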