linux-kernel - Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology


Message-ID: <dd2e0cdd-ca95-4c83-9397-0606f3899799@arm.com>
Date: Mon, 1 Sep 2025 18:33:04 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: lukasz.luba@....com, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
 kenneth.crudup@...il.com, stable@...r.kernel.org
Subject: Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology

On 9/1/25 17:58, Rafael J. Wysocki wrote:
> On Sun, Aug 31, 2025 at 11:44 PM Christian Loehle
> <christian.loehle@....com> wrote:
>>
>> commit e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity
>> adjustment") added a mechanism to handle CPUs that come up late by
>> retrying when any of the `cpufreq_cpu_get()` calls fails.
>>
>> However, if there are holes in the CPU topology (offline CPUs, e.g.
>> nosmt), the first missing CPU causes the loop to break, preventing
>> subsequent online CPUs from being updated.
>> Instead of aborting on the first missing CPU policy, loop through all
>> and retry if any were missing.
>>
>> Fixes: e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity adjustment")
>> Suggested-by: Kenneth Crudup <kenneth.crudup@...il.com>
>> Reported-by: Kenneth Crudup <kenneth.crudup@...il.com>
>> Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Christian Loehle <christian.loehle@....com>
>> ---
>>  kernel/power/energy_model.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index ea7995a25780..b63c2afc1379 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -778,7 +778,7 @@ void em_adjust_cpu_capacity(unsigned int cpu)
>>  static void em_check_capacity_update(void)
>>  {
>>         cpumask_var_t cpu_done_mask;
>> -       int cpu;
>> +       int cpu, failed_cpus = 0;
>>
>>         if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
>>                 pr_warn("no free memory\n");
>> @@ -796,10 +796,8 @@ static void em_check_capacity_update(void)
>>
>>                 policy = cpufreq_cpu_get(cpu);
>>                 if (!policy) {
>> -                       pr_debug("Accessing cpu%d policy failed\n", cpu);
> 
> I'm still quite unsure why you want to stop printing this message.  It
> is kind of useful to know which policies have had to be retried, while
> printing the number of them really isn't particularly useful.  And
> this is pr_debug(), so user selectable anyway.
> 
> So I'm inclined to retain the line above and drop the new pr_debug() below.
> 
> Please let me know if this is a problem.

With nosmt this leads to a lot of prints every second, that's all.
I can resend with the pr_debug() for every failure, or alternatively
print a cpumask.
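
For readers following the thread, the behavioural change described in the commit message can be modelled in user space. The following is only a minimal sketch under stated assumptions: the `present[]` array is a hypothetical stand-in for whether `cpufreq_cpu_get()` would return a policy for a given CPU, the function names are invented for illustration, and the real `em_check_capacity_update()` additionally tracks a `cpu_done_mask` and works per policy, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code:
 *   present[cpu] - true if a policy lookup for that CPU would succeed
 *   updated[cpu] - set for every CPU whose capacity would be refreshed
 * Both functions return a nonzero value when a retry should be scheduled.
 */

/* Behaviour before the patch: give up at the first CPU without a policy. */
static int scan_stop_at_first_hole(const bool *present, bool *updated,
                                   size_t ncpus)
{
    assert(present && updated);

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (!present[cpu])
            return 1;           /* later CPUs are never looked at */
        updated[cpu] = true;
    }
    return 0;
}

/* Behaviour after the patch: count the misses and keep scanning. */
static int scan_skip_holes(const bool *present, bool *updated, size_t ncpus)
{
    int failed_cpus = 0;

    assert(present && updated);

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (!present[cpu]) {
            failed_cpus++;      /* remember the miss, continue the loop */
            continue;
        }
        updated[cpu] = true;
    }
    return failed_cpus;
}
```

With a topology such as `{ present, missing, present, missing }` (roughly what nosmt produces), the first function updates only CPU 0, while the second updates CPUs 0 and 2 and reports two misses, so a retry is still requested.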