linux-kernel - Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology

Message-ID: <CAJZ5v0jbOwH7T0StbjQLVeQiYhYU2EMCT+yp8jr8r0p4AwNgkw@mail.gmail.com>
Date: Mon, 1 Sep 2025 19:41:35 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, lukasz.luba@....com, linux-pm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, dietmar.eggemann@....com, 
	kenneth.crudup@...il.com, stable@...r.kernel.org
Subject: Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology

On Mon, Sep 1, 2025 at 7:33 PM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 9/1/25 17:58, Rafael J. Wysocki wrote:
> > On Sun, Aug 31, 2025 at 11:44 PM Christian Loehle
> > <christian.loehle@....com> wrote:
> >>
> >> commit e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity
> >> adjustment") added a mechanism to handle CPUs that come up late by
> >> retrying when any of the `cpufreq_cpu_get()` calls fails.
> >>
> >> However, if there are holes in the CPU topology (offline CPUs, e.g.
> >> nosmt), the first missing CPU causes the loop to break, preventing
> >> subsequent online CPUs from being updated.
> >> Instead of aborting on the first missing CPU policy, loop through all
> >> and retry if any were missing.
> >>
> >> Fixes: e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity adjustment")
> >> Suggested-by: Kenneth Crudup <kenneth.crudup@...il.com>
> >> Reported-by: Kenneth Crudup <kenneth.crudup@...il.com>
> >> Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
> >> Cc: stable@...r.kernel.org
> >> Signed-off-by: Christian Loehle <christian.loehle@....com>
> >> ---
> >>  kernel/power/energy_model.c | 13 ++++++++-----
> >>  1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> >> index ea7995a25780..b63c2afc1379 100644
> >> --- a/kernel/power/energy_model.c
> >> +++ b/kernel/power/energy_model.c
> >> @@ -778,7 +778,7 @@ void em_adjust_cpu_capacity(unsigned int cpu)
> >>  static void em_check_capacity_update(void)
> >>  {
> >>         cpumask_var_t cpu_done_mask;
> >> -       int cpu;
> >> +       int cpu, failed_cpus = 0;
> >>
> >>         if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
> >>                 pr_warn("no free memory\n");
> >> @@ -796,10 +796,8 @@ static void em_check_capacity_update(void)
> >>
> >>                 policy = cpufreq_cpu_get(cpu);
> >>                 if (!policy) {
> >> -                       pr_debug("Accessing cpu%d policy failed\n", cpu);
> >
> > I'm still quite unsure why you want to stop printing this message.  It
> > is kind of useful to know which policies have had to be retried, while
> > printing the number of them really isn't particularly useful.  And
> > this is pr_debug(), so user selectable anyway.
> >
> > So I'm inclined to retain the line above and drop the new pr_debug() below.
> >
> > Please let me know if this is a problem.
>
> For nosmt this leads to a lot of prints every second, that's all.
> I can resend with the pr_debug for every fail, alternatively print a
> cpumask.

Printing a cpumask might be better, but it would add some complexity
only needed for the printing.

Maybe it's just better to not print anything at all.
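
For readers following the thread, the two ideas discussed above (continue
past CPUs without a policy instead of breaking, and report the missing CPUs
once as a mask rather than one line per CPU) can be illustrated with a
user-space sketch. This is not the kernel code: NR_CPUS, has_policy[],
check_capacity_update() and format_cpu_list() are hypothetical stand-ins,
a plain unsigned int replaces the kernel cpumask, and a bool array replaces
the cpufreq_cpu_get() lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS 8 /* hypothetical small system for this sketch */

/*
 * Walk every CPU. A CPU without a policy is recorded in the returned
 * "failed" bitmask and skipped; the walk does not stop there, so later
 * CPUs that do have a policy are still updated.
 */
static unsigned int check_capacity_update(const bool *has_policy,
                                          unsigned int *updated)
{
    unsigned int failed = 0;

    *updated = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!has_policy[cpu]) {   /* stand-in for a failed policy lookup */
            failed |= 1u << cpu;
            continue;             /* a break here would skip later CPUs */
        }
        *updated |= 1u << cpu;    /* stand-in for the capacity update */
    }
    return failed;                /* non-zero means a retry is needed */
}

/*
 * Format the set bits of mask as a comma-separated CPU list ("1,3,5"),
 * so one message can report every missing policy. Returns the number of
 * characters written, not counting the terminating NUL.
 */
static size_t format_cpu_list(unsigned int mask, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        int n;

        if (!(mask & (1u << cpu)))
            continue;
        n = snprintf(buf + used, len - used, "%s%d", used ? "," : "", cpu);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';     /* drop the partially written entry */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

With a nosmt-like layout in which every odd-numbered CPU has no policy,
check_capacity_update() still updates all even-numbered CPUs, and a caller
could emit a single debug line from format_cpu_list() before scheduling the
retry. Whether that extra formatting is worth it is exactly the trade-off
raised above.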