Message-ID: <CAJZ5v0idnFDYviDBusv8hvFD+yH71kL=Q_ARpn5cUBbAg838RQ@mail.gmail.com>
Date: Mon, 1 Sep 2025 18:58:31 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: rafael@...nel.org, lukasz.luba@....com, linux-pm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, dietmar.eggemann@....com, 
	kenneth.crudup@...il.com, stable@...r.kernel.org
Subject: Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology

On Sun, Aug 31, 2025 at 11:44 PM Christian Loehle
<christian.loehle@....com> wrote:
>
> commit e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity
> adjustment") added a mechanism to handle CPUs that come up late by
> retrying when any of the `cpufreq_cpu_get()` calls fails.
>
> However, if there are holes in the CPU topology (offline CPUs, e.g.
> nosmt), the first missing CPU causes the loop to break, preventing
> subsequent online CPUs from being updated.
> Instead of aborting on the first missing CPU policy, loop through all
> and retry if any were missing.
>
> Fixes: e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity adjustment")
> Suggested-by: Kenneth Crudup <kenneth.crudup@...il.com>
> Reported-by: Kenneth Crudup <kenneth.crudup@...il.com>
> Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
> Cc: stable@...r.kernel.org
> Signed-off-by: Christian Loehle <christian.loehle@....com>
> ---
>  kernel/power/energy_model.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index ea7995a25780..b63c2afc1379 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -778,7 +778,7 @@ void em_adjust_cpu_capacity(unsigned int cpu)
>  static void em_check_capacity_update(void)
>  {
>         cpumask_var_t cpu_done_mask;
> -       int cpu;
> +       int cpu, failed_cpus = 0;
>
>         if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
>                 pr_warn("no free memory\n");
> @@ -796,10 +796,8 @@ static void em_check_capacity_update(void)
>
>                 policy = cpufreq_cpu_get(cpu);
>                 if (!policy) {
> -                       pr_debug("Accessing cpu%d policy failed\n", cpu);

I'm still quite unsure why you want to stop printing this message.  It
is kind of useful to know which policies have had to be retried, while
printing the number of them really isn't particularly useful.  And
this is pr_debug(), so user selectable anyway.

So I'm inclined to retain the line above and drop the new pr_debug() below.

Please let me know if this is a problem.

> -                       schedule_delayed_work(&em_update_work,
> -                                             msecs_to_jiffies(1000));
> -                       break;
> +                       failed_cpus++;
> +                       continue;
>                 }
>                 cpufreq_cpu_put(policy);
>
> @@ -814,6 +812,11 @@ static void em_check_capacity_update(void)
>                 em_adjust_new_capacity(cpu, dev, pd);
>         }
>
> +       if (failed_cpus) {
> +               pr_debug("Accessing %d policies failed, retrying\n", failed_cpus);
> +               schedule_delayed_work(&em_update_work, msecs_to_jiffies(1000));
> +       }
> +
>         free_cpumask_var(cpu_done_mask);
>  }
>
> --
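
For reference, the behavioural difference described in the changelog can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: policy_present() is a hypothetical stand-in for cpufreq_cpu_get(), the has_policy array is a made-up model of the CPU topology, and the retry flag stands in for the schedule_delayed_work() call.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for cpufreq_cpu_get(): a CPU "has a policy"
 * when has_policy[cpu] is true. Not a kernel API.
 */
static bool policy_present(const bool *has_policy, int cpu)
{
	return has_policy[cpu];
}

/*
 * Pre-patch behaviour: stop at the first CPU without a policy.
 * Returns the number of CPUs updated; *retry reports whether a retry
 * would have been scheduled.
 */
static int update_break_on_first(const bool *has_policy, int nr_cpus,
				 bool *retry)
{
	int updated = 0;

	*retry = false;
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!policy_present(has_policy, cpu)) {
			*retry = true;
			break;	/* CPUs after the hole are never reached */
		}
		updated++;
	}
	return updated;
}

/*
 * Patched behaviour: skip CPUs without a policy, keep going, and ask
 * for a single retry after the loop if any CPU was skipped.
 */
static int update_skip_missing(const bool *has_policy, int nr_cpus,
			       bool *retry)
{
	int updated = 0, failed_cpus = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!policy_present(has_policy, cpu)) {
			failed_cpus++;
			continue;
		}
		updated++;
	}
	*retry = failed_cpus > 0;
	return updated;
}
```

With a topology such as { 1, 0, 1, 0, 1, 0 } (every second CPU missing, roughly what nosmt looks like in this model), the first variant updates one CPU and the second updates three; both ask for a retry.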
