Message-ID: <CAJZ5v0gOuLJEPm_sG=4xOpqKJ2izY2pbLc7ROq70wvXgtb_m4A@mail.gmail.com>
Date: Mon, 1 Sep 2025 21:47:57 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: lukasz.luba@....com, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
kenneth.crudup@...il.com, stable@...r.kernel.org
Subject: Re: [PATCH] PM: EM: Fix late boot with holes in CPU topology
On Mon, Sep 1, 2025 at 7:41 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Mon, Sep 1, 2025 at 7:33 PM Christian Loehle
> <christian.loehle@....com> wrote:
> >
> > On 9/1/25 17:58, Rafael J. Wysocki wrote:
> > > On Sun, Aug 31, 2025 at 11:44 PM Christian Loehle
> > > <christian.loehle@....com> wrote:
> > >>
> > >> commit e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity
> > >> adjustment") added a mechanism to handle CPUs that come up late by
> > > >> retrying when any of the `cpufreq_cpu_get()` calls fails.
> > >>
> > >> However, if there are holes in the CPU topology (offline CPUs, e.g.
> > >> nosmt), the first missing CPU causes the loop to break, preventing
> > >> subsequent online CPUs from being updated.
> > >> Instead of aborting on the first missing CPU policy, loop through all
> > >> and retry if any were missing.
> > >>
> > >> Fixes: e3f1164fc9ee ("PM: EM: Support late CPUs booting and capacity adjustment")
> > >> Suggested-by: Kenneth Crudup <kenneth.crudup@...il.com>
> > >> Reported-by: Kenneth Crudup <kenneth.crudup@...il.com>
> > >> Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
> > >> Cc: stable@...r.kernel.org
> > >> Signed-off-by: Christian Loehle <christian.loehle@....com>
> > >> ---
> > >> kernel/power/energy_model.c | 13 ++++++++-----
> > >> 1 file changed, 8 insertions(+), 5 deletions(-)
> > >>
> > >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> > >> index ea7995a25780..b63c2afc1379 100644
> > >> --- a/kernel/power/energy_model.c
> > >> +++ b/kernel/power/energy_model.c
> > >> @@ -778,7 +778,7 @@ void em_adjust_cpu_capacity(unsigned int cpu)
> > >> static void em_check_capacity_update(void)
> > >> {
> > >> cpumask_var_t cpu_done_mask;
> > >> - int cpu;
> > >> + int cpu, failed_cpus = 0;
> > >>
> > >> if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
> > >> pr_warn("no free memory\n");
> > >> @@ -796,10 +796,8 @@ static void em_check_capacity_update(void)
> > >>
> > >> policy = cpufreq_cpu_get(cpu);
> > >> if (!policy) {
> > >> - pr_debug("Accessing cpu%d policy failed\n", cpu);
> > >
> > > I'm still quite unsure why you want to stop printing this message. It
> > > is kind of useful to know which policies have had to be retried, while
> > > printing the number of them really isn't particularly useful. And
> > > this is pr_debug(), so user selectable anyway.
> > >
> > > So I'm inclined to retain the line above and drop the new pr_debug() below.
> > >
> > > Please let me know if this is a problem.
> >
> > For nosmt this leads to a lot of prints every second, that's all.
> > I can resend with the pr_debug for every fail, alternatively print a
> > cpumask.
>
> Printing a cpumask might be better, but it would add some complexity
> only needed for the printing.
>
> Maybe it's just better to not print anything at all.
I've changed the patch to that effect and tentatively applied it, so
no need to resend if you agree with this modification.
Thanks!
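
For reference, the behavioral difference discussed in the quoted patch can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: `fake_policy_available()` is a hypothetical stand-in for `cpufreq_cpu_get()`, the `has_policy` and `updated` arrays replace the cpumask and per-domain handling in `em_check_capacity_update()`, and both function names below are made up for the example. In line with the modification described above, a missing policy is not printed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 64	/* arbitrary bound for this sketch */

/* Hypothetical stand-in for cpufreq_cpu_get(): true if the CPU has a policy. */
static bool fake_policy_available(const bool *has_policy, int cpu)
{
	return has_policy[cpu];
}

/*
 * Behavior before the fix: the first CPU without a policy ends the loop,
 * so online CPUs located after a hole are never updated.
 * Returns the number of CPUs updated; *retry is set if a retry is needed.
 */
static int update_break_on_first_miss(const bool *has_policy, int nr_cpus,
				      bool *updated, bool *retry)
{
	int done = 0;

	assert(nr_cpus <= MAX_CPUS);
	*retry = false;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!fake_policy_available(has_policy, cpu)) {
			*retry = true;
			break;		/* everything after the hole is skipped */
		}
		updated[cpu] = true;
		done++;
	}
	return done;
}

/*
 * Behavior after the fix: a CPU without a policy is skipped silently,
 * the loop continues, and a retry is requested if any CPU was missing.
 */
static int update_skip_misses(const bool *has_policy, int nr_cpus,
			      bool *updated, bool *retry)
{
	int done = 0;

	assert(nr_cpus <= MAX_CPUS);
	*retry = false;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!fake_policy_available(has_policy, cpu)) {
			*retry = true;
			continue;	/* keep going past the hole */
		}
		updated[cpu] = true;
		done++;
	}
	return done;
}
```

With a nosmt-like layout where every second CPU has no policy, the first variant updates only the CPUs before the first hole, while the second updates every CPU that has a policy and still asks for a retry.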