Message-ID: <20240726181424.000039a4@Huawei.com>
Date: Fri, 26 Jul 2024 18:14:24 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>, <linuxarm@...wei.com>,
<rafael.j.wysocki@...el.com>, <guohanjun@...wei.com>, <gshan@...hat.com>,
<miguel.luis@...cle.com>, <catalin.marinas@....com>, Linux List Kernel
Mailing <linux-kernel@...r.kernel.org>, Linux regressions mailing list
<regressions@...ts.linux.dev>, Ingo Molnar <mingo@...hat.com>, "Borislav
Petkov" <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
<x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>, "Bowman, Terry"
<Terry.bowman@....com>, Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@...wei.com>
Subject: Re: 6.11/regression/bisected - The commit c1385c1f0ba3 caused a new
possible recursive locking detected warning at computer boot.

On Fri, 26 Jul 2024 18:26:01 +0200
Thomas Gleixner <tglx@...utronix.de> wrote:
> On Thu, Jul 25 2024 at 18:13, Jonathan Cameron wrote:
> > On Tue, 23 Jul 2024 18:20:06 +0100
> > Jonathan Cameron <Jonathan.Cameron@...wei.com> wrote:
> >
> >> > This is an interesting corner, and perhaps reflects a flawed
> >> > assumption on our part: that, for this path, anything that can happen
> >> > for an initially present CPU can also happen for a hotplugged one. On
> >> > the hotplug path the lock was always held, and hence the
> >> > static_key_enable() would have failed.
>
> No. The original code invoked this without cpus read locked via:
>
> acpi_processor_driver.probe()
> __acpi_processor_start()
> ....
>
> and the cpu hotplug callback finds it already set up, so it won't reach
> the static_key_enable() anymore.
>
> > One bit I need to check out tomorrow is to make sure this doesn't race with the
> > workfn that is used to tear down the same static key on error.
>
> There is a simpler solution for that. See the uncompiled below.

Thanks. FWIW I got pretty much the same suggestion from Shameer this
morning when he saw the workfn solution on the list. Classic case of me
missing the simple solution because I was down in the weeds.

I'm absolutely fine with this fix.

Mikhail, please could you test Thomas' proposal so we can be absolutely
sure nothing else is hiding?

Tglx's solution is much less likely to cause problems than what I
proposed, because it avoids changing the ordering.
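
For anyone following along, here is a minimal sketch of the pattern in
play (illustrative only: the function and key names are made up, not
the actual kernel code):

	/* Sketch: one static key enabled from two contexts. */
	static DEFINE_STATIC_KEY_FALSE(example_key);

	static void enable_from_hotplug_callback(void)
	{
		/*
		 * The hotplug core already holds cpus_read_lock() here.
		 * Plain static_branch_enable() takes cpus_read_lock()
		 * itself, which is what trips lockdep's recursive-lock
		 * check; the _cpuslocked variant instead asserts that
		 * the caller already holds the lock.
		 */
		static_branch_enable_cpuslocked(&example_key);
	}

	static void __init enable_from_boot(void)
	{
		/* The boot path does not hold the lock; take it for this scope. */
		guard(cpus_read_lock)();
		static_branch_enable_cpuslocked(&example_key);
	}

With the lock taken explicitly on the boot path, both callers reach the
static key enable with cpus_read_lock() held, which is what the patch
quoted below arranges.
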
Jonathan
>
> Thanks,
>
> tglx
> ---
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index b3fa61d45352..0b69bfbf345d 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -306,7 +306,7 @@ static void freq_invariance_enable(void)
>  		WARN_ON_ONCE(1);
>  		return;
>  	}
> -	static_branch_enable(&arch_scale_freq_key);
> +	static_branch_enable_cpuslocked(&arch_scale_freq_key);
>  	register_freq_invariance_syscore_ops();
>  	pr_info("Estimated ratio of average max frequency by base frequency (times 1024): %llu\n", arch_max_freq_ratio);
>  }
> @@ -323,8 +323,10 @@ static void __init bp_init_freq_invariance(void)
>  	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>  		return;
>
> -	if (intel_set_max_freq_ratio())
> +	if (intel_set_max_freq_ratio()) {
> +		guard(cpus_read_lock)();
>  		freq_invariance_enable();
> +	}
>  }
>
>  static void disable_freq_invariance_workfn(struct work_struct *work)
>
>