Message-ID: <5f8a9cb8-70cf-2a17-cfc4-cb31cb658de4@cybernetics.com>
Date: Wed, 26 Apr 2023 10:45:57 -0400
From: Tony Battersby <tonyb@...ernetics.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Dave Hansen <dave.hansen@...el.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org
Cc: "H. Peter Anvin" <hpa@...or.com>,
Mario Limonciello <mario.limonciello@....com>,
Tom Lendacky <thomas.lendacky@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH RFC] x86/cpu: fix intermittent lockup on poweroff

On 4/25/23 17:05, Thomas Gleixner wrote:
> On Tue, Apr 25 2023 at 13:03, Dave Hansen wrote:
>
>> On 4/25/23 12:26, Tony Battersby wrote:
>>> -	if (cpuid_eax(0x8000001f) & BIT(0))
>>> +	if (c->extended_cpuid_level >= 0x8000001f &&
>>> +	    (cpuid_eax(0x8000001f) & BIT(0)))
>>> 		native_wbinvd();
>> Oh, so the existing code is running into the
>>
>>> If a value entered for CPUID.EAX is higher than the maximum input
>>> value for basic or extended function for that processor then the data
>>> for the highest basic information leaf is returned
>> behavior. It's basically looking at BIT(0) of some random extended
>> leaf. Probably 0x80000008 based on your 'cpuid -r' output.
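
To illustrate that fallback for anyone following along, here is a
minimal user-space sketch (illustrative only, not kernel code; it uses
GCC's <cpuid.h> helpers, and leaf 0x8000001f EAX bit 0 is the SME bit
the kernel is trying to check):

#include <stdio.h>
#include <cpuid.h>	/* GCC/Clang wrappers for the CPUID instruction */

int main(void)
{
	/* CPUID.80000000h:EAX reports the highest supported extended leaf */
	unsigned int max_ext = __get_cpuid_max(0x80000000, NULL);
	unsigned int eax, ebx, ecx, edx;

	if (max_ext < 0x8000001f) {
		/*
		 * Querying 0x8000001f anyway would not fault; the CPU
		 * falls back to the highest basic leaf, so EAX bit 0
		 * would be whatever that unrelated leaf happens to return.
		 */
		printf("leaf 0x8000001f not supported (max 0x%x)\n", max_ext);
		return 0;
	}

	__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
	printf("SME supported: %s\n", (eax & 1) ? "yes" : "no");
	return 0;
}
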
> Right, accessing that leaf without checking whether it exists is wrong,
> but that does not explain the hang itself.
>
> The only consequence of looking at bit 0 of some random other leaf is
> that all CPUs which run stop_this_cpu() issue WBINVD in parallel, which
> is slow but should not be a fatal issue.
>
> Tony observed a 50% chance to hang, which means this is a timing
> issue.
>
> Now there are two things to investigate:
>
> 1) Does the system go south if enough CPUs issue WBINVD concurrently?
>
>    That should be trivial to analyze by enforcing concurrency on a
>    WBINVD in an IPI via a global synchronization bit or such.
>
> 2) The first thing stop_this_cpu() does is to clear its own bit in the
>    CPU online mask.
>
>    The CPU which controls shutdown/reboot waits for num_online_cpus()
>    to drop down to one, which means it can proceed _before_ the other
>    CPUs have actually reached HALT.
>
>    That's not a new thing. It has been that way forever. Just the
>    WBINVD might cause enough delay to create problems.
>
>    That should be trivial to analyze too by just waiting on the
>    control side for e.g. 100ms after num_online_cpus() dropped down to
>    one.
>
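
To make the race in #2 concrete, my understanding is that the two paths
look roughly like this (simplified from stop_this_cpu() and
native_stop_other_cpus(); the timeout setup, comments, and the IPI/NMI
delivery details are elided):

	/* each stopping CPU, in stop_this_cpu(): */
	set_cpu_online(smp_processor_id(), false); /* controller may proceed from here */
	disable_local_APIC();
	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
	if (cpuid_eax(0x8000001f) & BIT(0))
		native_wbinvd();	/* slow: flush may still be in flight... */
	for (;;)
		native_halt();

	/* controlling CPU, in native_stop_other_cpus(): */
	while (num_online_cpus() > 1 && (wait || timeout--))
		udelay(1);		/* ...after this wait has already finished */
	local_irq_save(flags);
	disable_local_APIC();
	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
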
For test #1, I have never used IPIs directly before, so I would have to
look into how to do that, or you could send me a patch to test if you
still want that test done. But test #2 produced results, so maybe it is
not necessary.
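
In case it is useful, here is my rough understanding of what such a
test could look like (untested sketch, not a real patch; the function
names are made up, and the atomic-counter rendezvous is just one way to
line the WBINVDs up):

#include <linux/smp.h>
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <asm/special_insns.h>

static atomic_t wbinvd_ready;

static void concurrent_wbinvd(void *unused)
{
	/* rendezvous: spin until every online CPU has arrived... */
	atomic_inc(&wbinvd_ready);
	while (atomic_read(&wbinvd_ready) < num_online_cpus())
		cpu_relax();

	/* ...then all CPUs issue WBINVD as close to in parallel as possible */
	native_wbinvd();
}

static void run_wbinvd_test(void)
{
	atomic_set(&wbinvd_ready, 0);
	/* runs concurrent_wbinvd() on every online CPU via IPI, and waits */
	on_each_cpu(concurrent_wbinvd, NULL, 1);
}
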
For test #2, I re-enabled native_wbinvd() by reverting the patch that I
sent, and then I applied the following patch:

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 375b33ecafa2..1a9b225c85b6 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -212,6 +212,7 @@ static void native_stop_other_cpus(int wait)
 		udelay(1);
 	}
 
+	mdelay(100);
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

With that I got a successful power-off 10 times in a row.

Let me know if there is anything else I can test.

I will resend my patch later with a different description.

Tony Battersby
Cybernetics