Message-ID: <4f3290a5-7fd9-1d40-5183-2fffcf10b2f3@cybernetics.com>
Date:   Fri, 16 Jun 2023 12:36:22 -0400
From:   Tony Battersby <tonyb@...ernetics.com>
To:     Ashok Raj <ashok.raj@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Mario Limonciello <mario.limonciello@....com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Ashok Raj <ashok.raj@...ux.intel.com>,
        Tony Luck <tony.luck@...el.com>,
        Arjan van de Veen <arjan@...ux.intel.com>,
        Eric Biederman <ebiederm@...ssion.com>
Subject: Re: [patch v3 1/7] x86/smp: Make stop_other_cpus() more robust

On 6/15/23 21:58, Ashok Raj wrote:
> Hi Thomas,
>
> On Thu, Jun 15, 2023 at 10:33:50PM +0200, Thomas Gleixner wrote:
>> Tony reported intermittent lockups on poweroff. His analysis identified the
>> wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
>> on SME enabled machines a kexec() does not leave any stale data in the
>> caches when switching from encrypted to non-encrypted mode or vice versa.
>>
>> That wbinvd() is conditional on the SME feature bit which is read directly
>> from CPUID. But that readout does not check whether the CPUID leaf is
>> available or not. If it's not available the CPU will return the value of
>> the highest supported leaf instead. Depending on the content the "SME" bit
>> might be set or not.
>>
>> That's incorrect but harmless. Making the CPUID readout conditional makes
>> the observed hangs go away, but it does not fix the underlying problem:
>>
>> CPU0					CPU1
>>
>>  stop_other_cpus()
>>    send_IPIs(REBOOT);			stop_this_cpu()
>>    while (num_online_cpus() > 1);         set_online(false);
>>    proceed... -> hang
>> 				          wbinvd()
>>
>> WBINVD is an expensive operation and if multiple CPUs issue it at the same
>> time the resulting delays are even larger.
>>
>> But CPU0 already observed num_online_cpus() going down to 1 and proceeds
>> which causes the system to hang.
>>
>> This issue exists independent of WBINVD, but the delays caused by WBINVD
>> make it more prominent.
>>
>> Make this more robust by adding a cpumask which is initialized to the
>> online CPU mask before sending the IPIs and CPUs clear their bit in
>> stop_this_cpu() after the WBINVD completed. Check for that cpumask to
>> become empty in stop_other_cpus() instead of watching num_online_cpus().
>>
>> The cpumask cannot plug all holes either, but it's better than a raw
>> counter and allows restricting the NMI fallback IPI to the CPUs which
>> have not reported within the timeout window.
>>
>> Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
>> Reported-by: Tony Battersby <tonyb@...ernetics.com>
>> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>> Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
>> ---
>> V3: Use a cpumask to make the NMI case slightly safer - Ashok
>> ---
>>  arch/x86/include/asm/cpu.h |    2 +
>>  arch/x86/kernel/process.c  |   23 +++++++++++++-
>>  arch/x86/kernel/smp.c      |   71 +++++++++++++++++++++++++++++++--------------
>>  3 files changed, 73 insertions(+), 23 deletions(-)
> I tested them and they seem to work fine on my system.
>
> It would be great if Tony could check on his setup.
>
plain 6.4-rc6: 50% failure rate
  poweroff success: 2
  poweroff fail:    2

6.4-rc6 with tglx v3 patch #1 only: 0% failure rate
  poweroff success: 10
  poweroff fail:    0

6.4-rc6 with all 7 tglx v3 patches: 0% failure rate
  poweroff success: 10
  poweroff fail:    0

Fixes my problem.

Tested-by: Tony Battersby <tonyb@...ernetics.com>
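
For readers following the thread, the handshake described in the quoted changelog can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions, not the kernel patch: threads stand in for CPUs, a sleep stands in for the slow WBINVD, and all names (cpus_stop_mask, online_count, stop_this_cpu_sim, stop_other_cpus_sim, NR_SIM_CPUS) are hypothetical. The point it shows is that each "CPU" marks itself offline first but clears its bit in the mask only after the slow step, and the initiating side waits for the mask to become empty instead of watching the online count.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NR_SIM_CPUS 4	/* hypothetical CPU count for this simulation */

/* Bit N set: simulated CPU N has not yet finished stopping. */
static atomic_uint cpus_stop_mask;
/* Stand-in for num_online_cpus(); drops before the slow step finishes. */
static atomic_int online_count;

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Runs on every simulated CPU except CPU0. */
static void *stop_this_cpu_sim(void *arg)
{
	unsigned int cpu = (unsigned int)(uintptr_t)arg;

	atomic_fetch_sub(&online_count, 1);	/* like set_online(false) */
	sleep_ms(20);				/* stands in for a slow WBINVD */
	/* Report completion last, after the slow step is done. */
	atomic_fetch_and(&cpus_stop_mask, ~(1u << cpu));
	return NULL;
}

/*
 * Runs on simulated CPU0. Returns true if every other CPU cleared its
 * bit within roughly timeout_ms milliseconds.
 */
static bool stop_other_cpus_sim(long timeout_ms)
{
	pthread_t tid[NR_SIM_CPUS];
	long waited = 0;
	bool all_stopped;

	atomic_store(&online_count, NR_SIM_CPUS);
	/* Initialize the mask to all simulated CPUs except CPU0. */
	atomic_store(&cpus_stop_mask, ((1u << NR_SIM_CPUS) - 1) & ~1u);

	/* Stand-in for sending the reboot IPIs. */
	for (uintptr_t cpu = 1; cpu < NR_SIM_CPUS; cpu++)
		pthread_create(&tid[cpu], NULL, stop_this_cpu_sim, (void *)cpu);

	/* Wait on the mask, not on online_count. */
	while (atomic_load(&cpus_stop_mask) != 0 && waited < timeout_ms) {
		sleep_ms(1);
		waited++;
	}
	all_stopped = atomic_load(&cpus_stop_mask) == 0;

	for (unsigned int cpu = 1; cpu < NR_SIM_CPUS; cpu++)
		pthread_join(tid[cpu], NULL);

	return all_stopped;
}
```

Calling stop_other_cpus_sim(2000) should return true once all three worker threads have cleared their bits. If the loop instead waited for online_count to reach 1, it could proceed while the workers were still inside the simulated WBINVD, which is the window the changelog describes. On a timeout, the bits still set in the mask identify which simulated CPUs have not reported.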