Message-ID: <20230615190036.898273129@linutronix.de>
Date:   Thu, 15 Jun 2023 22:33:49 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     x86@...nel.org, Mario Limonciello <mario.limonciello@....com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Tony Battersby <tonyb@...ernetics.com>,
        Ashok Raj <ashok.raj@...ux.intel.com>,
        Tony Luck <tony.luck@...el.com>,
        Arjan van de Veen <arjan@...ux.intel.com>,
        Eric Biederman <ebiederm@...ssion.com>
Subject: [patch v3 0/7] x86/smp: Cure stop_other_cpus() and kexec() troubles

This is the third version of the stop_other_cpus() / kexec()
vs. mwait_play_dead() series. Version 2 can be found here:

  https://lore.kernel.org/r/20230613115353.599087484@linutronix.de

The two issues addressed are:

  1) stop_other_cpus() continues after observing num_online_cpus() == 1.

     This is problematic because the CPUs being stopped clear their online
     bit first and only later invoke WBINVD, which can take a long time.
     There seems to be an interaction between the WBINVD and the reboot
     mechanics, as this intermittently results in hangs.

  2) The kexec() kernel can overwrite the memory locations which "offline"
     CPUs are monitoring. This write brings them out of MWAIT and they resume
     execution on overwritten text, page tables, data and stacks, resulting
     in triple faults.

Cure them by:

  #1 Synchronizing stop_other_cpus() with a CPU mask which is updated in
     stop_this_cpu() _after_ WBINVD completes.

  #2 Bringing offline CPUs out of MWAIT and moving them into HLT before
     starting the kexec() kernel. Optionally send them an INIT IPI so they
     go back into the wait-for-startup state.
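
The ordering in #1 can be modelled in plain userspace C. This is only a
sketch under stated assumptions, not the code from the series: the names
model_arm_stop_mask(), model_stop_this_cpu() and model_wait_for_stop() are
hypothetical, a 64-bit word stands in for the CPU mask, and WBINVD and HLT
appear only as comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU which still has to stop. */
typedef _Atomic uint64_t stop_mask_t;

/* Initiator: mark every online CPU except itself as "must stop". */
static void model_arm_stop_mask(stop_mask_t *mask, uint64_t online,
				unsigned int self)
{
	assert(self < 64);
	atomic_store(mask, online & ~(UINT64_C(1) << self));
}

/* Target CPU: report completion only after the (simulated) cache flush. */
static void model_stop_this_cpu(stop_mask_t *mask, unsigned int cpu)
{
	assert(cpu < 64);
	/* WBINVD would run here and may take a long time. */
	atomic_fetch_and(mask, ~(UINT64_C(1) << cpu));
	/* A real CPU would now sit in HLT. */
}

/* Initiator: poll until the mask is empty or the poll budget runs out. */
static bool model_wait_for_stop(stop_mask_t *mask, unsigned long max_polls)
{
	while (max_polls--) {
		if (atomic_load(mask) == 0)
			return true;
	}
	return atomic_load(mask) == 0;
}
```

The point of the model is that the initiator waits on state which a target
updates after its flush, rather than on an online bit which is cleared
before the flush starts.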

Changes vs. V2:

  - Use a CPU mask instead of an atomic counter and send the NMI only to
    CPUs which did not report that they reached HLT. That's still not race
    free vs. a late handling of the reboot vector, but that's not fixable.
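
Why a mask allows a narrower NMI than a counter can be shown with a small
userspace C sketch. Both helpers are hypothetical and are not taken from
either version of the series. The counter variant encodes an assumption
about what a bare count permits: it tells how many CPUs are missing but not
which ones, so every other online CPU has to be targeted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Counter model (assumption): a non-zero pending count cannot identify
 * the stragglers, so all online CPUs except the initiator are targeted.
 */
static uint64_t model_targets_with_counter(uint64_t online, unsigned int self,
					   unsigned int pending)
{
	assert(self < 64);
	return pending ? (online & ~(UINT64_C(1) << self)) : 0;
}

/*
 * Mask model: each set bit is a CPU which has not reported reaching HLT,
 * so only those CPUs are targeted. The initiator is never a target.
 */
static uint64_t model_targets_with_mask(uint64_t stop_mask, unsigned int self)
{
	assert(self < 64);
	return stop_mask & ~(UINT64_C(1) << self);
}
```

With four online CPUs (mask 0xF), CPU 0 as initiator and only CPU 3 still
outstanding, the counter model targets CPUs 1-3 while the mask model targets
CPU 3 alone.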

Interestingly enough, testing the NMI mechanics unearthed that after soft
disabling the local APIC the CPU is _not_ handling the NMI, despite the
SDM claiming:

  "The operation and response of a local APIC while in this software-disabled
   state is as follows:

   * The local APIC will respond normally to INIT, NMI, SMI, and SIPI messages."

I validated that even without handling the NMI, the CPU is kicked out of
HLT reliably.

It's unclear whether that's X2APIC specific, and I have not verified that
behaviour on AMD either. Nor is it clear what "respond normally" actually
means.

The AMD APM is not helpful either:

  "SMI, NMI, INIT, Startup, and Remote Read interrupts may be accepted"

Oh well.

The series is also available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/kexec

Thanks,

	tglx
---
 include/asm/cpu.h |    2 
 include/asm/smp.h |    4 +
 kernel/process.c  |   25 +++++++--
 kernel/smp.c      |  111 +++++++++++++++++++++++++++++-----------
 kernel/smpboot.c  |  149 ++++++++++++++++++++++++++++++++++++++++--------------
 5 files changed, 220 insertions(+), 71 deletions(-)