Message-ID: <87bkw3xgcl.fsf@redhat.com>
Date: Thu, 12 May 2022 11:14:34 +0200
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
"Guilherme G . Piccoli" <gpiccoli@...lia.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Michael Kelley <mikelley@...rosoft.com>,
Dexuan Cui <decui@...rosoft.com>
Subject: Re: [PATCH 1/2] x86/crash: Disable virt in core NMI crash handler
to avoid double list_add
Sean Christopherson <seanjc@...gle.com> writes:
> Disable virtualization in crash_nmi_callback() and skip the requested NMI
> shootdown if a shootdown has already occurred, i.e. a callback has been
> registered. The NMI crash shootdown path doesn't play nice with multiple
> invocations, e.g. attempting to register the NMI handler multiple times
> will trigger a double list_add() and hang the system (in addition to
> multiple other issues). If "crash_kexec_post_notifiers" is specified on
> the kernel command line, panic() will invoke crash_smp_send_stop() and
> result in a second call to nmi_shootdown_cpus() during
> native_machine_emergency_restart().
crash_kexec_post_notifiers is also automatically enabled on Hyper-V (see
hv_common_init()), which means the hang should also be observed there.
Cc: Michael and Dexuan
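
For reference, my (quite possibly simplified) reading of the problematic
call path with crash_kexec_post_notifiers set is roughly:

  panic()
    -> crash_smp_send_stop()
         -> nmi_shootdown_cpus(kdump_nmi_callback)    /* first registration */
    ...
    -> native_machine_emergency_restart()
         -> emergency_vmx_disable_all()
              -> nmi_shootdown_cpus(...)              /* second registration:
                                                         double list_add() today,
                                                         early return with this patch */

and Hyper-V always takes the first leg, since hv_common_init() sets
crash_kexec_post_notifiers = true.
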
>
> Invoke the callback _before_ disabling virtualization, as the current
> VMCS needs to be cleared before doing VMXOFF. Note, this results in a
> subtle change in ordering between disabling virtualization and stopping
> Intel PT on the responding CPUs. While VMX and Intel PT do interact,
> VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one
> another, which is all that matters when panicking.
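
If I'm reading the end result correctly, the per-CPU ordering in
crash_nmi_callback() is now roughly:

  shootdown_callback(cpu, regs);         /* e.g. kdump_nmi_callback():
                                            VMCLEAR loaded VMCSs, stop
                                            Intel PT, ...              */
  cpu_crash_disable_virtualization();    /* VMXOFF / disable SVM       */

i.e. Intel PT is now stopped before VMXOFF rather than after, which matches
the "no faults between one another" argument above.
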
>
> WARN if nmi_shootdown_cpus() is called a second time with anything other
> than the reboot path's "nop" handler, as bailing means the requested
> callback isn't being invoked. Punt true handling of multiple shootdown callbacks
> until there's an actual use case for doing so (beyond disabling
> virtualization).
>
> Extract the disabling logic to a common helper to deduplicate code, and
> to prepare for doing the shootdown in the emergency reboot path if SVM
> is supported.
>
> Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit
> VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected
> against a second invocation by a cpu_vmx_enabled() check as the kdump
> handler would disable VMX if it ran first.
>
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported")
> Cc: stable@...r.kernel.org
> Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@...lia.com>
> Cc: Vitaly Kuznetsov <vkuznets@...hat.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/include/asm/reboot.h | 1 +
> arch/x86/kernel/crash.c | 16 +--------------
> arch/x86/kernel/reboot.c | 38 ++++++++++++++++++++++++++++++++---
> 3 files changed, 37 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> index 04c17be9b5fd..8f2da36435a6 100644
> --- a/arch/x86/include/asm/reboot.h
> +++ b/arch/x86/include/asm/reboot.h
> @@ -25,6 +25,7 @@ void __noreturn machine_real_restart(unsigned int type);
> #define MRR_BIOS 0
> #define MRR_APM 1
>
> +void cpu_crash_disable_virtualization(void);
> typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> void nmi_panic_self_stop(struct pt_regs *regs);
> void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index e8326a8d1c5d..fe0cf83843ba 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -81,15 +81,6 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> */
> cpu_crash_vmclear_loaded_vmcss();
>
> - /* Disable VMX or SVM if needed.
> - *
> - * We need to disable virtualization on all CPUs.
> - * Having VMX or SVM enabled on any CPU may break rebooting
> - * after the kdump kernel has finished its task.
> - */
> - cpu_emergency_vmxoff();
> - cpu_emergency_svm_disable();
> -
> /*
> * Disable Intel PT to stop its logging
> */
> @@ -148,12 +139,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> */
> cpu_crash_vmclear_loaded_vmcss();
>
> - /* Booting kdump kernel with VMX or SVM enabled won't work,
> - * because (among other limitations) we can't disable paging
> - * with the virt flags.
> - */
> - cpu_emergency_vmxoff();
> - cpu_emergency_svm_disable();
> + cpu_crash_disable_virtualization();
>
> /*
> * Disable Intel PT to stop its logging
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index fa700b46588e..f9543a4e9b09 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -528,9 +528,9 @@ static inline void kb_wait(void)
> }
> }
>
> -static void vmxoff_nmi(int cpu, struct pt_regs *regs)
> +static void nmi_shootdown_nop(int cpu, struct pt_regs *regs)
> {
> - cpu_emergency_vmxoff();
> + /* Nothing to do, the NMI shootdown handler disables virtualization. */
> }
>
> /* Use NMIs as IPIs to tell all CPUs to disable virtualization */
> @@ -554,7 +554,7 @@ static void emergency_vmx_disable_all(void)
> __cpu_emergency_vmxoff();
>
> /* Halt and exit VMX root operation on the other CPUs. */
> - nmi_shootdown_cpus(vmxoff_nmi);
> + nmi_shootdown_cpus(nmi_shootdown_nop);
> }
> }
>
> @@ -802,6 +802,18 @@ static nmi_shootdown_cb shootdown_callback;
> static atomic_t waiting_for_crash_ipi;
> static int crash_ipi_issued;
>
> +void cpu_crash_disable_virtualization(void)
> +{
> + /*
> + * Disable virtualization, i.e. VMX or SVM, so that INIT is recognized
> + * during reboot. VMX blocks INIT if the CPU is post-VMXON, and SVM
> + * blocks INIT if GIF=0. Note, CLGI #UDs if SVM isn't enabled, so it's
> + * easier to just disable SVM unconditionally.
> + */
> + cpu_emergency_vmxoff();
> + cpu_emergency_svm_disable();
> +}
> +
> static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> {
> int cpu;
> @@ -819,6 +831,12 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
>
> shootdown_callback(cpu, regs);
>
> + /*
> + * Prepare the CPU for reboot _after_ invoking the callback so that the
> + * callback can safely use virtualization instructions, e.g. VMCLEAR.
> + */
> + cpu_crash_disable_virtualization();
> +
> atomic_dec(&waiting_for_crash_ipi);
> /* Assume hlt works */
> halt();
> @@ -840,6 +858,20 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> unsigned long msecs;
> local_irq_disable();
>
> + /*
> + * Invoking multiple callbacks is not currently supported; registering
> + * the NMI handler twice will cause a list_add() double add BUG().
> + * The exception is the "nop" handler in the emergency reboot path,
> + * which can run after e.g. kdump's shootdown. Do nothing if the crash
> + * handler has already run, i.e. has already prepared other CPUs; the
> + * reboot path doesn't have any work of its own to do, it just needs to
> + * ensure all CPUs have prepared for reboot.
> + */
> + if (shootdown_callback) {
> + WARN_ON_ONCE(callback != nmi_shootdown_nop);
> + return;
> + }
> +
> /* Make a note of crashing cpu. Will be used in NMI callback. */
> crashing_cpu = safe_smp_processor_id();
Assuming I didn't get lost in the (infinite) multiverse of possible
shutdown/reboot/crash/kexec-not-kexec paths,
Reviewed-by: Vitaly Kuznetsov <vkuznets@...hat.com>
--
Vitaly