linux-kernel - Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

Message-Id: <3F5CEF02-0561-4E28-851B-8E993F76DC9B@amacapital.net>
Date:   Wed, 10 Jun 2020 14:59:19 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     "David P. Reed" <dpreed@...pplum.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        Allison Randal <allison@...utok.net>,
        Enrico Weigelt <info@...ux.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Martin Molnar <martin.molnar.programming@...il.com>,
        Andy Lutomirski <luto@...nel.org>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Jann Horn <jannh@...gle.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash



> On Jun 10, 2020, at 11:21 AM, David P. Reed <dpreed@...pplum.com> wrote:
> 
> If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
> done on all CPUS, to allow the INIT IPI to function, since
> INIT is suppressed when CPUs are in VMX root operation.
> However, VMXOFF causes an undefined operation fault if the CPU is not
> in VMX operation, that is, VMXON has not been executed, or VMXOFF
> has been executed, but VMX is enabled.

I’m surprised. Wouldn’t this mean that emergency reboots always fail if a VM is running?  I would think someone would have noticed before.

> This fix makes the reboot
> work more reliably by modifying the #UD handler to skip the VMXOFF
> if VMX is enabled on the CPU and the VMXOFF is executed as part
> of cpu_emergency_vmxoff().

NAK. See below.

> The logic in reboot.c is also corrected, since the point of forcing
> the processor out of VMX root operation is because when VMX root
> operation is enabled, the processor INIT signal is always masked.
> See Intel SDM section on differences between VMX Root operation and normal
> operation. Thus every CPU must be forced out of VMX operation.
> Since the CPU will hang rather than restart, a manual "reset" is the
> only way out of this state (or if there is a BMC, it can issue a RESET
> to the chip).
> 
> Signed-off-by: David P. Reed <dpreed@...pplum.com>
> ---
> arch/x86/include/asm/virtext.h | 24 ++++++++++++----
> arch/x86/kernel/reboot.c       | 13 ++-------
> arch/x86/kernel/traps.c        | 52 ++++++++++++++++++++++++++++++++--
> 3 files changed, 71 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
> index 9aad0e0876fb..ea2d67191684 100644
> --- a/arch/x86/include/asm/virtext.h
> +++ b/arch/x86/include/asm/virtext.h
> @@ -13,12 +13,16 @@
> #ifndef _ASM_X86_VIRTEX_H
> #define _ASM_X86_VIRTEX_H
> 
> +#include <linux/percpu.h>
> +
> #include <asm/processor.h>
> 
> #include <asm/vmx.h>
> #include <asm/svm.h>
> #include <asm/tlbflush.h>
> 
> +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff);
> +
> /*
> * VMX functions:
> */
> @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void)
> /** Disable VMX on the current CPU
> *
> * vmxoff causes a undefined-opcode exception if vmxon was not run
> - * on the CPU previously. Only call this function if you know VMX
> - * is enabled.
> + * on the CPU previously. Only call this function directly if you know VMX
> + * is enabled *and* CPU is in VMX root operation.
> */

So presumably the bug is someone calling this inappropriately?

> static inline void cpu_vmxoff(void)
> {
> @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void)
>   return __read_cr4() & X86_CR4_VMXE;
> }
> 
> -/** Disable VMX if it is enabled on the current CPU
> +/** Force disable VMX if it is enabled on the current CPU.
> + * Note that if CPU is not in VMX root operation this
> + * VMXOFF will fault an undefined operation fault.
> + * So the 'doing_emergency_vmxoff' percpu flag is set,
> + * the trap handler for just restarts execution after
> + * the VMXOFF instruction.
> *
> - * You shouldn't call this if cpu_has_vmx() returns 0.
> + * You shouldn't call this directly if cpu_has_vmx() returns 0.
> */
> static inline void __cpu_emergency_vmxoff(void)
> {
> -    if (cpu_vmx_enabled())
> +    if (cpu_vmx_enabled()) {
> +        this_cpu_write(doing_emergency_vmxoff, 1);
>       cpu_vmxoff();
> +        this_cpu_write(doing_emergency_vmxoff, 0);
> +    }
> }

NAK. Just write this in asm with an exception handler that does the right thing.

Please also try to identify the actual bug, because I have a sneaking suspicion that you are running an out-of-tree module that has issues. If so, the patch should explain this.
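
For readers following the thread: the "asm with an exception handler" suggestion presumably refers to the kernel's exception-table mechanism (`_ASM_EXTABLE` on x86), where a fixup is keyed on the address of one specific instruction instead of on a per-CPU flag consulted by the #UD handler. The sketch below is only a userspace model of that lookup, not kernel code. The struct layout, the function name `fixup_exception_model`, and all addresses are hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one exception-table entry: the address of an
 * instruction that is allowed to fault, paired with the address at
 * which execution should resume if it does.
 */
struct extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Illustrative addresses only; in a real build the toolchain emits them. */
#define VMXOFF_INSN   ((uintptr_t)0x1000)  /* the one guarded VMXOFF */
#define VMXOFF_FIXUP  ((uintptr_t)0x1003)  /* resume point after it */
#define OTHER_INSN    ((uintptr_t)0x2000)  /* some unrelated instruction */

static const struct extable_entry extable[] = {
    { VMXOFF_INSN, VMXOFF_FIXUP },
};

/*
 * Model of the trap path: if the faulting address has a table entry,
 * redirect *ip to the fixup and report the fault as handled (1).
 * Otherwise leave *ip alone and report it as unhandled (0), so a
 * stray faulting instruction elsewhere is still treated as a bug.
 */
static int fixup_exception_model(uintptr_t *ip)
{
    for (size_t i = 0; i < sizeof(extable) / sizeof(extable[0]); i++) {
        if (extable[i].insn == *ip) {
            *ip = extable[i].fixup;
            return 1;
        }
    }
    return 0;
}
```

Because the lookup is by instruction address, only the guarded site is forgiven. A fault at any other address still falls through to the normal error path, and no state has to be set and cleared around the call.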

> 
> -/** Disable VMX if it is supported and enabled on the current CPU
> +/** Force disable VMX if it is supported and enabled on the current CPU
> */
> static inline void cpu_emergency_vmxoff(void)
> {
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 3ca43be4f9cf..abc8b51a57c7 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
>    *
>    * For safety, we will avoid running the nmi_shootdown_cpus()
>    * stuff unnecessarily, but we don't have a way to check
> -     * if other CPUs have VMX enabled. So we will call it only if the
> -     * CPU we are running on has VMX enabled.
> -     *
> -     * We will miss cases where VMX is not enabled on all CPUs. This
> -     * shouldn't do much harm because KVM always enable VMX on all
> -     * CPUs anyway. But we can miss it on the small window where KVM
> -     * is still enabling VMX.
> +     * if other CPUs have VMX enabled.
>    */
> -    if (cpu_has_vmx() && cpu_vmx_enabled()) {
> +    if (cpu_has_vmx()) {
>       /* Disable VMX on this CPU. */
> -        cpu_vmxoff();
> +        cpu_emergency_vmxoff();
> 
>       /* Halt and disable VMX on the other CPUs */
>       nmi_shootdown_cpus(vmxoff_nmi);
> -
>   }
> }
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 4cc541051994..2dcf57ef467e 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -39,6 +39,7 @@
> #include <linux/io.h>
> #include <linux/hardirq.h>
> #include <linux/atomic.h>
> +#include <linux/percpu.h>
> 
> #include <asm/stacktrace.h>
> #include <asm/processor.h>
> @@ -59,6 +60,7 @@
> #include <asm/umip.h>
> #include <asm/insn.h>
> #include <asm/insn-eval.h>
> +#include <asm/virtext.h>
> 
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -70,6 +72,8 @@
> #include <asm/proto.h>
> #endif
> 
> +DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0;
> +
> DECLARE_BITMAP(system_vectors, NR_VECTORS);
> 
> static inline void cond_local_irq_enable(struct pt_regs *regs)
> @@ -115,6 +119,43 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
>   return 0;
> }
> 
> +/*
> + * Fix any unwanted undefined operation fault due to VMXOFF instruction that
> + * is needed to ensure that CPU is not in VMX root operation at time of
> + * a reboot/panic CPU reset. There is no safe and reliable way to know
> + * if a processor is in VMX root operation, other than to skip the
> + * VMXOFF. It is safe to just skip any VMXOFF that might generate this
> + * exception, when VMX operation is enabled in CR4. In the extremely
> + * rare case that a VMXOFF is erroneously executed while VMX is enabled,
> + * but VMXON has not been executed yet, the undefined opcode fault
> + * should not be missed by valid code, though it would be an error.
> + * To detect this, we could somehow restrict the instruction address
> + * to the specific use during reboot/panic.
> + */
> +static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr)
> +{

NAK.