linux-kernel - Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

Message-ID: <20200611170031.GI29918@linux.intel.com>
Date:   Thu, 11 Jun 2020 10:00:31 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     "David P. Reed" <dpreed@...pplum.com>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Allison Randal <allison@...utok.net>,
        Enrico Weigelt <info@...ux.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Martin Molnar <martin.molnar.programming@...il.com>,
        Andy Lutomirski <luto@...nel.org>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Jann Horn <jannh@...gle.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

On Thu, Jun 11, 2020 at 12:33:20PM -0400, David P. Reed wrote:
> To respond to Thomas Gleixner's suggestion about an exception masking
> mechanism: it may well be a better fix, but a) I used "BUG" as a model, and
> b) the exception masking isn't documented anywhere I can find. These are
> "static inline" routines, and only the "emergency" version needs protection,
> because you'd want a random VMXOFF to actually trap.

The only in-kernel usages of cpu_vmxoff() are for emergencies.  And the only
reasonable source of faults on VMXOFF is that VMX is already off, i.e. for
the kernel's usage the goal is purely to ensure VMX is disabled; how we get
there doesn't truly matter.
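The point that the only expected VMXOFF fault is "VMX is already off" can be illustrated with a small user-space C model. This is a sketch under simplifying assumptions, not kernel code: all names (toy_cpu, toy_vmxoff, toy_cpu_vmxoff, toy_nmi_race) are hypothetical, two booleans stand in for CR4.VMXE and "post-VMXON", and a #UD is modeled as a counter rather than a real exception. It replays the race described in the patch comment: a hypervisor has executed VMXOFF, an NMI arrives before CR4.VMXE is cleared, and the emergency path (assumed here to check CR4.VMXE before calling the disable routine) calls it anyway.

```c
#include <stdbool.h>

/*
 * Toy model of one CPU's VMX state. All names are hypothetical; this is a
 * user-space illustration, not kernel code.
 */
struct toy_cpu {
	bool cr4_vmxe;   /* models CR4.VMXE */
	bool post_vmxon; /* models "in VMX operation", i.e. after VMXON */
	int ud_count;    /* number of modeled #UD faults raised so far */
};

/* Models VMXOFF: "raises #UD" (returns false) unless the CPU is post-VMXON. */
static bool toy_vmxoff(struct toy_cpu *cpu)
{
	if (!cpu->post_vmxon) {
		cpu->ud_count++;
		return false;
	}
	cpu->post_vmxon = false;
	return true;
}

/* Models the patched cpu_vmxoff(): a fault on VMXOFF is ignored, then CR4.VMXE is cleared. */
static void toy_cpu_vmxoff(struct toy_cpu *cpu)
{
	(void)toy_vmxoff(cpu); /* a modeled #UD here is simply "eaten" */
	cpu->cr4_vmxe = false;
}

/*
 * Models the race: the hypervisor's VMXOFF has completed, but an NMI lands
 * before it clears CR4.VMXE, so the emergency path still sees CR4.VMXE set.
 */
static struct toy_cpu toy_nmi_race(void)
{
	struct toy_cpu cpu = { .cr4_vmxe = true, .post_vmxon = true, .ud_count = 0 };

	toy_vmxoff(&cpu);             /* hypervisor's VMXOFF succeeds */
	if (cpu.cr4_vmxe)             /* emergency path gates on CR4.VMXE */
		toy_cpu_vmxoff(&cpu); /* VMXOFF faults; the fault is eaten */
	return cpu;
}
```

Whatever the starting state, the model ends with both flags clear; in the already-off case the only cost is one swallowed fault, which is why "how we get there" does not matter for the emergency path.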
 
> At one or more of the emergency call sites, a comment states that no locks
> may be taken at all because of where the call is made.
>  
> Further, I have a different patch that requires a scratch page per processor
> to exist, but which never takes a UD fault. (Basically, it attempts VMXON
> first and then does VMXOFF, which ensures exit from VMX root mode, but VMXON
> needs a blank page to either succeed or fail without a GP fault.) If someone
> prefers that, it's local to the routine, but it requires a new scratch page
> to be allocated per processor. So after testing it, I decided, in the
> interest of memory reduction, that masking the UD was preferable.

Please no; doing VMXON, even temporarily, could cause breakage.  The CPU's
VMCS cache isn't cleared on VMXOFF.  Doing VMXON after kdump_nmi_callback()
invokes cpu_crash_vmclear_loaded_vmcss() would create a window where VMPTRLD
could succeed in a hypervisor and lead to memory corruption in the new
kernel when the VMCS is evicted from the non-coherent VMCS cache.
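The ordering hazard can be sketched with a toy model of the VMCS cache. It assumes only the properties stated above, heavily simplified: VMPTRLD can succeed only while the CPU is post-VMXON, the crash path's VMCLEAR flush empties the cache, and VMXOFF leaves the cache alone. The struct, the functions, and the VMCS addresses are hypothetical; real VMCS caching is far more involved than a small array.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VMCS 4

/*
 * Toy model of a CPU's non-coherent VMCS cache (hypothetical names,
 * simplified semantics): VMPTRLD only works post-VMXON, VMCLEAR evicts,
 * and VMXOFF does NOT evict.
 */
struct toy_vmx_cpu {
	bool post_vmxon;
	size_t ncached;
	unsigned long cached[TOY_MAX_VMCS]; /* VMCS addresses held in the cache */
};

static void toy_vmxon(struct toy_vmx_cpu *c)  { c->post_vmxon = true; }
static void toy_vmxoff(struct toy_vmx_cpu *c) { c->post_vmxon = false; } /* cache untouched */

/* VMPTRLD: fails unless post-VMXON; on success the VMCS lands in the cache. */
static bool toy_vmptrld(struct toy_vmx_cpu *c, unsigned long vmcs)
{
	if (!c->post_vmxon || c->ncached == TOY_MAX_VMCS)
		return false;
	c->cached[c->ncached++] = vmcs;
	return true;
}

/* Models the crash path flushing every loaded VMCS (cf. cpu_crash_vmclear_loaded_vmcss()). */
static void toy_vmclear_all(struct toy_vmx_cpu *c) { c->ncached = 0; }

/*
 * Returns how many VMCSs remain cached at the end of the crash sequence.
 * temporary_vmxon selects the proposed "VMXON, then VMXOFF" approach.
 */
static size_t toy_crash_sequence(bool temporary_vmxon)
{
	struct toy_vmx_cpu c = { .post_vmxon = true };

	toy_vmptrld(&c, 0x1000);  /* a hypervisor had a VMCS loaded */
	toy_vmclear_all(&c);      /* crash path flushes loaded VMCSs */
	toy_vmxoff(&c);           /* VMX is now off and the cache is empty */

	if (temporary_vmxon)
		toy_vmxon(&c);    /* reopens the window */
	toy_vmptrld(&c, 0x2000);  /* a hypervisor VMPTRLD landing in the window */
	toy_vmxoff(&c);           /* does not evict anything */

	return c.ncached;         /* nonzero: a stale VMCS may later be written back */
}
```

In this model, toy_crash_sequence(false) leaves the cache empty, while toy_crash_sequence(true) leaves one VMCS cached after the flush, which is the kind of late eviction into the new kernel's memory described above.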

> I'm happy to resubmit the masking exception patch as version 2, if it works
> in my test case.
>  
> Advice?

Please test the patch below, which simply eats any exception on VMXOFF.

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..54bc84d7028d 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -32,13 +32,15 @@ static inline int cpu_has_vmx(void)

 /** Disable VMX on the current CPU
  *
- * vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * VMXOFF causes a #UD if the CPU is not post-VMXON, eat any #UDs to handle
+ * races with a hypervisor doing VMXOFF, e.g. if an NMI arrived between VMXOFF
+ * and clearing CR4.VMXE.
  */
 static inline void cpu_vmxoff(void)
 {
-       asm volatile ("vmxoff");
+       asm volatile("1: vmxoff\n\t"
+                    "2:\n\t"
+                    _ASM_EXTABLE(1b, 2b));
        cr4_clear_bits(X86_CR4_VMXE);
 }
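_ASM_EXTABLE(1b, 2b) records a pair of addresses: the instruction at label 1 that may fault, and label 2, where execution resumes if it does. Roughly, when a kernel-mode fault hits an address listed in the exception table, the x86 trap path (via fixup_exception()) rewrites the saved instruction pointer to the fixup instead of oopsing. The model below is a simplified user-space sketch of that lookup, not the kernel's implementation: all names and addresses are hypothetical, and real x86 entries of this era store relative offsets plus a handler field rather than absolute addresses. The 0x200/0x203 pair reflects VMXOFF's 3-byte encoding (0F 01 C4), so label 2 sits immediately after it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy exception table entry (hypothetical): the address of an instruction
 * that may fault and the address to resume at if it does.
 */
struct toy_extable_entry {
	uintptr_t insn;  /* "1:" label, the faulting instruction */
	uintptr_t fixup; /* "2:" label, where execution resumes */
};

/* Kept sorted by insn so it can be binary-searched. */
static const struct toy_extable_entry toy_extable[] = {
	{ 0x100, 0x104 },
	{ 0x200, 0x203 }, /* "1: vmxoff" at 0x200, "2:" right after its 3 bytes */
	{ 0x300, 0x310 },
};

/* Binary search for an entry whose insn matches the faulting IP. */
static const struct toy_extable_entry *toy_search_extable(uintptr_t ip)
{
	size_t lo = 0, hi = sizeof(toy_extable) / sizeof(toy_extable[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (toy_extable[mid].insn == ip)
			return &toy_extable[mid];
		if (toy_extable[mid].insn < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Toy fault handler: if the faulting IP has an entry, redirect execution to
 * its fixup and report the fault as handled; otherwise report it unhandled
 * (in the kernel, that path would end in an oops).
 */
static bool toy_fixup_exception(uintptr_t *ip)
{
	const struct toy_extable_entry *e = toy_search_extable(*ip);

	if (!e)
		return false;
	*ip = e->fixup;
	return true;
}
```

With this table, a modeled #UD at 0x200 resumes at 0x203, which corresponds to falling through to cr4_clear_bits() in the patched cpu_vmxoff(); a fault at an address with no entry is left unhandled.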