Date:   Mon, 6 Jul 2020 22:09:32 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     "David P. Reed" <dpreed@...pplum.com>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Allison Randal <allison@...utok.net>,
        Enrico Weigelt <info@...ux.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Martin Molnar <martin.molnar.programming@...il.com>,
        Andy Lutomirski <luto@...nel.org>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Jann Horn <jannh@...gle.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu
 on crash or panic

On Sat, Jul 04, 2020 at 04:38:08PM -0400, David P. Reed wrote:
> Fix: Mask undefined operation fault during emergency VMXOFF that must be
> attempted to force cpu exit from VMX root operation.
> Explanation: When a cpu may be in VMX root operation (only possible when
> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation
> using VMXOFF. This is necessary, because any INIT will be masked while cpu
> is in VMX root operation, but that state cannot be reliably
> discerned by the state of the cpu.
> VMXOFF faults if the cpu is not actually in VMX root operation, signalling
> undefined operation.
> Discovered while debugging an out-of-tree x-visor with a race. Can happen
> due to certain kinds of bugs in KVM.
> 
> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067>
> Reported-by: David P. Reed <dpreed@...pplum.com>
> Suggested-by: Thomas Gleixner <tglx@...utronix.de>
> Suggested-by: Sean Christopherson <sean.j.christopherson@...el.com>
> Suggested-by: Andy Lutomirski <luto@...nel.org>
> Signed-off-by: David P. Reed <dpreed@...pplum.com>
> ---
>  arch/x86/include/asm/virtext.h | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
> index 0ede8d04535a..0e0900eacb9c 100644
> --- a/arch/x86/include/asm/virtext.h
> +++ b/arch/x86/include/asm/virtext.h
> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void)
>  }
>  
>  
> -/* Disable VMX on the current CPU
> +/* Exit VMX root mode and disable VMX on the current CPU.
>   *
>   * vmxoff causes a undefined-opcode exception if vmxon was not run
> - * on the CPU previously. Only call this function if you know VMX
> - * is enabled.
> + * on the CPU previously. Only call this function if you know cpu
> + * is in VMX root mode.
>   */
>  static inline void cpu_vmxoff(void)
>  {
> @@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void)
>  	return __read_cr4() & X86_CR4_VMXE;
>  }
>  
> -/* Disable VMX if it is enabled on the current CPU
> +/* Safely exit VMX root mode and disable VMX if VMX enabled
> + * on the current CPU. Handle undefined-opcode fault
> + * that can occur if cpu is not in VMX root mode, due
> + * to a race.
>   *
>   * You shouldn't call this if cpu_has_vmx() returns 0.
>   */
>  static inline void __cpu_emergency_vmxoff(void)
>  {
> -	if (cpu_vmx_enabled())
> -		cpu_vmxoff();
> +	if (!cpu_vmx_enabled())
> +		return;
> +	asm volatile ("1:vmxoff\n\t"
> +		      "2:\n\t"
> +		      _ASM_EXTABLE(1b, 2b)
> +		      ::: "cc", "memory");
> +	cr4_clear_bits(X86_CR4_VMXE);

Open coding vmxoff doesn't make sense, and IMO is flat out wrong as it fixes
flows that use __cpu_emergency_vmxoff() but leaves the same bug hanging
around in emergency_vmx_disable_all() until the next patch.

The reason I say it doesn't make sense is that there is no sane scenario
where the generic vmxoff helper should _not_ eat the fault.  All other VMXOFF
faults are mode related, i.e. any fault is guaranteed to be due to the
!post-VMXON check unless we're magically in RM, VM86, compat mode, or at
CPL>0.  Given that the whole point of this series is that it's impossible to
determine whether or not the CPU is post-VMXON if CR4.VMXE=1 without taking a
fault of some form, there's simply no way that anything except the hypervisor
(in normal operation) can know the state of VMX.  And given that the only
in-tree hypervisor (KVM) has its own version of vmxoff, that means there is
no scenario in which cpu_vmxoff() can safely be used.  Case in point, after
the next patch there are no users of cpu_vmxoff().

TL;DR: Just do fixup on cpu_vmxoff().
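
A rough, untested sketch of what that could look like, reusing the
exception-table fixup from the hunk above.  It assumes _ASM_EXTABLE() and
cr4_clear_bits() are already visible in virtext.h; __cpu_emergency_vmxoff()
would then keep its pre-patch body and simply call the helper.

```c
/*
 * Sketch only, not compile-tested.  Exit VMX root mode and disable VMX on
 * the current CPU.  VMXOFF #UDs if the CPU is not post-VMXON; the exception
 * table entry resumes execution at label 2, so the fault is silently eaten.
 */
static inline void cpu_vmxoff(void)
{
	asm volatile ("1: vmxoff\n\t"
		      "2:\n\t"
		      _ASM_EXTABLE(1b, 2b)
		      ::: "cc", "memory");
	cr4_clear_bits(X86_CR4_VMXE);
}

/* Same as the pre-patch code: the helper itself now tolerates the fault. */
static inline void __cpu_emergency_vmxoff(void)
{
	if (cpu_vmx_enabled())
		cpu_vmxoff();
}
```

With the fixup living in the helper, every caller of cpu_vmxoff() gets the
same behavior, including emergency_vmx_disable_all().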

>  }
>  
>  /* Disable VMX if it is supported and enabled on the current CPU
> -- 
> 2.26.2
> 
