Message-ID: <YJqsScgPlFx9j5qA@google.com>
Date: Tue, 11 May 2021 16:09:45 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Dan Williams <dan.j.williams@...el.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Tony Luck <tony.luck@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
Raj Ashok <ashok.raj@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v2 16/32] x86/tdx: Handle MWAIT, MONITOR and WBINVD

On Tue, May 11, 2021, Dave Hansen wrote:
> On 5/10/21 6:23 PM, Dan Williams wrote:
> >> To prevent a TD guest from using the MWAIT/MONITOR instructions,
> >> support for these instructions is already disabled by the TDX
> >> module (SEAM), so the CPUID flags for these instructions should
> >> be in the disabled state.
> > Why does this not result in a #UD if the instruction is disabled by
> > SEAM? How is it possible to execute a disabled instruction (one
> > precluded by CPUID) to the point where it triggers #VE instead of #UD?
>
> This is actually a vestige of VMX. It's quite possible today to have a
> feature which isn't enumerated in CPUID which still exists and "works"
> in the silicon.

No, virtualization holes are something else entirely.

MONITOR/MWAIT are a bit weird; they do have an enable bit in IA32_MISC_ENABLE,
but most VMMs don't context switch IA32_MISC_ENABLE (load guest value on entry,
load host value on exit) because that would add ~250 cycles to every host<->guest
transition. And IA32_MISC_ENABLE is shared between SMT siblings, which further
complicates loading the guest's value into hardware. In the end, it's easier to
leave MONITOR/MWAIT enabled in hardware and instead force a VM-Exit.
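
As a rough sketch of the "leave it enabled and force a VM-Exit" approach: the
two bits below are the "MWAIT exiting" and "MONITOR exiting" controls in the
primary processor-based VM-execution controls, as documented in the SDM. The
helper names are made up for illustration and are not taken from KVM or any
other VMM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary processor-based VM-execution controls (bit positions per the SDM). */
#define CPU_BASED_MWAIT_EXITING    (UINT32_C(1) << 10)
#define CPU_BASED_MONITOR_EXITING  (UINT32_C(1) << 29)

/*
 * Hypothetical helper: request a VM-Exit on MONITOR and MWAIT instead of
 * loading a guest value into IA32_MISC_ENABLE on every transition.
 */
static uint32_t force_monitor_mwait_exits(uint32_t proc_ctls)
{
    return proc_ctls | CPU_BASED_MWAIT_EXITING | CPU_BASED_MONITOR_EXITING;
}

/* True only if both instructions are intercepted. */
static bool monitor_mwait_intercepted(uint32_t proc_ctls)
{
    const uint32_t both = CPU_BASED_MWAIT_EXITING | CPU_BASED_MONITOR_EXITING;

    return (proc_ctls & both) == both;
}
```

The controls are set once when the vCPU is configured, so nothing extra has to
be swapped on the host<->guest path.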

As for why TDX injects #VE instead of #UD, I suspect it's for the same reason
that KVM emulates MONITOR/MWAIT as nops instead of injecting a #UD. The CPUID
bit for MONITOR/MWAIT reflects their enabling in IA32_MISC_ENABLE, not raw
support in hardware. That means there's no definitive way to enumerate to BIOS
that MONITOR/MWAIT are not supported, e.g. AFAICT, EDKII blindly assumes it can
enable MONITOR/MWAIT in IA32_MISC_ENABLE. To justify #UD instead of #VE, TDX
would have to inject #GP on WRMSR to set IA32_MISC_ENABLE.ENABLE_MONITOR, and
even then there would be weirdness with respect to VMM behavior in response to
TDVMCALL(WRMSR) since the VMM could allow the virtual write. In the end, it's
again simpler to inject #VE.
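
A minimal model of the enumeration problem, assuming a deliberately simplified
view where the guest-visible CPUID.01H:ECX.MONITOR bit just mirrors the virtual
IA32_MISC_ENABLE value. The MSR index and bit positions are from the SDM; the
function names are hypothetical and the "firmware" helper only stands in for
the blind enable described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_MISC_ENABLE   0x1A0u                /* MSR index, for reference */
#define MISC_ENABLE_MONITOR    (UINT64_C(1) << 18)   /* ENABLE MONITOR FSM */
#define CPUID_01_ECX_MONITOR   (UINT32_C(1) << 3)    /* CPUID.01H:ECX[3] */

/*
 * Hypothetical model: the MONITOR bit in CPUID.01H:ECX reflects the enable
 * bit in IA32_MISC_ENABLE, not raw hardware support.
 */
static uint32_t cpuid_01_ecx(uint32_t base_ecx, uint64_t misc_enable)
{
    uint32_t ecx = base_ecx & ~CPUID_01_ECX_MONITOR;

    if (misc_enable & MISC_ENABLE_MONITOR)
        ecx |= CPUID_01_ECX_MONITOR;
    return ecx;
}

/* Stand-in for firmware that sets the enable bit without checking anything. */
static uint64_t firmware_enable_monitor(uint64_t misc_enable)
{
    return misc_enable | MISC_ENABLE_MONITOR;
}
```

In this model a clear CPUID bit only says "not enabled yet", so firmware that
sets the MSR bit will see MONITOR/MWAIT appear unless the write itself faults.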

> There are all kinds of pitfalls to doing this, but folks evidently do it in
> public clouds all the time.

Virtualization holes are when instructions/features are enumerated via CPUID,
but don't have a control to hide the feature from the guest (or in the case of
CET, multiple features are buried behind a single control). So even if the VMM
hides the feature via CPUID, the guest can still _cleanly_ execute the
instruction if it's supported by the underlying hardware.
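
The definition above can be sketched as a toy model. Everything here (the
struct, the enum, the function names) is hypothetical, and it ignores corner
cases such as instructions that decode as NOPs on older hardware.

```c
#include <stdbool.h>

enum guest_outcome { GUEST_EXECUTES_CLEANLY, GUEST_VM_EXIT, GUEST_UD };

struct feature {
    bool hw_supported;    /* present in the underlying silicon */
    bool has_intercept;   /* a control exists to trap or disable it */
    bool vmm_intercepts;  /* the VMM has armed that control */
    bool guest_cpuid;     /* what the VMM enumerates to the guest */
};

/* What happens when the guest executes the instruction anyway. */
static enum guest_outcome guest_executes(const struct feature *f)
{
    if (!f->hw_supported)
        return GUEST_UD;
    if (f->has_intercept && f->vmm_intercepts)
        return GUEST_VM_EXIT;
    /* Note that guest_cpuid plays no role in the outcome. */
    return GUEST_EXECUTES_CLEANLY;
}

/* Hidden in CPUID, supported by hardware, and no control to stop the guest. */
static bool is_virtualization_hole(const struct feature *f)
{
    return f->hw_supported && !f->has_intercept && !f->guest_cpuid;
}
```

MONITOR/MWAIT have an intercept control, so in this model they are not a hole:
the guest gets a VM-Exit rather than a clean execution.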

> The CPUID virtualization basically just traps into the hypervisor and
> lets the hypervisor set whatever register values it wants to appear when
> CPUID "returns".
>
> But, the controls for what instructions generate #UD are actually quite
> separate and unrelated to CPUID itself.

Eh, any sane VMM will accurately represent its virtual CPU model via CPUID
insofar as possible; there are just too many creaky corners in x86 to make things
100% bombproof.