Message-ID: <20251223054806.1611168-1-jon@nutanix.com>
Date: Mon, 22 Dec 2025 22:47:53 -0700
From: Jon Kohler <jon@...anix.com>
To: seanjc@...gle.com, pbonzini@...hat.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Kiryl Shutsemau <kas@...nel.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
linux-coco@...ts.linux.dev (open list:X86 TRUST DOMAIN EXTENSIONS (TDX):Keyword:\b(tdx))
Cc: ken@...elabs.ch, Alexander.Grest@...rosoft.com, chao.gao@...el.com,
madvenka@...ux.microsoft.com, mic@...ikod.net, nsaenz@...zon.es,
tao1.su@...ux.intel.com, xiaoyao.li@...el.com, zhao1.liu@...el.com,
Jon Kohler <jon@...anix.com>
Subject: [PATCH 0/8] KVM: VMX: Introduce Intel Mode-Based Execute Control (MBEC)
## Summary
This series introduces support for Intel Mode-Based Execute Control
(MBEC) to KVM and nested VMX virtualization. By exposing MBEC to the L1
hypervisor for use with its L2 guests, it dramatically reduces VMexits
(up to 24x) for Windows guests running with Hypervisor-Protected Code
Integrity (HVCI), significantly improving virtualization performance.
## What?
Intel MBEC is a hardware feature, introduced in the Kaby Lake
generation, that allows finer-grained control over execute
permissions: it separates and tracks execute permissions for
supervisor-mode (kernel) and user-mode code. It is used as an
accelerator for Microsoft's Memory Integrity [1] (also known as
hypervisor-protected code integrity or HVCI).
## Why?
The primary reason for this feature is performance.
Without hardware MBEC, enabling Windows HVCI falls back to a 'software
MBEC' known as Restricted User Mode, which imposes runtime overhead
because of the additional state transitions between the guest's L2
root partition and L2 secure partition needed to run kernel-mode code
integrity operations.
In practice, this results in a significant number of exits. For
example, playing a YouTube video within the Edge Browser produces
roughly 1.2 million VMexits/second across an 8 vCPU Windows 11 guest.
Most of these exits are VMREAD/VMWRITE operations, which can be elided
with Enlightened VMCS (eVMCS) [2]. However, even with eVMCS, this
configuration still produces around 200,000 VMexits/second.
With MBEC exposed to the L1 Windows Hypervisor, the same scenario
results in approximately 50,000 VMexits/second, a *24x* reduction from
the baseline.
Not a typo: a 24x reduction in VMexits.
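The headline figure is plain division of the rates quoted above. The short C sketch below makes the ratios explicit, including the roughly 4x improvement over the eVMCS-only configuration and the per-vCPU baseline rate. The macro and helper names are illustrative only and are not part of this series.

```c
#include <stdint.h>

/*
 * Approximate VMexit rates (exits/second, summed over an 8 vCPU
 * Windows 11 guest) quoted in this cover letter for Edge playing a
 * YouTube video with HVCI enabled. Illustrative names only.
 */
#define EXITS_BASELINE 1200000ULL /* software MBEC, no eVMCS */
#define EXITS_EVMCS     200000ULL /* software MBEC, with eVMCS */
#define EXITS_MBEC       50000ULL /* MBEC exposed to the L1 hypervisor */

/* "Nx reduction" factor between two exit rates (truncating division). */
static inline uint64_t exit_reduction(uint64_t before, uint64_t after)
{
	return after ? before / after : 0;
}

/* Average exit rate per vCPU, given a whole-guest rate. */
static inline uint64_t exits_per_vcpu(uint64_t total, uint64_t nr_vcpus)
{
	return nr_vcpus ? total / nr_vcpus : 0;
}
```

With these figures, exit_reduction(EXITS_BASELINE, EXITS_MBEC) is 24, exit_reduction(EXITS_EVMCS, EXITS_MBEC) is 4, and the baseline works out to about 150,000 exits/second per vCPU.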
## How?
This series implements core KVM support for exposing the MBEC bit
(bit 22 of the secondary processor-based VM-execution controls) to L2
nested guests, based on configuration from user space. The series was
inspired by Mickaël's Heki series [3], from which we extracted the
MBEC-specific use case, then refactored and completely reworked it to
be general-purpose.
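L1 can only turn MBEC on if bit 22 is an allowed 1-setting in the secondary-controls capability MSR that L0 reports, and L0 in turn has to reject vmcs12 values that set disallowed bits. The sketch below illustrates that generic VMX allowed-0/allowed-1 check applied to bit 22. It is a minimal sketch assuming a raw 64-bit IA32_VMX_PROCBASED_CTLS2 value; the macro and helper names are hypothetical and do not correspond to KVM's own helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit 22 of the secondary processor-based VM-execution controls:
 * "mode-based execute control for EPT". Hypothetical name.
 */
#define SEC_EXEC_MODE_BASED_EPT_EXEC (1u << 22)

/*
 * IA32_VMX_PROCBASED_CTLS2 (MSR 0x48B): bits 31:0 are the allowed
 * 0-settings (a 1 means the control must be 1), bits 63:32 are the
 * allowed 1-settings (a 0 means the control must be 0).
 */
static inline uint32_t ctls_must_be_one(uint64_t msr) { return (uint32_t)msr; }
static inline uint32_t ctls_may_be_one(uint64_t msr) { return (uint32_t)(msr >> 32); }

/* Does the capability MSR value (as reported to L1) advertise MBEC? */
static inline bool mbec_advertised(uint64_t procbased_ctls2)
{
	return ctls_may_be_one(procbased_ctls2) & SEC_EXEC_MODE_BASED_EPT_EXEC;
}

/*
 * Is a secondary-controls value (e.g. taken from vmcs12) consistent with
 * the advertised capabilities? Every must-be-one bit has to be set and
 * no bit outside the allowed 1-settings may be set.
 */
static inline bool sec_ctls_valid(uint32_t ctls, uint64_t procbased_ctls2)
{
	uint32_t must = ctls_must_be_one(procbased_ctls2);
	uint32_t may = ctls_may_be_one(procbased_ctls2);

	return (ctls & must) == must && (ctls & ~may) == 0;
}
```

In this model, a vmcs12 that sets bit 22 passes sec_ctls_valid() only when L0 has placed bit 22 in the upper half of the MSR value it reports to L1.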
MBEC splits the EPT execute permission into two independent bits. When
secondary execution control bit 22 ("mode-based execute control for EPT")
is set for the L2 guest, EPT PTE bit 2 controls execute permission for
supervisor-mode linear addresses, while bit 10 controls execute permission
for user-mode linear addresses.
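A minimal sketch of the resulting instruction-fetch check, assuming a raw 64-bit EPT entry and a caller that already knows whether the guest linear address is a user-mode address (as determined by the guest's own paging structures). The presence rule reflects the changelog item about is_present_gpte and bit 10. The names are hypothetical; this is not the series' SPTE code.

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT paging-structure entry bits (hypothetical names). */
#define EPT_R  (1ull << 0)  /* read */
#define EPT_W  (1ull << 1)  /* write */
#define EPT_X  (1ull << 2)  /* execute; supervisor-mode execute with MBEC */
#define EPT_XU (1ull << 10) /* user-mode execute, only meaningful with MBEC */

/* With MBEC, an entry is present if any of bits 2:0 or bit 10 is set. */
static inline bool ept_entry_present(uint64_t pte, bool mbec)
{
	uint64_t mask = EPT_R | EPT_W | EPT_X | (mbec ? EPT_XU : 0);

	return pte & mask;
}

/*
 * May an instruction fetch through this entry proceed?
 * @user_addr: the guest linear address is a user-mode address.
 */
static inline bool ept_fetch_allowed(uint64_t pte, bool mbec, bool user_addr)
{
	if (!mbec)
		return pte & EPT_X; /* a single execute bit for all modes */

	return pte & (user_addr ? EPT_XU : EPT_X);
}
```

For example, an entry with only bit 2 set allows supervisor-mode fetches but blocks user-mode fetches once MBEC is enabled, which is the split HVCI relies on.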
The semantics of the EPT violation exit qualification also change when
MBEC is enabled: bit 5 reflects supervisor-mode execute permission and
bit 6 reflects user-mode execute permission.
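The sketch below decodes the execute-permission bits of an EPT-violation exit qualification under both semantics. It assumes the architectural layout in which bits 2:0 describe the access type and bits 5:3 the permissions of the translation; the macro, struct, and function names are hypothetical and are not taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT-violation exit qualification bits (hypothetical names). */
#define QUAL_ACC_READ   (1ull << 0) /* access was a data read */
#define QUAL_ACC_WRITE  (1ull << 1) /* access was a data write */
#define QUAL_ACC_FETCH  (1ull << 2) /* access was an instruction fetch */
#define QUAL_PROT_READ  (1ull << 3) /* GPA was readable */
#define QUAL_PROT_WRITE (1ull << 4) /* GPA was writable */
#define QUAL_PROT_EXEC  (1ull << 5) /* executable; supervisor-executable with MBEC */
#define QUAL_PROT_XU    (1ull << 6) /* user-executable, only with MBEC */

struct exec_perms {
	bool supervisor; /* supervisor-mode linear addresses may execute */
	bool user;       /* user-mode linear addresses may execute */
};

static inline bool qual_is_fetch(uint64_t qual)
{
	return qual & QUAL_ACC_FETCH;
}

static inline struct exec_perms decode_exec_perms(uint64_t qual, bool mbec)
{
	struct exec_perms p;

	p.supervisor = qual & QUAL_PROT_EXEC;
	/* Without MBEC, bit 5 covers both modes and bit 6 is not consulted. */
	p.user = mbec ? (qual & QUAL_PROT_XU) : (qual & QUAL_PROT_EXEC);
	return p;
}
```

A handler built this way can tell, for a fetch fault, whether only the user-mode execute permission was missing, which is the distinction the shadow MMU has to preserve for L1.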
Ultimately, this exposes the feature to the L1 hypervisor, which
consumes MBEC and tells its L2 partitions not to use software MBEC by
clearing bit 13 of CPUID leaf 0x40000004 EAX [4].
## Where?
The implementation spans multiple components:
- KVM MMU code: Teach the shadow MMU about MBEC execution modes
- KVM VMX code: Handle EPT violations and VMX controls for MBEC
- User space VMM: Pass secondary execution control bit 22 to enable MBEC
for L2 guests
A trivial QEMU enablement patch is available [5].
A GitHub mirror of this series is also available [6].
## Performance Impact
Testing shows dramatic performance improvements for Windows HVCI workloads:
- 24x reduction in VMexits for the browser video playback scenario above
- From ~1.2M VMexits/second to ~50K VMexits/second
- Enables hardware acceleration of Windows Memory Integrity
The implementation adds minimal overhead when MBEC is not used. The
largest gains come when MBEC is combined with eVMCS, which elides
nested VMREAD/VMWRITE VMexits.
## Testing
Initial testing has been done on 6.18-based code with:
Guests:
- Windows 11 24H2 26100.2894
- Windows Server 2025 24H2 26100.2894
- Windows Server 2022 21H2 20348.825
Processors:
- Intel Skylake 6154
- Intel Sapphire Rapids 6444Y
Unit tests:
- KVM Unit Tests [7]
## Changelog
RFC -> V1:
- Fix incorrect bit reference in cover letter (Adrian-Ken)
- Remove module parameters (Sean, Amit)
- Remove redundant arch-level tracking boolean (Sean)
- Update is_present_gpte to account for MBEC bit 10 (Chao)
- Move MBEC enablement tracking to MMU role (Sean)
- Restrict MBEC advertisement to nested virtualization only (Sean)
- Consolidate preparatory patches into main implementation (Sean)
- Add permission mask refactoring preparation (Sean)
- Implement TDP-aware executable permission checking (Sean)
[1] https://learn.microsoft.com/en-us/windows/security/hardware-security/enable-virtualization-based-protection-of-code-integrity
[2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/nested-virtualization#enlightened-vmcs-intel
[3] https://patchwork.kernel.org/project/kvm/patch/20231113022326.24388-6-mic@digikod.net/
[4] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#implementation-recommendations---0x40000004
[5] https://github.com/JonKohler/qemu/tree/mbec-v1
[6] https://github.com/JonKohler/linux/tree/mbec-v1-6.18
[7] https://github.com/JonKohler/kvm-unit-tests/tree/mbec-v1
Cc: "Adrian-Ken Rueegsegger" <ken@...elabs.ch>
Cc: "Alexander Grest" <Alexander.Grest@...rosoft.com>
Cc: "Chao Gao" <chao.gao@...el.com>
Cc: "Madhavan T . Venkataraman" <madvenka@...ux.microsoft.com>
Cc: "Mickaël Salaün" <mic@...ikod.net>
Cc: "Nicolas Saenz Julienne" <nsaenz@...zon.es>
Cc: "Tao Su" <tao1.su@...ux.intel.com>
Cc: "Xiaoyao Li" <xiaoyao.li@...el.com>
Cc: "Zhao Liu" <zhao1.liu@...el.com>
Jon Kohler (8):
KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
KVM: x86/mmu: remove SPTE_PERM_MASK
KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask
KVM: x86/mmu: update access permissions from ACC_ALL to ACC_RWX
KVM: x86/mmu: bootstrap support for Intel MBEC
KVM: VMX: enhance EPT violation handler for MBEC
KVM: VMX: allow MBEC with EVMCS
KVM: nVMX: advertise MBEC and setup mmu has_mbec
Documentation/virt/kvm/x86/mmu.rst | 9 +++-
arch/x86/include/asm/kvm_host.h | 19 +++++---
arch/x86/include/asm/vmx.h | 9 +++-
arch/x86/kvm/mmu.h | 15 +++++-
arch/x86/kvm/mmu/mmu.c | 74 ++++++++++++++++++++++++++--
arch/x86/kvm/mmu/mmutrace.h | 23 ++++++---
arch/x86/kvm/mmu/paging_tmpl.h | 24 ++++++---
arch/x86/kvm/mmu/spte.c | 65 +++++++++++++++++++------
arch/x86/kvm/mmu/spte.h | 78 ++++++++++++++++++++++++------
arch/x86/kvm/mmu/tdp_mmu.c | 12 +++--
arch/x86/kvm/vmx/capabilities.h | 6 +++
arch/x86/kvm/vmx/common.h | 15 ++++--
arch/x86/kvm/vmx/hyperv_evmcs.h | 1 +
arch/x86/kvm/vmx/nested.c | 6 +++
arch/x86/kvm/vmx/tdx.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 10 +++-
arch/x86/kvm/vmx/vmx.h | 1 +
17 files changed, 301 insertions(+), 68 deletions(-)
--
2.43.0