Message-ID: <20250313203702.575156-1-jon@nutanix.com>
Date: Thu, 13 Mar 2025 13:36:39 -0700
From: Jon Kohler <jon@...anix.com>
To: seanjc@...gle.com, pbonzini@...hat.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: Jon Kohler <jon@...anix.com>,
Alexander Grest <Alexander.Grest@...rosoft.com>,
Nicolas Saenz Julienne <nsaenz@...zon.es>,
"Madhavan T . Venkataraman" <madvenka@...ux.microsoft.com>,
Mickaël Salaün <mic@...ikod.net>,
Tao Su <tao1.su@...ux.intel.com>, Xiaoyao Li <xiaoyao.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>
Subject: [RFC PATCH 00/18] KVM: VMX: Introduce Intel Mode-Based Execute Control (MBEC)
## Summary
This series introduces support for Intel Mode-Based Execute Control
(MBEC) to KVM and nested VMX virtualization, aiming to significantly
reduce VMexits and improve performance for Windows guests running with
Hypervisor-Protected Code Integrity (HVCI).
## What?
Intel MBEC is a hardware feature, introduced in the Kaby Lake
generation, that allows for more granular control over execution
permissions. MBEC enables the separation and tracking of execution
permissions for supervisor (kernel) and user-mode code. It is used as
an accelerator for Microsoft's Memory Integrity [1] (also known as
hypervisor-protected code integrity or HVCI).
## Why?
The primary reason for this feature is performance.
Without hardware-level MBEC, enabling Windows HVCI runs a 'software
MBEC' known as Restricted User Mode, which imposes a runtime overhead
due to increased state transitions between the guest's L2 root
partition and the L2 secure partition for running kernel mode code
integrity operations.
In practice, this results in a significant number of exits. For
example, playing a YouTube video within the Edge Browser produces
roughly 1.2 million VMexits/second across an 8 vCPU Windows 11 guest.
Most of these exits are VMREAD/VMWRITE operations, which can be
emulated with Enlightened VMCS (eVMCS) [2]. However, even with eVMCS,
this configuration still produces around 200,000 VMexits/second.
With MBEC exposed to the L1 Windows Hypervisor, the same scenario
results in approximately 50,000 VMexits/second, a *24x* reduction from
the 1.2 million baseline and 4x fewer than with eVMCS alone.
Not a typo, 24x reduction in VMexits.
## How?
This series implements core KVM support for exposing the MBEC bit in
secondary execution controls (bit 22) to L1 and L2, based on
configuration from user space and a module parameter
'enable_pt_guest_exec_control'. The inspiration for this series
started with Mickaël's series for Heki [3], where we've extracted,
refactored, and extended the MBEC-specific use case to be
general-purpose.
MBEC, which appears in Linux /proc/cpuinfo as ept_mode_based_exec,
splits the EPT exec bit (bit 2 in PTE) into two bits. When secondary
execution control bit 22 is set, PTE bit 2 reflects supervisor mode
executable, and PTE bit 10 reflects user mode executable.
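As a rough illustration of the split described above, the following
standalone C sketch checks whether an instruction fetch would be
permitted by a single EPT entry. It is not code from this series: the
macro and function names are hypothetical, and only the bit positions
(control bit 22, PTE bits 2 and 10) come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the cover letter. */
#define SECONDARY_EXEC_MBEC  (UINT32_C(1) << 22) /* secondary exec control bit 22 */
#define EPT_PTE_EXEC         (UINT64_C(1) << 2)  /* exec; supervisor exec with MBEC */
#define EPT_PTE_USER_EXEC    (UINT64_C(1) << 10) /* user exec; only used with MBEC */

static_assert(EPT_PTE_EXEC == 0x4, "exec is PTE bit 2");
static_assert(EPT_PTE_USER_EXEC == 0x400, "user exec is PTE bit 10");

/*
 * Return true if this EPT entry permits an instruction fetch.
 * is_user_fetch is true when the fetch is for user-mode code.
 */
static bool ept_pte_allows_exec(uint64_t pte, uint32_t secondary_ctls,
                                bool is_user_fetch)
{
    bool mbec = (secondary_ctls & SECONDARY_EXEC_MBEC) != 0;

    /* With MBEC enabled, user-mode fetches are governed by bit 10. */
    if (mbec && is_user_fetch)
        return (pte & EPT_PTE_USER_EXEC) != 0;

    /* Otherwise bit 2 applies: all fetches without MBEC, supervisor with it. */
    return (pte & EPT_PTE_EXEC) != 0;
}
```

With MBEC off, bit 10 is ignored in this sketch and bit 2 alone decides,
which matches the pre-MBEC meaning of the EPT exec bit.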
The semantics for EPT violation qualifications also change when MBEC
is enabled, with bit 5 reflecting supervisor/kernel mode execute
permissions and bit 6 reflecting user mode execute permissions.
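The exit qualification bits mentioned above could be decoded along the
following lines. This is a minimal sketch under the assumption that
MBEC is enabled; the struct, macro, and function names are made up for
illustration and are not the definitions used in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions 5 and 6 come from the cover letter. */
#define EPT_QUAL_SUPERVISOR_EXEC  (UINT64_C(1) << 5)
#define EPT_QUAL_USER_EXEC        (UINT64_C(1) << 6)

static_assert(EPT_QUAL_SUPERVISOR_EXEC == 0x20, "supervisor exec is bit 5");
static_assert(EPT_QUAL_USER_EXEC == 0x40, "user exec is bit 6");

/* Execute permissions reported for the faulting translation. */
struct ept_exec_perms {
    bool supervisor_exec;
    bool user_exec;
};

/* Extract the two execute-permission bits from an exit qualification. */
static struct ept_exec_perms ept_decode_exec_perms(uint64_t exit_qual)
{
    struct ept_exec_perms perms = {
        .supervisor_exec = (exit_qual & EPT_QUAL_SUPERVISOR_EXEC) != 0,
        .user_exec       = (exit_qual & EPT_QUAL_USER_EXEC) != 0,
    };
    return perms;
}
```

A handler built this way would only consult user_exec when MBEC is
enabled; without MBEC, the source describes only a single execute
permission.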
This ultimately serves to expose this feature to the L1 hypervisor,
which consumes MBEC and informs the L2 partitions not to use the
software MBEC by removing bit 14 in 0x40000004 EAX [4].
## Where?
Enablement spans both VMX code and MMU code to teach the shadow MMU
about the different execution modes, as well as user space VMM to pass
secondary execution control bit 22. A patch for QEMU enablement is
available [5].
## Testing
Initial testing has been done on 6.12-based code with:
Guests:
- Windows 11 24H2 26100.2894
- Windows Server 2025 24H2 26100.2894
- Windows Server 2022 21H2 20348.825
Processors:
- Intel Skylake 6154
- Intel Sapphire Rapids 6444Y
## Acknowledgements
Special thanks to all contributors and reviewers who have provided
valuable feedback and support for this patch series.
[1] https://learn.microsoft.com/en-us/windows/security/hardware-security/enable-virtualization-based-protection-of-code-integrity
[2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/nested-virtualization#enlightened-vmcs-intel
[3] https://patchwork.kernel.org/project/kvm/patch/20231113022326.24388-6-mic@digikod.net/
[4] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#implementation-recommendations---0x40000004
[5] https://github.com/JonKohler/qemu/tree/mbec-rfc-v1
Cc: Alexander Grest <Alexander.Grest@...rosoft.com>
Cc: Nicolas Saenz Julienne <nsaenz@...zon.es>
Cc: Madhavan T. Venkataraman <madvenka@...ux.microsoft.com>
Cc: Mickaël Salaün <mic@...ikod.net>
Cc: Tao Su <tao1.su@...ux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@...el.com>
Cc: Zhao Liu <zhao1.liu@...el.com>
Jon Kohler (11):
KVM: x86: Add module parameter for Intel MBEC
KVM: x86: Add pt_guest_exec_control to kvm_vcpu_arch
KVM: VMX: Wire up Intel MBEC enable/disable logic
KVM: x86/mmu: Remove SPTE_PERM_MASK
KVM: VMX: Extend EPT Violation protection bits
KVM: x86/mmu: Introduce shadow_ux_mask
KVM: x86/mmu: Adjust SPTE_MMIO_ALLOWED_MASK to understand MBEC
KVM: x86/mmu: Extend make_spte to understand MBEC
KVM: nVMX: Setup Intel MBEC in nested secondary controls
KVM: VMX: Allow MBEC with EVMCS
KVM: x86: Enable module parameter for MBEC
Mickaël Salaün (5):
KVM: VMX: add cpu_has_vmx_mbec helper
KVM: VMX: Define VMX_EPT_USER_EXECUTABLE_MASK
KVM: x86/mmu: Extend access bitfield in kvm_mmu_page_role
KVM: VMX: Enhance EPT violation handler for PROT_USER_EXEC
KVM: x86/mmu: Extend is_executable_pte to understand MBEC
Nikolay Borisov (1):
KVM: VMX: Remove EPT_VIOLATIONS_ACC_*_BIT defines
Sean Christopherson (1):
KVM: nVMX: Decouple EPT RWX bits from EPT Violation protection bits
arch/x86/include/asm/kvm_host.h | 13 +++++----
arch/x86/include/asm/vmx.h | 45 ++++++++++++++++++++---------
arch/x86/kvm/mmu.h | 3 +-
arch/x86/kvm/mmu/mmu.c | 13 +++++----
arch/x86/kvm/mmu/mmutrace.h | 23 ++++++++++-----
arch/x86/kvm/mmu/paging_tmpl.h | 19 +++++++++---
arch/x86/kvm/mmu/spte.c | 51 ++++++++++++++++++++++++++++-----
arch/x86/kvm/mmu/spte.h | 36 +++++++++++++++--------
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
arch/x86/kvm/vmx/capabilities.h | 6 ++++
arch/x86/kvm/vmx/hyperv.c | 5 +++-
arch/x86/kvm/vmx/hyperv_evmcs.h | 1 +
arch/x86/kvm/vmx/nested.c | 4 +++
arch/x86/kvm/vmx/vmx.c | 21 ++++++++++++--
arch/x86/kvm/vmx/vmx.h | 7 +++++
arch/x86/kvm/x86.c | 4 +++
16 files changed, 192 insertions(+), 61 deletions(-)
--
2.43.0