Message-ID: <aTdV6bX14SGz_JWZ@google.com>
Date: Mon, 8 Dec 2025 14:49:13 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: "Xin Li (Intel)" <xin@...or.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, pbonzini@...hat.com, corbet@....net,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, luto@...nel.org,
peterz@...radead.org, andrew.cooper3@...rix.com, chao.gao@...el.com,
hch@...radead.org, sohil.mehta@...el.com, Yosry Ahmed <yosry.ahmed@...ux.dev>
Subject: Re: [PATCH v9 21/22] KVM: nVMX: Guard SHADOW_FIELD_R[OW] macros with
VMX feature checks
+Yosry
On Sun, Oct 26, 2025, Xin Li (Intel) wrote:
> From: Xin Li <xin3.li@...el.com>
>
> Add VMX feature checks to the SHADOW_FIELD_R[OW] macros to prevent access
> to VMCS fields that may be unsupported on some CPUs.
>
> Functions like copy_shadow_to_vmcs12() and copy_vmcs12_to_shadow() access
> VMCS fields that may not exist on certain hardware, such as
> INJECTED_EVENT_DATA. To avoid VMREAD/VMWRITE warnings, skip syncing fields
> tied to unsupported VMX features.
>
> Signed-off-by: Xin Li <xin3.li@...el.com>
> Signed-off-by: Xin Li (Intel) <xin@...or.com>
> Tested-by: Shan Kang <shan.kang@...el.com>
> Tested-by: Xuelian Guo <xuelian.guo@...el.com>
> ---
>
> Change in v5:
> * Add TB from Xuelian Guo.
>
> Change since v2:
> * Add __SHADOW_FIELD_R[OW] for better readability or maintainability (Sean).
Coming back to this with fresh eyes, handling fields that conditionally exist
_only_ for VMCS shadowing is somewhat ridiculous. For PML and the VMX preemption
timer, the special case handling makes sense because the fields are emulated by
KVM irrespective of hardware support. But for fields that KVM doesn't emulate in
software, e.g. GUEST_INTR_STATUS and the FRED fields, allowing accesses through
emulated VMREAD/VMWRITE and then filtering out VMCS shadowing accesses is just us
being stubborn.
I still 100% think that not restricting based on the virtual CPU model defined by
userspace is the way to go[*], because that'd require an absurd amount of effort,
complexity, and memory to solve a problem no one actually cares about. But
updating KVM's array of vmcs12 fields once during kvm-intel.ko load isn't difficult,
and would make KVM suck a little less when running on old hardware.
E.g. running the test_vmwrite_vmread KUT subtest on CPUs without TSC scaling still
fails with the wonderful:
FAIL: VMX_VMCS_ENUM.MAX_INDEX expected: 19, actual: 17
due to QEMU (sanely) setting the max index to 17 (VMX preemption timer) when the
virtual CPU model doesn't support TSC scaling.
And looking forward, we're going to have the same mess with FRED due to QEMU (again,
sanely) basing its VMX_VMCS_ENUM.MAX_INDEX on the virtual CPU model:

  if (f[FEAT_7_1_EAX] & CPUID_7_1_EAX_FRED) {
      /* FRED injected-event data (0x2052). */
      kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x52);
  } else if (f[FEAT_VMX_EXIT_CTLS] &
             VMX_VM_EXIT_ACTIVATE_SECONDARY_CONTROLS) {
      /* Secondary VM-exit controls (0x2044). */
      kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x44);
  } else if (f[FEAT_VMX_SECONDARY_CTLS] & VMX_SECONDARY_EXEC_TSC_SCALING) {
      /* TSC multiplier (0x2032). */
      kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x32);
  } else {
      /* Preemption timer (0x482E). */
      kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x2E);
  }

KVM will still have virtualization holes, e.g. if userspace hides TSC scaling when
running on hardware+KVM that supports TSC scaling, but as above I don't think that's
a problem worth solving.
I'll post a patch (just need to test on bare metal) to sanitize vmcs12 fields,
at which point FRED nVMX support shouldn't have to do anything special beyond
noting the dependency, i.e. it should only take a few lines of code.
[*] https://lore.kernel.org/all/YR2Tf9WPNEzrE7Xg@google.com