[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <shaevlgw5h7i3oxtoj6yqun3mklwdi6nng3noypxeqevnuqlcu@urfhn55x7owk>
Date: Fri, 2 Jan 2026 18:00:32 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Kevin Cheng <chengkev@...gle.com>
Subject: Re: [PATCH] KVM: x86: Disallow setting CPUID and/or feature MSRs if
L2 is active
On Tue, Dec 30, 2025 at 12:56:41PM -0800, Sean Christopherson wrote:
> Extend KVM's restriction on CPUID and feature MSR changes to disallow
> updates while L2 is active in addition to rejecting updates after the vCPU
> has run at least once. Like post-run vCPU model updates, attempting to
> react to model changes while L2 is active is practically infeasible, e.g.
> KVM would need to do _something_ in response to impossible situations where
> userspace has a removed a feature that was consumed as parted of nested
> VM-Enter.
>
> In practice, disallowing vCPU model changes while L2 is active is largely
> uninteresting, as the only way for L2 to be active without the vCPU having
> run at least once is if userspace stuffed state via KVM_SET_NESTED_STATE.
> And because KVM_SET_NESTED_STATE can't put the vCPU into L2 without
> userspace first defining the vCPU model, e.g. to enable SVM/VMX, modifying
> the vCPU model while L2 is active would require deliberately setting the
> vCPU model, then loading nested state, and then changing the model. I.e.
> no sane VMM should run afoul of the new restriction, and any VMM that does
> encounter problems has likely been running a broken setup for a long time.
>
> Cc: Yosry Ahmed <yosry.ahmed@...ux.dev>
> Cc: Kevin Cheng <chengkev@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
I can't speak to the policy change, but the code looks correct (with a
minor nit below):
Reviewed-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> ---
> arch/x86/kvm/cpuid.c | 19 +++++++++++--------
> arch/x86/kvm/mmu/mmu.c | 6 +-----
> arch/x86/kvm/pmu.c | 2 +-
> arch/x86/kvm/x86.c | 13 +++++++------
> arch/x86/kvm/x86.h | 4 ++--
> 5 files changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 88a5426674a1..f37331ad3ad8 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -534,17 +534,20 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
>
> /*
> - * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> - * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> - * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
> - * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
> - * the core vCPU model on the fly. It would've been better to forbid any
> - * KVM_SET_CPUID{,2} calls after KVM_RUN altogether but unfortunately
> - * some VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
> + * KVM does not correctly handle changing guest CPUID after KVM_RUN or
> + * while L2 is active, as MAXPHYADDR, GBPAGES support, AMD reserved bit
> + * behavior, etc. aren't tracked in kvm_mmu_page_role, and L2 state
> + * can't be adjusted (without breaking L2 in some way). As a result,
> + * KVM may reuse SPs/SPTEs and/or run L2 with bad/misconfigured state.
> + *
> + * In practice, no sane VMM mucks with the core vCPU model on the fly.
> + * It would've been better to forbid any KVM_SET_CPUID{,2} calls after
> + * KVM_RUN or KVM_SET_NESTED_STATE altogether, but unfortunately some
> + * VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
> * KVM_SET_CPUID{,2} again. To support this legacy behavior, check
> * whether the supplied CPUID data is equal to what's already set.
> */
> - if (kvm_vcpu_has_run(vcpu)) {
> + if (!kvm_can_set_cpuid_and_feature_msrs(vcpu)) {
> r = kvm_cpuid_check_equal(vcpu, e2, nent);
> if (r)
> goto err;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 02c450686b4a..f17324546900 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6031,11 +6031,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
> vcpu->arch.nested_mmu.cpu_role.ext.valid = 0;
> kvm_mmu_reset_context(vcpu);
>
> - /*
> - * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
> - * kvm_arch_vcpu_ioctl().
> - */
> - KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm);
> + KVM_BUG_ON(!kvm_can_set_cpuid_and_feature_msrs(vcpu), vcpu->kvm);
> }
>
> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 487ad19a236e..ff20b4102173 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -853,7 +853,7 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>
> - if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
> + if (KVM_BUG_ON(!kvm_can_set_cpuid_and_feature_msrs(vcpu), vcpu->kvm))
> return;
>
> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ff8812f3a129..211d8c24a4b1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2314,13 +2314,14 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
> u64 val;
>
> /*
> - * Disallow writes to immutable feature MSRs after KVM_RUN. KVM does
> - * not support modifying the guest vCPU model on the fly, e.g. changing
> - * the nVMX capabilities while L2 is running is nonsensical. Allow
> - * writes of the same value, e.g. to allow userspace to blindly stuff
> - * all MSRs when emulating RESET.
> + * Reject writes to immutable feature MSRs if the vCPU model is frozen,
> + * as KVM doesn't support modifying the guest vCPU model on the fly,
> + * e.g. changing the VMX capabilities MSRs while L2 is active is
> + * nonsensical. Allow writes of the same value, e.g. so that userspace
> + * can blindly stuff all MSRs when emulating RESET.
> */
> - if (kvm_vcpu_has_run(vcpu) && kvm_is_immutable_feature_msr(index) &&
> + if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
> + kvm_is_immutable_feature_msr(index) &&
> (do_get_msr(vcpu, index, &val) || *data != val))
> return -EINVAL;
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index fdab0ad49098..9084e0dfa15c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -172,9 +172,9 @@ static inline void kvm_nested_vmexit_handle_ibrs(struct kvm_vcpu *vcpu)
> indirect_branch_prediction_barrier();
> }
>
> -static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
> +static inline bool kvm_can_set_cpuid_and_feature_msrs(struct kvm_vcpu *vcpu)
> {
> - return vcpu->arch.last_vmentry_cpu != -1;
> + return vcpu->arch.last_vmentry_cpu == -1 && !is_guest_mode(vcpu);
> }
To make this self-contained (e.g. for readers not coming from
kvm_set_cpuid()), should we add a comment here about is_guest_mode()
only possibly being true with last_vmentry_cpu == -1 if userspace does
the set CPUID, set nested state, set CPUID again dance?
>
> static inline void kvm_set_mp_state(struct kvm_vcpu *vcpu, int mp_state)
>
> base-commit: 9448598b22c50c8a5bb77a9103e2d49f134c9578
> --
> 2.52.0.351.gbe84eed79e-goog
>
Powered by blists - more mailing lists