linux-kernel - Re: [PATCH] KVM: x86: Disallow setting CPUID and/or feature MSRs if L2 is active

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <shaevlgw5h7i3oxtoj6yqun3mklwdi6nng3noypxeqevnuqlcu@urfhn55x7owk>
Date: Fri, 2 Jan 2026 18:00:32 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Kevin Cheng <chengkev@...gle.com>
Subject: Re: [PATCH] KVM: x86: Disallow setting CPUID and/or feature MSRs if
 L2 is active

On Tue, Dec 30, 2025 at 12:56:41PM -0800, Sean Christopherson wrote:
> Extend KVM's restriction on CPUID and feature MSR changes to disallow
> updates while L2 is active in addition to rejecting updates after the vCPU
> has run at least once.  Like post-run vCPU model updates, attempting to
> react to model changes while L2 is active is practically infeasible, e.g.
> KVM would need to do _something_ in response to impossible situations where
> userspace has a removed a feature that was consumed as parted of nested
> VM-Enter.
> 
> In practice, disallowing vCPU model changes while L2 is active is largely
> uninteresting, as the only way for L2 to be active without the vCPU having
> run at least once is if userspace stuffed state via KVM_SET_NESTED_STATE.
> And because KVM_SET_NESTED_STATE can't put the vCPU into L2 without
> userspace first defining the vCPU model, e.g. to enable SVM/VMX, modifying
> the vCPU model while L2 is active would require deliberately setting the
> vCPU model, then loading nested state, and then changing the model.  I.e.
> no sane VMM should run afoul of the new restriction, and any VMM that does
> encounter problems has likely been running a broken setup for a long time.
> 
> Cc: Yosry Ahmed <yosry.ahmed@...ux.dev>
> Cc: Kevin Cheng <chengkev@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>

I can't speak to the policy change, but the code looks correct (with a
minor nit below):

Reviewed-by: Yosry Ahmed <yosry.ahmed@...ux.dev>

> ---
>  arch/x86/kvm/cpuid.c   | 19 +++++++++++--------
>  arch/x86/kvm/mmu/mmu.c |  6 +-----
>  arch/x86/kvm/pmu.c     |  2 +-
>  arch/x86/kvm/x86.c     | 13 +++++++------
>  arch/x86/kvm/x86.h     |  4 ++--
>  5 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 88a5426674a1..f37331ad3ad8 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -534,17 +534,20 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
>  
>  	/*
> -	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> -	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> -	 * tracked in kvm_mmu_page_role.  As a result, KVM may miss guest page
> -	 * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
> -	 * the core vCPU model on the fly. It would've been better to forbid any
> -	 * KVM_SET_CPUID{,2} calls after KVM_RUN altogether but unfortunately
> -	 * some VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
> +	 * KVM does not correctly handle changing guest CPUID after KVM_RUN or
> +	 * while L2 is active, as MAXPHYADDR, GBPAGES support, AMD reserved bit
> +	 * behavior, etc. aren't tracked in kvm_mmu_page_role, and L2 state
> +	 * can't be adjusted (without breaking L2 in some way).  As a result,
> +	 * KVM may reuse SPs/SPTEs and/or run L2 with bad/misconfigured state.
> +	 *
> +	 * In practice, no sane VMM mucks with the core vCPU model on the fly.
> +	 * It would've been better to forbid any KVM_SET_CPUID{,2} calls after
> +	 * KVM_RUN or KVM_SET_NESTED_STATE altogether, but unfortunately some
> +	 * VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
>  	 * KVM_SET_CPUID{,2} again. To support this legacy behavior, check
>  	 * whether the supplied CPUID data is equal to what's already set.
>  	 */
> -	if (kvm_vcpu_has_run(vcpu)) {
> +	if (!kvm_can_set_cpuid_and_feature_msrs(vcpu)) {
>  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
>  		if (r)
>  			goto err;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 02c450686b4a..f17324546900 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6031,11 +6031,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.nested_mmu.cpu_role.ext.valid = 0;
>  	kvm_mmu_reset_context(vcpu);
>  
> -	/*
> -	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
> -	 * kvm_arch_vcpu_ioctl().
> -	 */
> -	KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm);
> +	KVM_BUG_ON(!kvm_can_set_cpuid_and_feature_msrs(vcpu), vcpu->kvm);
>  }
>  
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 487ad19a236e..ff20b4102173 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -853,7 +853,7 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>  
> -	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
> +	if (KVM_BUG_ON(!kvm_can_set_cpuid_and_feature_msrs(vcpu), vcpu->kvm))
>  		return;
>  
>  	/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ff8812f3a129..211d8c24a4b1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2314,13 +2314,14 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
>  	u64 val;
>  
>  	/*
> -	 * Disallow writes to immutable feature MSRs after KVM_RUN.  KVM does
> -	 * not support modifying the guest vCPU model on the fly, e.g. changing
> -	 * the nVMX capabilities while L2 is running is nonsensical.  Allow
> -	 * writes of the same value, e.g. to allow userspace to blindly stuff
> -	 * all MSRs when emulating RESET.
> +	 * Reject writes to immutable feature MSRs if the vCPU model is frozen,
> +	 * as KVM doesn't support modifying the guest vCPU model on the fly,
> +	 * e.g. changing the VMX capabilities MSRs while L2 is active is
> +	 * nonsensical.  Allow writes of the same value, e.g. so that userspace
> +	 * can blindly stuff all MSRs when emulating RESET.
>  	 */
> -	if (kvm_vcpu_has_run(vcpu) && kvm_is_immutable_feature_msr(index) &&
> +	if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
> +	    kvm_is_immutable_feature_msr(index) &&
>  	    (do_get_msr(vcpu, index, &val) || *data != val))
>  		return -EINVAL;
>  
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index fdab0ad49098..9084e0dfa15c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -172,9 +172,9 @@ static inline void kvm_nested_vmexit_handle_ibrs(struct kvm_vcpu *vcpu)
>  		indirect_branch_prediction_barrier();
>  }
>  
> -static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
> +static inline bool kvm_can_set_cpuid_and_feature_msrs(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.last_vmentry_cpu != -1;
> +	return vcpu->arch.last_vmentry_cpu == -1 && !is_guest_mode(vcpu);
>  }

To make this self-contained (e.g. for readers not coming from
kvm_set_cpuid()), should we add a comment here about is_guest_mode()
only possibly being true with last_vmentry_cpu == -1 if userspace does
the set CPUID, set nested state, set CPUID again dance?

>  
>  static inline void kvm_set_mp_state(struct kvm_vcpu *vcpu, int mp_state)
> 
> base-commit: 9448598b22c50c8a5bb77a9103e2d49f134c9578
> -- 
> 2.52.0.351.gbe84eed79e-goog
>