lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <17b45add9debcc226f515e5d8bb31c508576fa1e.camel@redhat.com>
Date: Tue, 24 Jun 2025 15:59:49 -0400
From: mlevitsk@...hat.com
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
 <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, Adrian Hunter
	 <adrian.hunter@...el.com>
Subject: Re: [PATCH v6 8/8] KVM: VMX: Preserve host's
 DEBUGCTLMSR_FREEZE_IN_SMM while running the guest

On Tue, 2025-06-10 at 16:20 -0700, Sean Christopherson wrote:
> From: Maxim Levitsky <mlevitsk@...hat.com>
> 
> Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
> host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
> while running the guest.  When running with the "default treatment of SMIs"
> in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
> is visible to host (non-SMM) software, and instead transitions directly
> from VMX non-root to SMM.  And critically, DEBUGCTL isn't context switched
> by hardware on SMI or RSM, i.e. SMM will run with whatever value was
> resident in hardware at the time of the SMI.
> 
> Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
> events while the CPU is executing in SMM, which can pollute profiling and
> potentially leak information into the guest.
> 
> Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
> run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
> function call), by way of /sys/devices/cpu/freeze_on_smi.
> 
> Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
> preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
> i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
> at worst could lead to undesirable behavior in the future if AMD CPUs ever
> happened to pick up a collision with the bit.
> 
> Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
> owns and controls GUEST_IA32_DEBUGCTL.
> 
> WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
> lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
> 
> Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
> by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
> Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
> GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
> load_vmcs12_host_state().  For the VM-Fail case, even though KVM won't
> have actually entered the guest, vcpu_enter_guest() will have run with
> vmcs02 active and thus could result in vmcs01 being run with a stale value.
> 
> Cc: stable@...r.kernel.org
> Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> Co-developed-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  7 +++++++
>  arch/x86/kvm/vmx/main.c         |  2 ++
>  arch/x86/kvm/vmx/nested.c       |  3 +++
>  arch/x86/kvm/vmx/vmx.c          |  3 +++
>  arch/x86/kvm/vmx/vmx.h          | 15 ++++++++++++++-
>  arch/x86/kvm/x86.c              | 14 ++++++++++++--
>  6 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3d6325369a4b..e59527dd5a0b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1676,6 +1676,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
>  enum kvm_x86_run_flags {
>  	KVM_RUN_FORCE_IMMEDIATE_EXIT	= BIT(0),
>  	KVM_RUN_LOAD_GUEST_DR6		= BIT(1),
> +	KVM_RUN_LOAD_DEBUGCTL		= BIT(2),
>  };
>  
>  struct kvm_x86_ops {
> @@ -1706,6 +1707,12 @@ struct kvm_x86_ops {
>  	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
>  	void (*vcpu_put)(struct kvm_vcpu *vcpu);
>  
> +	/*
> +	 * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
> +	 * match the host's value even while the guest is active.
> +	 */
> +	const u64 HOST_OWNED_DEBUGCTL;
> +
>  	void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
>  	int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
>  	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index c85cbce6d2f6..4a6d4460f947 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>  	.vcpu_load = vt_op(vcpu_load),
>  	.vcpu_put = vt_op(vcpu_put),
>  
> +	.HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
> +
>  	.update_exception_bitmap = vt_op(update_exception_bitmap),
>  	.get_feature_msr = vmx_get_feature_msr,
>  	.get_msr = vt_op(get_msr),
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 9edce9f411a3..756c42e2d038 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4860,6 +4860,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
>  			WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
>  	}
>  
> +	/* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
> +	vmx_reload_guest_debugctl(vcpu);
> +
>  	/*
>  	 * Note that calling vmx_set_{efer,cr0,cr4} is important as they
>  	 * handle a variety of side effects to KVM's software model.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 196f33d934d3..70a115d99530 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7371,6 +7371,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>  	if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
>  		set_debugreg(vcpu->arch.dr6, 6);
>  
> +	if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
> +		vmx_reload_guest_debugctl(vcpu);
> +
>  	/*
>  	 * Refresh vmcs.HOST_CR3 if necessary.  This must be done immediately
>  	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index c20a4185d10a..076af78af151 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
>  
>  static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
>  {
> +	WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
> +
> +	val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
>  	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
>  }
>  
>  static inline u64 vmx_guest_debugctl_read(void)
>  {
> -	return vmcs_read64(GUEST_IA32_DEBUGCTL);
> +	return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
> +}
> +
> +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
> +{
> +	u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
> +
> +	if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
> +		return;
> +
> +	vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
>  }


Wouldn't it be better to use kvm_x86_ops.HOST_OWNED_DEBUGCTL here as well
to avoid logic duplication?

Besides this, everything else looks fine to me.


Best regards,
	Maxim Levitsky

>  
>  /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6742eb556d91..811f4db824ab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		dm_request_for_irq_injection(vcpu) &&
>  		kvm_cpu_accept_dm_intr(vcpu);
>  	fastpath_t exit_fastpath;
> -	u64 run_flags;
> +	u64 run_flags, debug_ctl;
>  
>  	bool req_immediate_exit = false;
>  
> @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		set_debugreg(0, 7);
>  	}
>  
> -	vcpu->arch.host_debugctl = get_debugctlmsr();
> +	/*
> +	 * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
> +	 * can be modified in IRQ context, e.g. via SMP function calls.  Inform
> +	 * vendor code if any host-owned bits were changed, e.g. so that the
> +	 * value loaded into hardware while running the guest can be updated.
> +	 */
> +	debug_ctl = get_debugctlmsr();
> +	if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
> +	    !vcpu->arch.guest_state_protected)
> +		run_flags |= KVM_RUN_LOAD_DEBUGCTL;
> +	vcpu->arch.host_debugctl = debug_ctl;
>  
>  	guest_timing_enter_irqoff();
>  


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ