lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250920065233.GA48402@k08j02272.eu95sqa>
Date: Sat, 20 Sep 2025 14:52:33 +0800
From: Hou Wenlong <houwenlong.hwl@...group.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Tom Lendacky <thomas.lendacky@....com>, kvm@...r.kernel.org,
	Lai Jiangshan <jiangshan.ljs@...group.com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: SVM: Use cached value as restore value of
 TSC_AUX for SEV-ES guest

On Fri, Sep 19, 2025 at 09:23:26AM -0700, Sean Christopherson wrote:
> On Fri, Sep 19, 2025, Hou Wenlong wrote:
> > On Thu, Sep 18, 2025 at 01:47:06PM -0500, Tom Lendacky wrote:
> > > On 9/17/25 22:38, Hou Wenlong wrote:
> > > > The commit 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support
> > > > for virtualized TSC_AUX") assumes that TSC_AUX is not changed by Linux
> > > > post-boot, so it always restores the initial host value on #VMEXIT.
> > > > However, this is not true in KVM, as it can be modified by user return
> > > > MSR support for normal guests. If an SEV-ES guest always restores the
> > > > initial host value on #VMEXIT, this may result in the cached value in
> > > > user return MSR being different from the hardware value if the previous
> > > > vCPU was a non-SEV-ES guest that had called kvm_set_user_return_msr().
> > > > Consequently, this may pose a problem when switching back to that vCPU,
> > > > as kvm_set_user_return_msr() would not update the hardware value because
> > > > the cached value matches the target value. Unlike the TDX case, the
> > > > SEV-ES guest has the ability to set the restore value in the host save
> > > > area, and the cached value in the user return MSR is always the current
> > > > hardware value. Therefore, the cached value could be used directly
> > > > without RDMSR in svm_prepare_switch_to_guest(), making this change
> > > > minimal.
> > > 
> > > I'm not sure I follow. If Linux never changes the value of TSC_AUX once it
> > > has set it, then how can it ever be different? Have you seen this issue?
> > > 
> > > Thanks,
> > > Tom
> > >
> > Hi, Tom.
> > 
> > IIUD, the normal guest still uses the user return MSR to load the guest
> > TSC_AUX value into the hardware when TSC_AUX virtualization is
> > supported.  However, the user return MSR only restores the host value
> > when returning to userspace, rather than when the vCPU is scheduled out.
> > This may lead to an issue during vCPU switching on a single pCPU, which
> > appears as follows:
> > 
> >        normal vCPU -> SEV-ES vCPU -> normal vCPU
> > 
> > When the normal vCPU switches to the SEV-ES vCPU, the hardware TSC_AUX
> > value remains as the guest value set in kvm_set_user_return_msr() by the
> > normal vCPU.  After the #VMEXIT from the SEV-ES vCPU, the hardware value
> > becomes the host value. However, the cached TSC_AUX value in the user
> > return MSR remains the guest value of previous normal vCPU. Therefore,
> > when switching back to that normal vCPU, kvm_set_user_return_msr() does
> > not perform a WRMSR to load the guest value into the hardware, because
> > the cached value matches the target value. As a result, during the
> > execution of the normal vCPU, the normal vCPU would get an incorrect
> > TSC_AUX value for RDTSCP/RDPID.
> > 
> > I didn't find the available description of TSC_AUX virtualization in
> > APM; all my analysis is based on the current KVM code.
> 
> I'm guessing TSC_AUX virtualization works like SEV-ES, where hardware context
> switches the MSR on VMRUN/#VMEXIT.
> 
> > Am I missing something?
> 
> Nope, I don't think so.  I also found the changelog a bit confusing though.  I
> would say omit the details about Linux not changing the value, and instead focus
> on the need to re-load the current hardware value.  That should be intuitive for
> all readers, and is correct regradless of what/whose value is currently in hardware.
> 
> I also think we should handle setting hostsa->tsc_aux in
> sev_es_prepare_switch_to_guest().  That obviously requires duplicating some of
> logic related to SEV-ES and TSC_AUX, but I think I prefer that to splitting the
> handling of the host save area.
> 
> How's this look? (compile tested only)
>

I'm fine with it. Thanks for your fixup; I think I need to improve my
changlog next time. :)

Thanks!

> --
> Subject: [PATCH] KVM: SVM: Re-load current, not host, TSC_AUX on #VMEXIT from
>  SEV-ES guest
> 
> Prior to running an SEV-ES guest, set TSC_AUX in the host save area to the
> current value in hardware, as tracked by the user return infrastructure,
> instead of always loading the host's desired value for the CPU.  If the
> pCPU is also running a non-SEV-ES vCPU, loading the hosts value on #VMEXIT
> could clobber the other vCPU's value, e.g. if the SEV-ES vCPU preempted
> the non-SEV-ES vCPU.
> 
> Note, unlike TDX, which blindly _zeroes_ TSC_AUX on exit, SEV-ES CPUs
> can load an arbitrary value.  Stuff the current value in the host save
> area instead of refreshing the user return cache so that KVM doesn't need
> to track whether or not the vCPU actually enterred the guest and thus
> loaded TSC_AUX from the host save area.
> 
> Fixes: 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX")
> Cc: stable@...r.kernel.org
> Suggested-by: Lai Jiangshan <jiangshan.ljs@...group.com>
> Signed-off-by: Hou Wenlong <houwenlong.hwl@...group.com>
> [sean: handle the SEV-ES case in sev_es_prepare_switch_to_guest()]
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/svm/sev.c | 14 +++++++++++++-
>  arch/x86/kvm/svm/svm.c | 26 +++++++-------------------
>  arch/x86/kvm/svm/svm.h |  4 +++-
>  3 files changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index cce48fff2e6c..95767b9d0d55 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4664,7 +4664,9 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> -void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
> +void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm,
> +				    struct sev_es_save_area *hostsa,
> +				    int tsc_aux_uret_slot)
>  {
>  	struct kvm *kvm = svm->vcpu.kvm;
>  
> @@ -4712,6 +4714,16 @@ void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_are
>  		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
>  		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
>  	}
> +
> +	/*
> +	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
> +	 * available, i.e. TSC_AUX is loaded on #VMEXIT from the host save area.
> +	 * Set the save area to the current hardware value, i.e. the current
> +	 * user return value, so that the correct value is restored on #VMEXIT.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_V_TSC_AUX) &&
> +	    !WARN_ON_ONCE(tsc_aux_uret_slot < 0))
> +		hostsa->tsc_aux = kvm_get_user_return_msr(tsc_aux_uret_slot);
>  }
>  
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 67f4eed01526..662cf680faf7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -577,18 +577,6 @@ static int svm_enable_virtualization_cpu(void)
>  
>  	amd_pmu_enable_virt();
>  
> -	/*
> -	 * If TSC_AUX virtualization is supported, TSC_AUX becomes a swap type
> -	 * "B" field (see sev_es_prepare_switch_to_guest()) for SEV-ES guests.
> -	 * Since Linux does not change the value of TSC_AUX once set, prime the
> -	 * TSC_AUX field now to avoid a RDMSR on every vCPU run.
> -	 */
> -	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
> -		u32 __maybe_unused msr_hi;
> -
> -		rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
> -	}
> -
>  	return 0;
>  }
>  
> @@ -1400,16 +1388,17 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>  	 */
>  	vmsave(sd->save_area_pa);
>  	if (sev_es_guest(vcpu->kvm))
> -		sev_es_prepare_switch_to_guest(svm, sev_es_host_save_area(sd));
> +		sev_es_prepare_switch_to_guest(svm, sev_es_host_save_area(sd),
> +					       tsc_aux_uret_slot);
>  
>  	if (tsc_scaling)
>  		__svm_write_tsc_multiplier(vcpu->arch.tsc_scaling_ratio);
>  
>  	/*
> -	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
> -	 * available. The user return MSR support is not required in this case
> -	 * because TSC_AUX is restored on #VMEXIT from the host save area
> -	 * (which has been initialized in svm_enable_virtualization_cpu()).
> +	 * TSC_AUX is always virtualized (context switched by hardware) for
> +	 * SEV-ES guests when the feature is available.  For non-SEV-ES guests,
> +	 * context switch TSC_AUX via the user_return MSR infrastructure (not
> +	 * all CPUs support TSC_AUX virtualization).
>  	 */
>  	if (likely(tsc_aux_uret_slot >= 0) &&
>  	    (!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
> @@ -3004,8 +2993,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		 * TSC_AUX is always virtualized for SEV-ES guests when the
>  		 * feature is available. The user return MSR support is not
>  		 * required in this case because TSC_AUX is restored on #VMEXIT
> -		 * from the host save area (which has been initialized in
> -		 * svm_enable_virtualization_cpu()).
> +		 * from the host save area.
>  		 */
>  		if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
>  			break;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5d39c0b17988..4fda677e8ab3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -831,7 +831,9 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
>  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
>  void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> -void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
> +void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm,
> +				    struct sev_es_save_area *hostsa,
> +				    int tsc_aux_uret_slot);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>  
>  #ifdef CONFIG_KVM_AMD_SEV
> 
> base-commit: 60e396349c19320485a249005256d1fafee60290
> --

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ