lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aM2Dfu0n-JyYttaH@google.com>
Date: Fri, 19 Sep 2025 09:23:26 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Hou Wenlong <houwenlong.hwl@...group.com>
Cc: Tom Lendacky <thomas.lendacky@....com>, kvm@...r.kernel.org, 
	Lai Jiangshan <jiangshan.ljs@...group.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: SVM: Use cached value as restore value of
 TSC_AUX for SEV-ES guest

On Fri, Sep 19, 2025, Hou Wenlong wrote:
> On Thu, Sep 18, 2025 at 01:47:06PM -0500, Tom Lendacky wrote:
> > On 9/17/25 22:38, Hou Wenlong wrote:
> > > The commit 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support
> > > for virtualized TSC_AUX") assumes that TSC_AUX is not changed by Linux
> > > post-boot, so it always restores the initial host value on #VMEXIT.
> > > However, this is not true in KVM, as it can be modified by user return
> > > MSR support for normal guests. If an SEV-ES guest always restores the
> > > initial host value on #VMEXIT, this may result in the cached value in
> > > user return MSR being different from the hardware value if the previous
> > > vCPU was a non-SEV-ES guest that had called kvm_set_user_return_msr().
> > > Consequently, this may pose a problem when switching back to that vCPU,
> > > as kvm_set_user_return_msr() would not update the hardware value because
> > > the cached value matches the target value. Unlike the TDX case, the
> > > SEV-ES guest has the ability to set the restore value in the host save
> > > area, and the cached value in the user return MSR is always the current
> > > hardware value. Therefore, the cached value could be used directly
> > > without RDMSR in svm_prepare_switch_to_guest(), making this change
> > > minimal.
> > 
> > I'm not sure I follow. If Linux never changes the value of TSC_AUX once it
> > has set it, then how can it ever be different? Have you seen this issue?
> > 
> > Thanks,
> > Tom
> >
> Hi, Tom.
> 
> IIUD, the normal guest still uses the user return MSR to load the guest
> TSC_AUX value into the hardware when TSC_AUX virtualization is
> supported.  However, the user return MSR only restores the host value
> when returning to userspace, rather than when the vCPU is scheduled out.
> This may lead to an issue during vCPU switching on a single pCPU, which
> appears as follows:
> 
>        normal vCPU -> SEV-ES vCPU -> normal vCPU
> 
> When the normal vCPU switches to the SEV-ES vCPU, the hardware TSC_AUX
> value remains as the guest value set in kvm_set_user_return_msr() by the
> normal vCPU.  After the #VMEXIT from the SEV-ES vCPU, the hardware value
> becomes the host value. However, the cached TSC_AUX value in the user
> return MSR remains the guest value of previous normal vCPU. Therefore,
> when switching back to that normal vCPU, kvm_set_user_return_msr() does
> not perform a WRMSR to load the guest value into the hardware, because
> the cached value matches the target value. As a result, during the
> execution of the normal vCPU, the normal vCPU would get an incorrect
> TSC_AUX value for RDTSCP/RDPID.
> 
> I didn't find the available description of TSC_AUX virtualization in
> APM; all my analysis is based on the current KVM code.

I'm guessing TSC_AUX virtualization works like SEV-ES, where hardware context
switches the MSR on VMRUN/#VMEXIT.

> Am I missing something?

Nope, I don't think so.  I also found the changelog a bit confusing though.  I
would say omit the details about Linux not changing the value, and instead focus
on the need to re-load the current hardware value.  That should be intuitive for
all readers, and is correct regradless of what/whose value is currently in hardware.

I also think we should handle setting hostsa->tsc_aux in
sev_es_prepare_switch_to_guest().  That obviously requires duplicating some of
logic related to SEV-ES and TSC_AUX, but I think I prefer that to splitting the
handling of the host save area.

How's this look? (compile tested only)

--
Subject: [PATCH] KVM: SVM: Re-load current, not host, TSC_AUX on #VMEXIT from
 SEV-ES guest

Prior to running an SEV-ES guest, set TSC_AUX in the host save area to the
current value in hardware, as tracked by the user return infrastructure,
instead of always loading the host's desired value for the CPU.  If the
pCPU is also running a non-SEV-ES vCPU, loading the hosts value on #VMEXIT
could clobber the other vCPU's value, e.g. if the SEV-ES vCPU preempted
the non-SEV-ES vCPU.

Note, unlike TDX, which blindly _zeroes_ TSC_AUX on exit, SEV-ES CPUs
can load an arbitrary value.  Stuff the current value in the host save
area instead of refreshing the user return cache so that KVM doesn't need
to track whether or not the vCPU actually enterred the guest and thus
loaded TSC_AUX from the host save area.

Fixes: 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX")
Cc: stable@...r.kernel.org
Suggested-by: Lai Jiangshan <jiangshan.ljs@...group.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@...group.com>
[sean: handle the SEV-ES case in sev_es_prepare_switch_to_guest()]
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
 arch/x86/kvm/svm/sev.c | 14 +++++++++++++-
 arch/x86/kvm/svm/svm.c | 26 +++++++-------------------
 arch/x86/kvm/svm/svm.h |  4 +++-
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cce48fff2e6c..95767b9d0d55 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4664,7 +4664,9 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
+void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm,
+				    struct sev_es_save_area *hostsa,
+				    int tsc_aux_uret_slot)
 {
 	struct kvm *kvm = svm->vcpu.kvm;
 
@@ -4712,6 +4714,16 @@ void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_are
 		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
 		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
 	}
+
+	/*
+	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
+	 * available, i.e. TSC_AUX is loaded on #VMEXIT from the host save area.
+	 * Set the save area to the current hardware value, i.e. the current
+	 * user return value, so that the correct value is restored on #VMEXIT.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_V_TSC_AUX) &&
+	    !WARN_ON_ONCE(tsc_aux_uret_slot < 0))
+		hostsa->tsc_aux = kvm_get_user_return_msr(tsc_aux_uret_slot);
 }
 
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 67f4eed01526..662cf680faf7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -577,18 +577,6 @@ static int svm_enable_virtualization_cpu(void)
 
 	amd_pmu_enable_virt();
 
-	/*
-	 * If TSC_AUX virtualization is supported, TSC_AUX becomes a swap type
-	 * "B" field (see sev_es_prepare_switch_to_guest()) for SEV-ES guests.
-	 * Since Linux does not change the value of TSC_AUX once set, prime the
-	 * TSC_AUX field now to avoid a RDMSR on every vCPU run.
-	 */
-	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
-		u32 __maybe_unused msr_hi;
-
-		rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
-	}
-
 	return 0;
 }
 
@@ -1400,16 +1388,17 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	 */
 	vmsave(sd->save_area_pa);
 	if (sev_es_guest(vcpu->kvm))
-		sev_es_prepare_switch_to_guest(svm, sev_es_host_save_area(sd));
+		sev_es_prepare_switch_to_guest(svm, sev_es_host_save_area(sd),
+					       tsc_aux_uret_slot);
 
 	if (tsc_scaling)
 		__svm_write_tsc_multiplier(vcpu->arch.tsc_scaling_ratio);
 
 	/*
-	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
-	 * available. The user return MSR support is not required in this case
-	 * because TSC_AUX is restored on #VMEXIT from the host save area
-	 * (which has been initialized in svm_enable_virtualization_cpu()).
+	 * TSC_AUX is always virtualized (context switched by hardware) for
+	 * SEV-ES guests when the feature is available.  For non-SEV-ES guests,
+	 * context switch TSC_AUX via the user_return MSR infrastructure (not
+	 * all CPUs support TSC_AUX virtualization).
 	 */
 	if (likely(tsc_aux_uret_slot >= 0) &&
 	    (!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
@@ -3004,8 +2993,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 * TSC_AUX is always virtualized for SEV-ES guests when the
 		 * feature is available. The user return MSR support is not
 		 * required in this case because TSC_AUX is restored on #VMEXIT
-		 * from the host save area (which has been initialized in
-		 * svm_enable_virtualization_cpu()).
+		 * from the host save area.
 		 */
 		if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
 			break;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d39c0b17988..4fda677e8ab3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -831,7 +831,9 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
-void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
+void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm,
+				    struct sev_es_save_area *hostsa,
+				    int tsc_aux_uret_slot);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 
 #ifdef CONFIG_KVM_AMD_SEV

base-commit: 60e396349c19320485a249005256d1fafee60290
--

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ