[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250920065233.GA48402@k08j02272.eu95sqa>
Date: Sat, 20 Sep 2025 14:52:33 +0800
From: Hou Wenlong <houwenlong.hwl@...group.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Tom Lendacky <thomas.lendacky@....com>, kvm@...r.kernel.org,
Lai Jiangshan <jiangshan.ljs@...group.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: SVM: Use cached value as restore value of
TSC_AUX for SEV-ES guest
On Fri, Sep 19, 2025 at 09:23:26AM -0700, Sean Christopherson wrote:
> On Fri, Sep 19, 2025, Hou Wenlong wrote:
> > On Thu, Sep 18, 2025 at 01:47:06PM -0500, Tom Lendacky wrote:
> > > On 9/17/25 22:38, Hou Wenlong wrote:
> > > > The commit 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support
> > > > for virtualized TSC_AUX") assumes that TSC_AUX is not changed by Linux
> > > > post-boot, so it always restores the initial host value on #VMEXIT.
> > > > However, this is not true in KVM, as it can be modified by user return
> > > > MSR support for normal guests. If an SEV-ES guest always restores the
> > > > initial host value on #VMEXIT, this may result in the cached value in
> > > > user return MSR being different from the hardware value if the previous
> > > > vCPU was a non-SEV-ES guest that had called kvm_set_user_return_msr().
> > > > Consequently, this may pose a problem when switching back to that vCPU,
> > > > as kvm_set_user_return_msr() would not update the hardware value because
> > > > the cached value matches the target value. Unlike the TDX case, the
> > > > SEV-ES guest has the ability to set the restore value in the host save
> > > > area, and the cached value in the user return MSR is always the current
> > > > hardware value. Therefore, the cached value could be used directly
> > > > without RDMSR in svm_prepare_switch_to_guest(), making this change
> > > > minimal.
> > >
> > > I'm not sure I follow. If Linux never changes the value of TSC_AUX once it
> > > has set it, then how can it ever be different? Have you seen this issue?
> > >
> > > Thanks,
> > > Tom
> > >
> > Hi, Tom.
> >
> > IIUD, the normal guest still uses the user return MSR to load the guest
> > TSC_AUX value into the hardware when TSC_AUX virtualization is
> > supported. However, the user return MSR only restores the host value
> > when returning to userspace, rather than when the vCPU is scheduled out.
> > This may lead to an issue during vCPU switching on a single pCPU, which
> > appears as follows:
> >
> > normal vCPU -> SEV-ES vCPU -> normal vCPU
> >
> > When the normal vCPU switches to the SEV-ES vCPU, the hardware TSC_AUX
> > value remains as the guest value set in kvm_set_user_return_msr() by the
> > normal vCPU. After the #VMEXIT from the SEV-ES vCPU, the hardware value
> > becomes the host value. However, the cached TSC_AUX value in the user
> > return MSR remains the guest value of previous normal vCPU. Therefore,
> > when switching back to that normal vCPU, kvm_set_user_return_msr() does
> > not perform a WRMSR to load the guest value into the hardware, because
> > the cached value matches the target value. As a result, during the
> > execution of the normal vCPU, the normal vCPU would get an incorrect
> > TSC_AUX value for RDTSCP/RDPID.
> >
> > I didn't find the available description of TSC_AUX virtualization in
> > APM; all my analysis is based on the current KVM code.
>
> I'm guessing TSC_AUX virtualization works like SEV-ES, where hardware context
> switches the MSR on VMRUN/#VMEXIT.
>
> > Am I missing something?
>
> Nope, I don't think so. I also found the changelog a bit confusing though. I
> would say omit the details about Linux not changing the value, and instead focus
> on the need to re-load the current hardware value. That should be intuitive for
> all readers, and is correct regradless of what/whose value is currently in hardware.
>
> I also think we should handle setting hostsa->tsc_aux in
> sev_es_prepare_switch_to_guest(). That obviously requires duplicating some of
> logic related to SEV-ES and TSC_AUX, but I think I prefer that to splitting the
> handling of the host save area.
>
> How's this look? (compile tested only)
>
I'm fine with it. Thanks for your fixup; I think I need to improve my
changlog next time. :)
Thanks!
> --
> Subject: [PATCH] KVM: SVM: Re-load current, not host, TSC_AUX on #VMEXIT from
> SEV-ES guest
>
> Prior to running an SEV-ES guest, set TSC_AUX in the host save area to the
> current value in hardware, as tracked by the user return infrastructure,
> instead of always loading the host's desired value for the CPU. If the
> pCPU is also running a non-SEV-ES vCPU, loading the hosts value on #VMEXIT
> could clobber the other vCPU's value, e.g. if the SEV-ES vCPU preempted
> the non-SEV-ES vCPU.
>
> Note, unlike TDX, which blindly _zeroes_ TSC_AUX on exit, SEV-ES CPUs
> can load an arbitrary value. Stuff the current value in the host save
> area instead of refreshing the user return cache so that KVM doesn't need
> to track whether or not the vCPU actually enterred the guest and thus
> loaded TSC_AUX from the host save area.
>
> Fixes: 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX")
> Cc: stable@...r.kernel.org
> Suggested-by: Lai Jiangshan <jiangshan.ljs@...group.com>
> Signed-off-by: Hou Wenlong <houwenlong.hwl@...group.com>
> [sean: handle the SEV-ES case in sev_es_prepare_switch_to_guest()]
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/svm/sev.c | 14 +++++++++++++-
> arch/x86/kvm/svm/svm.c | 26 +++++++-------------------
> arch/x86/kvm/svm/svm.h | 4 +++-
> 3 files changed, 23 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index cce48fff2e6c..95767b9d0d55 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4664,7 +4664,9 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
> return 0;
> }
>
> -void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
> +void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm,
> + struct sev_es_save_area *hostsa,
> + int tsc_aux_uret_slot)
> {
> struct kvm *kvm = svm->vcpu.kvm;
>
> @@ -4712,6 +4714,16 @@ void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_are
> hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
> hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
> }
> +
> + /*
> + * TSC_AUX is always virtualized for SEV-ES guests when the feature is
> + * available, i.e. TSC_AUX is loaded on #VMEXIT from the host save area.
> + * Set the save area to the current hardware value, i.e. the current
> + * user return value, so that the correct value is restored on #VMEXIT.
> + */
> + if (cpu_feature_enabled(X86_FEATURE_V_TSC_AUX) &&
> + !WARN_ON_ONCE(tsc_aux_uret_slot < 0))
> + hostsa->tsc_aux = kvm_get_user_return_msr(tsc_aux_uret_slot);
> }
>
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 67f4eed01526..662cf680faf7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -577,18 +577,6 @@ static int svm_enable_virtualization_cpu(void)
>
> amd_pmu_enable_virt();
>
> - /*
> - * If TSC_AUX virtualization is supported, TSC_AUX becomes a swap type
> - * "B" field (see sev_es_prepare_switch_to_guest()) for SEV-ES guests.
> - * Since Linux does not change the value of TSC_AUX once set, prime the
> - * TSC_AUX field now to avoid a RDMSR on every vCPU run.
> - */
> - if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
> - u32 __maybe_unused msr_hi;
> -
> - rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
> - }
> -
> return 0;
> }
>
> @@ -1400,16 +1388,17 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> */
> vmsave(sd->save_area_pa);
> if (sev_es_guest(vcpu->kvm))
> - sev_es_prepare_switch_to_guest(svm, sev_es_host_save_area(sd));
> + sev_es_prepare_switch_to_guest(svm, sev_es_host_save_area(sd),
> + tsc_aux_uret_slot);
>
> if (tsc_scaling)
> __svm_write_tsc_multiplier(vcpu->arch.tsc_scaling_ratio);
>
> /*
> - * TSC_AUX is always virtualized for SEV-ES guests when the feature is
> - * available. The user return MSR support is not required in this case
> - * because TSC_AUX is restored on #VMEXIT from the host save area
> - * (which has been initialized in svm_enable_virtualization_cpu()).
> + * TSC_AUX is always virtualized (context switched by hardware) for
> + * SEV-ES guests when the feature is available. For non-SEV-ES guests,
> + * context switch TSC_AUX via the user_return MSR infrastructure (not
> + * all CPUs support TSC_AUX virtualization).
> */
> if (likely(tsc_aux_uret_slot >= 0) &&
> (!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
> @@ -3004,8 +2993,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> * TSC_AUX is always virtualized for SEV-ES guests when the
> * feature is available. The user return MSR support is not
> * required in this case because TSC_AUX is restored on #VMEXIT
> - * from the host save area (which has been initialized in
> - * svm_enable_virtualization_cpu()).
> + * from the host save area.
> */
> if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
> break;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5d39c0b17988..4fda677e8ab3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -831,7 +831,9 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
> int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
> void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> -void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
> +void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm,
> + struct sev_es_save_area *hostsa,
> + int tsc_aux_uret_slot);
> void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>
> #ifdef CONFIG_KVM_AMD_SEV
>
> base-commit: 60e396349c19320485a249005256d1fafee60290
> --
Powered by blists - more mailing lists