[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aN__iPxo5P1bFCNk@google.com>
Date: Fri, 3 Oct 2025 09:53:28 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Xiaoyao Li <xiaoyao.li@...el.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
skips WRMSR
On Fri, Oct 03, 2025, Yan Zhao wrote:
> Sorry for the slow response due to the PRC holiday.
>
> On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > setter.
> > > > >
> > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > hardware.
> > > > Nit:
> > > >
> > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> >
> > No? kvm_user_return_msr_update_cache() is passed the value that's currently
> > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> >
> > Ah, I suspect you're calling out that the cache can be stale. Maybe this?
> Right. But not just that the cache can be stale. My previous reply was quite
> misleading.
>
> As with below tables, where
> CURR: msrs->values[slot].curr.
> REAL: value that's currently loaded in hardware
>
> For TDs,
> CURR REAL
> -----------------------------------------------------------------------
> 1. enable virtualization host value host value
>
> 2. TDH.VP.ENTER host value guest value (updated by tdx module)
> 3. TDH.VP.ENTER return host value defval (updated by tdx module)
> 4. tdx_vcpu_put defval defval
> 5. exit to user mode host value host value
>
>
> For normal VMs,
> CURR REAL
> -----------------------------------------------------------------------
> 1. enable virtualization host value host value
> 2. before vcpu_run shadow guest value shadow guest value
> 3. after vcpu_run shadow guest value shadow guest value
> 4. exit to user mode host value host value
>
>
> Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> that's currently loaded in hardware.
That isn't actually true, see the bottom.
> For TDs, msrs->values[slot].curr does not contain the value that's currently
> loaded in hardware in stages 2-3.
>
> > While the function does indeed update a cache, the nomenclature becomes
> > slightly misleading when adding a getter[1], as the current value isn't
> > _just_ the cached value, it's also the value that's currently loaded in
> > hardware (ignoring that the cache holds stale data until the vCPU is put,
> So, "stale data" is not accurate.
> It just can't hold the current hardware loaded value when guest is running in
> TD.
Eh, that's still "stale data" as far as KVM is concerned. Though I'm splitting
hairs, I totally agree that as written the changelog is misleading.
> it's also the value that's currently loaded in hardware.
I just need to append "when KVM is actively running" (or probably something more
verbose).
> > i.e. until KVM prepares to switch back to the host).
> >
> > Actually, that's a bug waiting to happen when the getter comes along. Rather
> > than document the potential pitfall, what about adding a prep patch to mimize
> > the window? Then _this_ patch shouldn't need the caveat about the cache being
> > stale.
> With below patch,
>
> For TDs,
> CURR REAL
> -----------------------------------------------------------------------
> 1. enable virtualization host value host value
> 2. before TDH.VP.ENTER defval host value or defval
> 3. TDH.VP.ENTER defval guest value (updated by tdx module)
> 4. TDH.VP.ENTER return defval defval (updated by tdx module)
> 5. exit to user mode host value host value
>
> msrs->values[slot].curr is still not the current value loaded in hardware.
Right, this where it becomes stale from my perspective.
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index ff41d3d00380..326fa81cb35f 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> >
> > vt->guest_state_loaded = true;
> > +
> > + /*
> > + * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> Hmm, my previous mail didn't mention that besides saving guest value + clobber
> hardware value before exit to VMM, the TDX module also loads saved guest value
> to hardware on TDH.VP.ENTER.
That's not actually unique to TDX. EFER is setup as a user return MSR, but is
context switched on VM-Enter/VM-Exit except when running on ancient hardware
without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
need to atomically switch to avoid toggling EFER.NX while in the host).
I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
the guest. But because EFER is atomically loaded on VM-Exit in those cases, the
curr value can't be stale while KVM is running.
Powered by blists - more mailing lists