[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aO0C+nOC6f+9L0Ir@yzhao56-desk.sh.intel.com>
Date: Mon, 13 Oct 2025 21:47:38 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Xiaoyao Li <xiaoyao.li@...el.com>, "Rick
Edgecombe" <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
skips WRMSR
On Fri, Oct 03, 2025 at 09:53:28AM -0700, Sean Christopherson wrote:
> On Fri, Oct 03, 2025, Yan Zhao wrote:
> > Sorry for the slow response due to the PRC holiday.
> >
> > On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > > setter.
> > > > > >
> > > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > > hardware.
> > > > > Nit:
> > > > >
> > > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> > >
> > > No? kvm_user_return_msr_update_cache() is passed the value that's currently
> > > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> > >
> > > Ah, I suspect you're calling out that the cache can be stale. Maybe this?
> > Right. But not just that the cache can be stale. My previous reply was quite
> > misleading.
> >
> > As with below tables, where
> > CURR: msrs->values[slot].curr.
> > REAL: value that's currently loaded in hardware
> >
> > For TDs,
> > CURR REAL
> > -----------------------------------------------------------------------
> > 1. enable virtualization host value host value
> >
> > 2. TDH.VP.ENTER host value guest value (updated by tdx module)
> > 3. TDH.VP.ENTER return host value defval (updated by tdx module)
> > 4. tdx_vcpu_put defval defval
> > 5. exit to user mode host value host value
> >
> >
> > For normal VMs,
> > CURR REAL
> > -----------------------------------------------------------------------
> > 1. enable virtualization host value host value
> > 2. before vcpu_run shadow guest value shadow guest value
> > 3. after vcpu_run shadow guest value shadow guest value
> > 4. exit to user mode host value host value
> >
> >
> > Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> > that's currently loaded in hardware.
>
> That isn't actually true, see the bottom.
>
> > For TDs, msrs->values[slot].curr does not contain the value that's currently
> > loaded in hardware in stages 2-3.
> >
> > > While the function does indeed update a cache, the nomenclature becomes
> > > slightly misleading when adding a getter[1], as the current value isn't
> > > _just_ the cached value, it's also the value that's currently loaded in
> > > hardware (ignoring that the cache holds stale data until the vCPU is put,
> > So, "stale data" is not accurate.
> > It just can't hold the current hardware loaded value when guest is running in
> > TD.
>
> Eh, that's still "stale data" as far as KVM is concerned. Though I'm splitting
> hairs, I totally agree that as written the changelog is misleading.
>
> > it's also the value that's currently loaded in hardware.
>
> I just need to append "when KVM is actively running" (or probably something more
> verbose).
>
> > > i.e. until KVM prepares to switch back to the host).
> > >
> > > Actually, that's a bug waiting to happen when the getter comes along. Rather
> > > than document the potential pitfall, what about adding a prep patch to mimize
> > > the window? Then _this_ patch shouldn't need the caveat about the cache being
> > > stale.
> > With below patch,
> >
> > For TDs,
> > CURR REAL
> > -----------------------------------------------------------------------
> > 1. enable virtualization host value host value
> > 2. before TDH.VP.ENTER defval host value or defval
> > 3. TDH.VP.ENTER defval guest value (updated by tdx module)
> > 4. TDH.VP.ENTER return defval defval (updated by tdx module)
> > 5. exit to user mode host value host value
> >
> > msrs->values[slot].curr is still not the current value loaded in hardware.
>
> Right, this where it becomes stale from my perspective.
Ok, then though after your fix, msrs->values[slot].curr is still not matching
hardware value in stage 2, I think it's harmless.
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index ff41d3d00380..326fa81cb35f 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > > vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> > >
> > > vt->guest_state_loaded = true;
> > > +
> > > + /*
> > > + * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> > Hmm, my previous mail didn't mention that besides saving guest value + clobber
> > hardware value before exit to VMM, the TDX module also loads saved guest value
> > to hardware on TDH.VP.ENTER.
>
> That's not actually unique to TDX. EFER is setup as a user return MSR, but is
> context switched on VM-Enter/VM-Exit except when running on ancient hardware
> without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
> need to atomically switch to avoid toggling EFER.NX while in the host).
>
> I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
> the guest. But because EFER is atomically loaded on VM-Exit in those cases, the
> curr value can't be stale while KVM is running.
Oh, yes.
Powered by blists - more mailing lists