lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aO0C+nOC6f+9L0Ir@yzhao56-desk.sh.intel.com>
Date: Mon, 13 Oct 2025 21:47:38 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Xiaoyao Li <xiaoyao.li@...el.com>, "Rick
 Edgecombe" <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
 skips WRMSR

On Fri, Oct 03, 2025 at 09:53:28AM -0700, Sean Christopherson wrote:
> On Fri, Oct 03, 2025, Yan Zhao wrote:
> > Sorry for the slow response due to the PRC holiday.
> > 
> > On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > > setter.
> > > > > > 
> > > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > > hardware.
> > > > > Nit:
> > > > > 
> > > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> > > 
> > > No?  kvm_user_return_msr_update_cache() is passed the value that's currently
> > > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> > > 
> > > Ah, I suspect you're calling out that the cache can be stale.  Maybe this?
> > Right. But not just that the cache can be stale. My previous reply was quite
> > misleading.
> > 
> > As with below tables, where
> > CURR: msrs->values[slot].curr.
> > REAL: value that's currently loaded in hardware
> > 
> > For TDs,
> >                             CURR          REAL
> >    -----------------------------------------------------------------------
> > 1. enable virtualization    host value    host value
> > 
> > 2. TDH.VP.ENTER             host value    guest value (updated by tdx module)
> > 3. TDH.VP.ENTER return      host value    defval (updated by tdx module)
> > 4. tdx_vcpu_put             defval        defval
> > 5. exit to user mode        host value    host value
> > 
> > 
> > For normal VMs,
> >                             CURR                 REAL
> >    -----------------------------------------------------------------------
> > 1. enable virtualization    host value           host value
> > 2. before vcpu_run          shadow guest value   shadow guest value
> > 3. after vcpu_run           shadow guest value   shadow guest value
> > 4. exit to user mode        host value           host value
> > 
> > 
> > Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> > that's currently loaded in hardware. 
> 
> That isn't actually true, see the bottom.
> 
> > For TDs, msrs->values[slot].curr does not contain the value that's currently
> > loaded in hardware in stages 2-3.
> > 
> > >   While the function does indeed update a cache, the nomenclature becomes
> > >   slightly misleading when adding a getter[1], as the current value isn't
> > >   _just_ the cached value, it's also the value that's currently loaded in
> > >   hardware (ignoring that the cache holds stale data until the vCPU is put,
> > So, "stale data" is not accurate.
> > It just can't hold the current hardware loaded value when guest is running in
> > TD.
> 
> Eh, that's still "stale data" as far as KVM is concerned.  Though I'm splitting
> hairs, I totally agree that as written the changelog is misleading.
> 
> > it's also the value that's currently loaded in hardware.
> 
> I just need to append "when KVM is actively running" (or probably something more
> verbose).
> 
> > >   i.e. until KVM prepares to switch back to the host).
> > > 
> > > Actually, that's a bug waiting to happen when the getter comes along.  Rather
> > > than document the potential pitfall, what about adding a prep patch to mimize
> > > the window?  Then _this_ patch shouldn't need the caveat about the cache being
> > > stale.
> > With below patch,
> > 
> > For TDs,
> >                             CURR          REAL
> >    -----------------------------------------------------------------------
> > 1. enable virtualization    host value    host value
> > 2. before TDH.VP.ENTER      defval        host value or defval
> > 3. TDH.VP.ENTER             defval        guest value (updated by tdx module)
> > 4. TDH.VP.ENTER return      defval        defval (updated by tdx module)
> > 5. exit to user mode        host value    host value
> > 
> > msrs->values[slot].curr is still not the current value loaded in hardware.
> 
> Right, this where it becomes stale from my perspective.
Ok, then though after your fix, msrs->values[slot].curr is still not matching
hardware value in stage 2, I think it's harmless. 

> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index ff41d3d00380..326fa81cb35f 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > >                 vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> > >  
> > >         vt->guest_state_loaded = true;
> > > +
> > > +       /*
> > > +        * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> > Hmm, my previous mail didn't mention that besides saving guest value + clobber
> > hardware value before exit to VMM, the TDX module also loads saved guest value
> > to hardware on TDH.VP.ENTER.
> 
> That's not actually unique to TDX.  EFER is setup as a user return MSR, but is
> context switched on VM-Enter/VM-Exit except when running on ancient hardware
> without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
> need to atomically switch to avoid toggling EFER.NX while in the host).
>
> I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
> the guest.  But because EFER is atomically loaded on VM-Exit in those cases, the
> curr value can't be stale while KVM is running.
Oh, yes.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ