lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aN__iPxo5P1bFCNk@google.com>
Date: Fri, 3 Oct 2025 09:53:28 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Xiaoyao Li <xiaoyao.li@...el.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
 skips WRMSR

On Fri, Oct 03, 2025, Yan Zhao wrote:
> Sorry for the slow response due to the PRC holiday.
> 
> On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > setter.
> > > > > 
> > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > hardware.
> > > > Nit:
> > > > 
> > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> > 
> > No?  kvm_user_return_msr_update_cache() is passed the value that's currently
> > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> > 
> > Ah, I suspect you're calling out that the cache can be stale.  Maybe this?
> Right. But not just that the cache can be stale. My previous reply was quite
> misleading.
> 
> As with below tables, where
> CURR: msrs->values[slot].curr.
> REAL: value that's currently loaded in hardware
> 
> For TDs,
>                             CURR          REAL
>    -----------------------------------------------------------------------
> 1. enable virtualization    host value    host value
> 
> 2. TDH.VP.ENTER             host value    guest value (updated by tdx module)
> 3. TDH.VP.ENTER return      host value    defval (updated by tdx module)
> 4. tdx_vcpu_put             defval        defval
> 5. exit to user mode        host value    host value
> 
> 
> For normal VMs,
>                             CURR                 REAL
>    -----------------------------------------------------------------------
> 1. enable virtualization    host value           host value
> 2. before vcpu_run          shadow guest value   shadow guest value
> 3. after vcpu_run           shadow guest value   shadow guest value
> 4. exit to user mode        host value           host value
> 
> 
> Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> that's currently loaded in hardware. 

That isn't actually true, see the bottom.

> For TDs, msrs->values[slot].curr does not contain the value that's currently
> loaded in hardware in stages 2-3.
> 
> >   While the function does indeed update a cache, the nomenclature becomes
> >   slightly misleading when adding a getter[1], as the current value isn't
> >   _just_ the cached value, it's also the value that's currently loaded in
> >   hardware (ignoring that the cache holds stale data until the vCPU is put,
> So, "stale data" is not accurate.
> It just can't hold the current hardware loaded value when guest is running in
> TD.

Eh, that's still "stale data" as far as KVM is concerned.  Though I'm splitting
hairs, I totally agree that as written the changelog is misleading.

> it's also the value that's currently loaded in hardware.

I just need to append "when KVM is actively running" (or probably something more
verbose).

> >   i.e. until KVM prepares to switch back to the host).
> > 
> > Actually, that's a bug waiting to happen when the getter comes along.  Rather
> > than document the potential pitfall, what about adding a prep patch to mimize
> > the window?  Then _this_ patch shouldn't need the caveat about the cache being
> > stale.
> With below patch,
> 
> For TDs,
>                             CURR          REAL
>    -----------------------------------------------------------------------
> 1. enable virtualization    host value    host value
> 2. before TDH.VP.ENTER      defval        host value or defval
> 3. TDH.VP.ENTER             defval        guest value (updated by tdx module)
> 4. TDH.VP.ENTER return      defval        defval (updated by tdx module)
> 5. exit to user mode        host value    host value
> 
> msrs->values[slot].curr is still not the current value loaded in hardware.

Right, this where it becomes stale from my perspective.

> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index ff41d3d00380..326fa81cb35f 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> >                 vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> >  
> >         vt->guest_state_loaded = true;
> > +
> > +       /*
> > +        * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> Hmm, my previous mail didn't mention that besides saving guest value + clobber
> hardware value before exit to VMM, the TDX module also loads saved guest value
> to hardware on TDH.VP.ENTER.

That's not actually unique to TDX.  EFER is setup as a user return MSR, but is
context switched on VM-Enter/VM-Exit except when running on ancient hardware
without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
need to atomically switch to avoid toggling EFER.NX while in the host).

I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
the guest.  But because EFER is atomically loaded on VM-Exit in those cases, the
curr value can't be stale while KVM is running.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ