Message-ID: <aN/VaVklfXO5rId4@yzhao56-desk.sh.intel.com>
Date: Fri, 3 Oct 2025 21:53:45 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Xiaoyao Li <xiaoyao.li@...el.com>, "Rick
 Edgecombe" <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
 skips WRMSR

Sorry for the slow response due to the PRC holiday.

On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> On Tue, Sep 30, 2025, Yan Zhao wrote:
> > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > double-underscores version is doing a subset of the work of the "full"
> > > > setter.
> > > > 
> > > > While the function does indeed update a cache, the nomenclature becomes
> > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > hardware.
> > > Nit:
> > > 
> > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> 
> No?  kvm_user_return_msr_update_cache() is passed the value that's currently
> loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> 
> Ah, I suspect you're calling out that the cache can be stale.  Maybe this?
Right, but it's not just that the cache can be stale. My previous reply was
misleading.

As shown in the tables below, where
CURR: msrs->values[slot].curr
REAL: the value that's currently loaded in hardware

For TDs,
                            CURR          REAL
   -----------------------------------------------------------------------
1. enable virtualization    host value    host value

2. TDH.VP.ENTER             host value    guest value (updated by tdx module)
3. TDH.VP.ENTER return      host value    defval (updated by tdx module)
4. tdx_vcpu_put             defval        defval
5. exit to user mode        host value    host value


For normal VMs,
                            CURR                 REAL
   -----------------------------------------------------------------------
1. enable virtualization    host value           host value

2. before vcpu_run          shadow guest value   shadow guest value
3. after vcpu_run           shadow guest value   shadow guest value
4. exit to user mode        host value           host value


Unlike for normal VMs, where msrs->values[slot].curr always matches the value
that's currently loaded in hardware, for TDs msrs->values[slot].curr does not
hold the value that's currently loaded in hardware during stages 2-3.


>   While the function does indeed update a cache, the nomenclature becomes
>   slightly misleading when adding a getter[1], as the current value isn't
>   _just_ the cached value, it's also the value that's currently loaded in
>   hardware (ignoring that the cache holds stale data until the vCPU is put,
So "stale data" is not quite accurate. The cache simply cannot hold the value
that's currently loaded in hardware while the guest is running in a TD.

>   i.e. until KVM prepares to switch back to the host).
> 
> Actually, that's a bug waiting to happen when the getter comes along.  Rather
> than document the potential pitfall, what about adding a prep patch to mimize
> the window?  Then _this_ patch shouldn't need the caveat about the cache being
> stale.
With the patch below,

For TDs,
                            CURR          REAL
   -----------------------------------------------------------------------
1. enable virtualization    host value    host value

2. before TDH.VP.ENTER      defval        host value or defval
3. TDH.VP.ENTER             defval        guest value (updated by tdx module)
4. TDH.VP.ENTER return      defval        defval (updated by tdx module)
5. exit to user mode        host value    host value

msrs->values[slot].curr still doesn't hold the value currently loaded in
hardware while the guest is running.

> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index ff41d3d00380..326fa81cb35f 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>                 vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
>  
>         vt->guest_state_loaded = true;
> +
> +       /*
> +        * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
Hmm, my previous mail didn't mention that, besides saving the guest value and
clobbering the hardware value before exiting to the VMM, the TDX module also
loads the saved guest value into hardware on TDH.VP.ENTER.

> +        * VP.ENTER succeeds, i.e. on TD-Exit.  Mark those MSRs as needing an
> +        * update to synchronize the "current" value in KVM's cache with the
> +        * value in hardware (loaded by the TDX-Module).
> +        */
> +       to_tdx(vcpu)->need_user_return_msr_update = true;
>  }
>  
>  struct tdx_uret_msr {
> @@ -816,7 +824,6 @@ static void tdx_user_return_msr_update_cache(void)
>  static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_vt *vt = to_vt(vcpu);
> -       struct vcpu_tdx *tdx = to_tdx(vcpu);
>  
>         if (!vt->guest_state_loaded)
>                 return;
> @@ -824,11 +831,6 @@ static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
>         ++vcpu->stat.host_state_reload;
>         wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
>  
> -       if (tdx->guest_entered) {
> -               tdx_user_return_msr_update_cache();
> -               tdx->guest_entered = false;
> -       }
> -
>         vt->guest_state_loaded = false;
>  }
>  
> @@ -1067,7 +1069,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>                 update_debugctlmsr(vcpu->arch.host_debugctl);
>  
>         tdx_load_host_xsave_state(vcpu);
> -       tdx->guest_entered = true;
> +
> +       if (tdx->need_user_return_msr_update) {
> +               tdx_user_return_msr_update_cache();
> +               tdx->need_user_return_msr_update = false;
> +       }
>  
>         vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
>  
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index ca39a9391db1..fcac1627f71f 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -67,7 +67,7 @@ struct vcpu_tdx {
>         u64 vp_enter_ret;
>  
>         enum vcpu_tdx_state state;
> -       bool guest_entered;
> +       bool need_user_return_msr_update;
>  
>         u64 map_gpa_next;
>         u64 map_gpa_end;
> 
