linux-kernel - Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that skips WRMSR


Message-ID: <aO7w+GwftVK5yLfy@yzhao56-desk.sh.intel.com>
Date: Wed, 15 Oct 2025 08:55:20 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Xiaoyao Li <xiaoyao.li@...el.com>, "Rick
 Edgecombe" <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
 skips WRMSR

On Tue, Sep 30, 2025 at 09:34:20AM -0700, Sean Christopherson wrote:
> On Tue, Sep 30, 2025, Sean Christopherson wrote:
> > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > setter.
> > > > > 
> > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > hardware.
> > > > Nit:
> > > > 
> > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> > 
> > No?  kvm_user_return_msr_update_cache() is passed the value that's currently
> > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> > 
> > Ah, I suspect you're calling out that the cache can be stale.  Maybe this?
> > 
> >   While the function does indeed update a cache, the nomenclature becomes
> >   slightly misleading when adding a getter[1], as the current value isn't
> >   _just_ the cached value, it's also the value that's currently loaded in
> >   hardware (ignoring that the cache holds stale data until the vCPU is put,
> >   i.e. until KVM prepares to switch back to the host).
> > 
> > Actually, that's a bug waiting to happen when the getter comes along.  Rather
> > than document the potential pitfall, what about adding a prep patch to minimize
> > the window?  Then _this_ patch shouldn't need the caveat about the cache being
> > stale.
> 
> Ha!  It's technically a bug fix.  Because a forced shutdown will invoke
> kvm_shutdown() without waiting for tasks to exit, and so the on_each_cpu() calls
> to kvm_disable_virtualization_cpu() can call kvm_on_user_return() and thus
> consume a stale values->curr.
It looks like consuming a stale values->curr could also happen for normal VMs.

vmx_prepare_switch_to_guest
  |->kvm_set_user_return_msr // for all slots where load_into_hardware is true
       |->1) wrmsrq_safe(kvm_uret_msrs_list[slot], value);
       |  2) __kvm_set_user_return_msr(slot, value);
               |->msrs->values[slot].curr = value;
               |  kvm_user_return_register_notifier

Because vmx_prepare_switch_to_guest() invokes kvm_set_user_return_msr() with
local IRQs enabled, there is a window between steps 1) and 2) in which
kvm_shutdown() may call kvm_disable_virtualization_cpu(). During this window,
the hardware holds the shadow guest value while values[slot].curr still holds
the host value.

In this scenario, if msrs->registered is already true at step 1) (because a
previous slot was updated), kvm_disable_virtualization_cpu() could call
kvm_on_user_return(), find "values->host == values->curr", and so leave the
hardware set to the shadow guest value instead of restoring the host value.

Do you think it's a bug?
If so, should we fix it by disabling IRQs in kvm_set_user_return_msr()? E.g.:

 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 {
        struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
+       unsigned long flags;
        int err;

        value = (value & mask) | (msrs->values[slot].host & ~mask);
        if (value == msrs->values[slot].curr)
                return 0;
+
+       local_irq_save(flags);
        err = wrmsrq_safe(kvm_uret_msrs_list[slot], value);
-       if (err)
+       if (err) {
+               local_irq_restore(flags);
                return 1;
+       }

        __kvm_set_user_return_msr(slot, value);
+       local_irq_restore(flags);
        return 0;
 }
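
To make the window concrete, here is a rough user-space model of the sequence
described above. It is only a sketch, not kernel code: every model_* name, the
MODEL_* constants, and the single-slot state are hypothetical, and the behavior
of kvm_on_user_return() is approximated from the description in this mail
(restore the host value only when the cached curr differs from host). The
shutdown IPI is injected by hand between step 1) and step 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values standing in for the host and shadow guest MSR values. */
#define MODEL_HOST  0x1111ULL
#define MODEL_GUEST 0x2222ULL

/* Hypothetical single-slot model of the per-CPU user return MSR state. */
struct model_cpu {
	uint64_t hw;           /* value currently loaded in the modeled MSR */
	uint64_t host;         /* models values[slot].host */
	uint64_t curr;         /* models values[slot].curr (the cache) */
	bool registered;       /* models msrs->registered */
	bool irqs_off;         /* local IRQs disabled on this modeled CPU */
	bool shutdown_pending; /* shutdown IPI arrived while IRQs were off */
};

/* Approximates kvm_on_user_return(): trusts the cache to decide on a restore. */
static void model_on_user_return(struct model_cpu *c)
{
	c->registered = false;
	if (c->host != c->curr) {
		c->hw = c->host;
		c->curr = c->host;
	}
}

/* Approximates the shutdown IPI; it is held pending while IRQs are off. */
static void model_shutdown_ipi(struct model_cpu *c)
{
	if (c->irqs_off) {
		c->shutdown_pending = true;
		return;
	}
	if (c->registered)
		model_on_user_return(c);
}

/* Approximates local_irq_restore(): a pending IPI is delivered on re-enable. */
static void model_irq_restore(struct model_cpu *c)
{
	c->irqs_off = false;
	if (c->shutdown_pending) {
		c->shutdown_pending = false;
		model_shutdown_ipi(c);
	}
}

/*
 * Runs the setter with the shutdown IPI landing between step 1) and step 2)
 * and returns the value left in the modeled hardware afterwards.
 */
static uint64_t model_run(bool disable_irqs)
{
	/* registered starts true, as if a previous slot was already updated. */
	struct model_cpu c = {
		.hw = MODEL_HOST, .host = MODEL_HOST, .curr = MODEL_HOST,
		.registered = true,
	};

	assert(c.hw == c.curr);   /* cache and hardware agree before the update */

	if (disable_irqs)
		c.irqs_off = true;    /* models local_irq_save() */

	c.hw = MODEL_GUEST;       /* step 1: models wrmsrq_safe() */
	model_shutdown_ipi(&c);   /* IPI arrives inside the window */
	c.curr = MODEL_GUEST;     /* step 2: models __kvm_set_user_return_msr() */
	c.registered = true;

	if (disable_irqs)
		model_irq_restore(&c);

	return c.hw;
}
```

In this model, model_run(false) returns MODEL_GUEST: the IPI sees host == curr
and skips the restore, so the guest value stays in hardware. model_run(true)
returns MODEL_HOST, because the IPI is only handled after curr has been
updated. Whether the real shutdown path behaves exactly like this is the open
question above.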