Message-ID: <aO7w+GwftVK5yLfy@yzhao56-desk.sh.intel.com>
Date: Wed, 15 Oct 2025 08:55:20 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Xiaoyao Li <xiaoyao.li@...el.com>, "Rick
Edgecombe" <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that
skips WRMSR
On Tue, Sep 30, 2025 at 09:34:20AM -0700, Sean Christopherson wrote:
> On Tue, Sep 30, 2025, Sean Christopherson wrote:
> > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > setter.
> > > > >
> > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > hardware.
> > > > Nit:
> > > >
> > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> >
> > No? kvm_user_return_msr_update_cache() is passed the value that's currently
> > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> >
> > Ah, I suspect you're calling out that the cache can be stale. Maybe this?
> >
> > While the function does indeed update a cache, the nomenclature becomes
> > slightly misleading when adding a getter[1], as the current value isn't
> > _just_ the cached value, it's also the value that's currently loaded in
> > hardware (ignoring that the cache holds stale data until the vCPU is put,
> > i.e. until KVM prepares to switch back to the host).
> >
> > Actually, that's a bug waiting to happen when the getter comes along. Rather
> > > than document the potential pitfall, what about adding a prep patch to minimize
> > the window? Then _this_ patch shouldn't need the caveat about the cache being
> > stale.
>
> Ha! It's technically a bug fix. Because a forced shutdown will invoke
> kvm_shutdown() without waiting for tasks to exit, and so the on_each_cpu() calls
> to kvm_disable_virtualization_cpu() can call kvm_on_user_return() and thus
> consume a stale values->curr.
It looks like consuming a stale values->curr could also happen for normal VMs.

vmx_prepare_switch_to_guest
  |->kvm_set_user_return_msr // for all slots that load_into_hardware is true
       |->1) wrmsrq_safe(kvm_uret_msrs_list[slot], value);
       |  2) __kvm_set_user_return_msr(slot, value);
               |->msrs->values[slot].curr = value;
               |  kvm_user_return_register_notifier

Because vmx_prepare_switch_to_guest() invokes kvm_set_user_return_msr() with
local IRQs enabled, there's a window where kvm_shutdown() may call
kvm_disable_virtualization_cpu() between steps 1) and 2). During this window,
the hardware holds the shadow guest value while values[slot].curr still holds
the host value.

In this scenario, if msrs->registered is already true at step 1) (because a
previous slot was updated), kvm_disable_virtualization_cpu() could call
kvm_on_user_return() and find "values->host == values->curr", which would leave
the hardware MSR set to the shadow guest value instead of restoring the host
value.

Do you think it's a bug?
If so, do we need to fix it by disabling IRQs in kvm_set_user_return_msr()? E.g.,

 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 {
 	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
+	unsigned long flags;
 	int err;

 	value = (value & mask) | (msrs->values[slot].host & ~mask);
 	if (value == msrs->values[slot].curr)
 		return 0;
+
+	local_irq_save(flags);
 	err = wrmsrq_safe(kvm_uret_msrs_list[slot], value);
-	if (err)
+	if (err) {
+		local_irq_restore(flags);
 		return 1;
+	}

 	__kvm_set_user_return_msr(slot, value);
+	local_irq_restore(flags);
 	return 0;
 }
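
To make the window concrete, here is a rough user-space model of the ordering
problem. It is only a sketch, not kernel code: struct uret_model, HOST_VAL,
GUEST_VAL, model_on_user_return() and model_hw_after_shutdown() are all
hypothetical names, and "IRQs disabled" is modeled simply as deferring the
shutdown callback until after step 2). It assumes the restore path skips the
MSR write whenever the cached value equals the host value, as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
enum { HOST_VAL = 0x10, GUEST_VAL = 0x20 };

/* Hypothetical single-slot model; not the kernel's data structure. */
struct uret_model {
	uint64_t hw;         /* value currently loaded in the modeled MSR */
	uint64_t host;       /* host value to restore */
	uint64_t curr;       /* cached view of what is loaded in hardware */
	bool     registered; /* notifier registered by an earlier slot update */
};

/* Models the restore path: the MSR is rewritten only if the cache differs. */
static void model_on_user_return(struct uret_model *m)
{
	if (!m->registered)
		return;
	m->registered = false;
	if (m->host != m->curr) {
		m->hw = m->host;
		m->curr = m->host;
	}
}

/*
 * Returns the modeled hardware value right after the shutdown callback ran.
 * With irqs_disabled == false the callback lands between steps 1) and 2);
 * with irqs_disabled == true it is deferred until after step 2).
 */
static uint64_t model_hw_after_shutdown(bool irqs_disabled)
{
	struct uret_model m = {
		.hw = HOST_VAL, .host = HOST_VAL, .curr = HOST_VAL,
		.registered = true, /* a previous slot was already updated */
	};
	uint64_t seen = 0;

	m.hw = GUEST_VAL;                 /* step 1): the WRMSR */
	if (!irqs_disabled) {
		model_on_user_return(&m); /* cache is stale: host == curr */
		seen = m.hw;
	}
	m.curr = GUEST_VAL;               /* step 2): update the cache */
	m.registered = true;
	if (irqs_disabled) {
		model_on_user_return(&m); /* cache is accurate: host != curr */
		seen = m.hw;
	}
	return seen;
}
```

In this model, model_hw_after_shutdown(false) leaves the guest value loaded
after the callback, while model_hw_after_shutdown(true) ends with the host
value restored, which is the behavior the diff above is trying to get.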