[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cd3fb8dd79d7766f383748ec472de3943021eb39.camel@infradead.org>
Date: Wed, 05 Feb 2025 15:26:54 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Paul Durrant <paul@....org>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
syzbot+cdeaeec70992eca2d920@...kaller.appspotmail.com, Joao Martins
<joao.m.martins@...cle.com>
Subject: Re: [PATCH 1/5] KVM: x86/xen: Restrict hypercall MSR to unofficial
synthetic range
On Wed, 2025-02-05 at 07:06 -0800, Sean Christopherson wrote:
> On Wed, Feb 05, 2025, David Woodhouse wrote:
> > On Fri, 2025-01-31 at 17:13 -0800, Sean Christopherson wrote:
> > > --- a/arch/x86/kvm/xen.c
> > > +++ b/arch/x86/kvm/xen.c
> > > @@ -1324,6 +1324,14 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
> > > xhc->blob_size_32 || xhc->blob_size_64))
> > > return -EINVAL;
> > >
> > > + /*
> > > + * Restrict the MSR to the range that is unofficially reserved for
> > > + * synthetic, virtualization-defined MSRs, e.g. to prevent confusing
> > > + * KVM by colliding with a real MSR that requires special handling.
> > > + */
> > > + if (xhc->msr && (xhc->msr < 0x40000000 || xhc->msr > 0x4fffffff))
> > > + return -EINVAL;
> > > +
> > > mutex_lock(&kvm->arch.xen.xen_lock);
> > >
> > > if (xhc->msr && !kvm->arch.xen_hvm_config.msr)
> >
> > I'd prefer to see #defines for those magic values.
>
> Can do. Hmm, and since this would be visible to userspace, arguably the #defines
> should go in arch/x86/include/uapi/asm/kvm.h
Thanks.
> > Especially as there is a corresponding requirement that they never be set
> > from host context (which is where the potential locking issues come in).
> > Which train of thought leads me to ponder this as an alternative (or
> > additional) solution:
> >
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3733,7 +3733,13 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > u32 msr = msr_info->index;
> > u64 data = msr_info->data;
> >
> > - if (msr && msr == vcpu->kvm->arch.xen_hvm_config.msr)
> > + /*
> > + * Do not allow host-initiated writes to trigger the Xen hypercall
> > + * page setup; it could incur locking paths which are not expected
> > + * if userspace sets the MSR in an unusual location.
>
> That's just as likely to break userspace. Doing a save/restore on the MSR doesn't
> make a whole lot of sense since it's effectively a "command" MSR, but IMO it's not
> any less likely than userspace putting the MSR index outside of the synthetic range.
Save/restore on the MSR makes no sense. It's a write-only MSR; writing
to it has no effect *other* than populating the target page. In KVM we
don't implement reading from it at all; I don't think Xen does either?
And even if it was readable and would rather pointlessly return the
last value written to it, save/restore arguably shouldn't actually
trigger the guest memory to be overwritten again. The hypercall page
should only be populated when the *guest* writes the MSR.
With the recent elimination of the hypercall page from Linux Xen
guests, we've suggested that Linux should still set up the hypercall
page early (as it *does* have the side-effect of letting Xen know that
the guest is 64-bit). And then just free the page without ever using
it. We absolutely would not want a save/restore to scribble on that
page again.
I'm absolutely not worried about breaking userspace with such a change
to make the hypercall page MSR only work when !host_initiated. In fact
I think it's probably the right thing to do *anyway*.
If userspace wants to write to guest memory, it can do that anyway; it
doesn't need to ask the *kernel* to do it.
> Side topic, upstream QEMU doesn't even appear to put the MSR at the Hyper-V
> address. It tells the guest that's where the MSR is located, but the config
> passed to KVM still uses the default.
>
> /* Hypercall MSR base address */
> if (hyperv_enabled(cpu)) {
> c->ebx = XEN_HYPERCALL_MSR_HYPERV;
> kvm_xen_init(cs->kvm_state, c->ebx);
> } else {
> c->ebx = XEN_HYPERCALL_MSR;
> }
>
> ...
>
> /* hyperv_enabled() doesn't work yet. */
> uint32_t msr = XEN_HYPERCALL_MSR;
> ret = kvm_xen_init(s, msr);
> if (ret < 0) {
> return ret;
> }
>
Those two happen in reverse chronological order, don't they? And in the
lower one the comment tells you that hyperv_enabled() doesn't work yet.
When the higher one is called later, it calls kvm_xen_init() *again* to
put the MSR in the right place.
It could be prettier, but I don't think it's broken, is it?
> Userspace breakage aside, disallowng host writes would fix the immediate issue,
> and I think would mitigate all concerns with putting the host at risk. But it's
> not enough to actually make an overlapping MSR index work. E.g. if the MSR is
> passed through to the guest, the write will go through to the hardware MSR, unless
> the WRMSR happens to be emulated.
>
> I really don't want to broadly support redirecting any MSR, because to truly go
> down that path we'd need to deal with x2APIC, EFER, and other MSRs that have
> special treatment and meaning.
>
> While KVM's stance is usually that a misconfigured vCPU model is userspace's
> problem, in this case I don't see any value in letting userspace be stupid. It
> can't work generally, it creates unique ABI for KVM_SET_MSRS, and unless there's
> a crazy use case I'm overlooking, there's no sane reason for userspace to put the
> index in outside of the synthetic range (whereas defining seemingly nonsensical
> CPUID feature bits is useful for testing purposes, implementing support in
> userspace, etc).
Right, I think we should do *both*. Blocking host writes solves the
issue of locking problems with the hypercall page setup. All it would
take for that issue to recur is for us (or Microsoft) to invent a new
MSR in the synthetic range which is also written on vCPU init/reset.
And then the sanity check on where the VMM puts the Xen MSR doesn't
save us.
But yes, we should *also* do that sanity check.
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5069 bytes)
Powered by blists - more mailing lists