[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1bd1d36e976b0f64e88e7ab44dfcc049718703d1.camel@infradead.org>
Date: Wed, 11 Jan 2023 10:26:55 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH] Documentation: kvm: clarify SRCU locking order
On Thu, 2023-01-05 at 22:38 +0000, Sean Christopherson wrote:
>
> > > +- kvm->lock is taken inside kvm->srcu, therefore
> >
> > Prior to the recent Xen change, is this actually true?
>
> I was thinking of a different change, but v5.19 is still kinda recent, so I'll
> count it. Better to be lucky than good :-)
>
> Commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") introduced
> the only case I can find where kvm->srcu is taken inside kvm->lock.
>
> > There are many instances where kvm->srcu is taken inside kvm->lock, but I
> > can't find any existing cases where the reverse is true. Logically, it makes
> > sense to take kvm->lock first since kvm->srcu can be taken deep in helpers,
> > e.g. for accessing guest memory. It's also more consistent to take kvm->lock
> > first since kvm->srcu is taken inside vcpu->mutex, and vcpu->mutex is taken
> > inside kvm->lock.
> >
> > Disallowing synchronize_srcu(kvm->srcu) inside kvm->lock isn't probelmatic per se,
> > but it's going to result in a weird set of rules because synchronize_scru() can,
> > and is, called while holding a variety of other locks.
> >
> > In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
> > real bug.
>
> I'm doubing down on this. Taking kvm->srcu outside of kvm->lock is all kinds of
> sketchy, and likely indicates a larger problem. The aformentioned commit that
> introduced the problematic kvm->srcu vs. kvm->lock also blatantly violates ordering
> between kvm->lock and vcpu->mutex. Details in the link[*].
>
> The vast majority of flows that take kvm->srcu will already hold a lock of some
> kind, otherwise the task can't safely deference any VM/vCPU/device data and thus
> has no reason to acquire kvm->srcu. E.g. taking kvm->srcu to read guest memory
> is nonsensical without a stable guest physical address to work with.
>
> There are exceptions, e.g. evtchn_set_fn() and maybe some ioctls(), but in most
> cases, taking kvm->lock inside kvm->srcu is just asking for problems.
>
> [*] https://lore.kernel.org/all/Y7dN0Negds7XUbvI@google.com
So...
If we apply the patch below, then the Xen code never takes kvm->lock
inside kvm->srcu. (Never takes kvm->lock at all, in fact; I'm just
rereading the manual search/replace to make sure the change is valid in
all cases.)
It does take kvm->scru *without* holding kvm->lock, just not "outside"
it.
I think we can probably revert commit a79b53aaaa ("KVM: x86: fix
deadlock for KVM_XEN_EVTCHN_RESET") after doing this too?
From: David Woodhouse <dwmw@...zon.co.uk>
Date: Wed, 11 Jan 2023 10:04:38 +0000
Subject: [PATCH] KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
In commit 14243b387137a ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:
/*
* For the irqfd workqueue, using the main kvm->lock mutex is
* fine since this function is invoked from kvm_set_irq() with
* no other lock held, no srcu. In future if it will be called
* directly from a vCPU thread (e.g. on hypercall for an IPI)
* then it may need to switch to using a leaf-node mutex for
* serializing the shared_info mapping.
*/
mutex_lock(&kvm->lock);
In commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.
Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.
Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/xen.c | 65 +++++++++++++++------------------
2 files changed, 30 insertions(+), 36 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2f5bf581d00a..4ef0143fa682 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1115,6 +1115,7 @@ struct msr_bitmap_range {
/* Xen emulation context */
struct kvm_xen {
+ struct mutex xen_lock;
u32 xen_version;
bool long_mode;
bool runstate_update_flag;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index c444948ab1ac..713241808a3d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -604,26 +604,26 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) {
r = -EINVAL;
} else {
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
kvm->arch.xen.long_mode = !!data->u.long_mode;
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
r = 0;
}
break;
case KVM_XEN_ATTR_TYPE_SHARED_INFO:
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
break;
case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
if (data->u.vector && data->u.vector < 0x10)
r = -EINVAL;
else {
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
kvm->arch.xen.upcall_vector = data->u.vector;
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
r = 0;
}
break;
@@ -633,9 +633,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
break;
case KVM_XEN_ATTR_TYPE_XEN_VERSION:
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
kvm->arch.xen.xen_version = data->u.xen_version;
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
r = 0;
break;
@@ -644,9 +644,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
r = -EOPNOTSUPP;
break;
}
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
kvm->arch.xen.runstate_update_flag = !!data->u.runstate_update_flag;
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
r = 0;
break;
@@ -661,7 +661,7 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
{
int r = -ENOENT;
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
switch (data->type) {
case KVM_XEN_ATTR_TYPE_LONG_MODE:
@@ -700,7 +700,7 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
break;
}
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
return r;
}
@@ -708,7 +708,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
{
int idx, r = -ENOENT;
- mutex_lock(&vcpu->kvm->lock);
+ mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
idx = srcu_read_lock(&vcpu->kvm->srcu);
switch (data->type) {
@@ -936,7 +936,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
}
srcu_read_unlock(&vcpu->kvm->srcu, idx);
- mutex_unlock(&vcpu->kvm->lock);
+ mutex_unlock(&vcpu->kvm->arch.xen.xen_lock);
return r;
}
@@ -944,7 +944,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
{
int r = -ENOENT;
- mutex_lock(&vcpu->kvm->lock);
+ mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
switch (data->type) {
case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO:
@@ -1027,7 +1027,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
break;
}
- mutex_unlock(&vcpu->kvm->lock);
+ mutex_unlock(&vcpu->kvm->arch.xen.xen_lock);
return r;
}
@@ -1120,7 +1120,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
xhc->blob_size_32 || xhc->blob_size_64))
return -EINVAL;
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
if (xhc->msr && !kvm->arch.xen_hvm_config.msr)
static_branch_inc(&kvm_xen_enabled.key);
@@ -1129,7 +1129,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
memcpy(&kvm->arch.xen_hvm_config, xhc, sizeof(*xhc));
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
return 0;
}
@@ -1672,15 +1672,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
mm_borrowed = true;
}
- /*
- * For the irqfd workqueue, using the main kvm->lock mutex is
- * fine since this function is invoked from kvm_set_irq() with
- * no other lock held, no srcu. In future if it will be called
- * directly from a vCPU thread (e.g. on hypercall for an IPI)
- * then it may need to switch to using a leaf-node mutex for
- * serializing the shared_info mapping.
- */
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
/*
* It is theoretically possible for the page to be unmapped
@@ -1709,7 +1701,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
srcu_read_unlock(&kvm->srcu, idx);
} while(!rc);
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
if (mm_borrowed)
kthread_unuse_mm(kvm->mm);
@@ -1825,7 +1817,7 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
int ret;
/* Protect writes to evtchnfd as well as the idr lookup. */
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
ret = -ENOENT;
@@ -1856,7 +1848,7 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
}
ret = 0;
out_unlock:
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
return ret;
}
@@ -1919,10 +1911,10 @@ static int kvm_xen_eventfd_assign(struct kvm *kvm,
evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority;
}
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
ret = idr_alloc(&kvm->arch.xen.evtchn_ports, evtchnfd, port, port + 1,
GFP_KERNEL);
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
if (ret >= 0)
return 0;
@@ -1940,9 +1932,9 @@ static int kvm_xen_eventfd_deassign(struct kvm *kvm, u32 port)
{
struct evtchnfd *evtchnfd;
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
evtchnfd = idr_remove(&kvm->arch.xen.evtchn_ports, port);
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
if (!evtchnfd)
return -ENOENT;
@@ -1959,7 +1951,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
struct evtchnfd *evtchnfd;
int i;
- mutex_lock(&kvm->lock);
+ mutex_lock(&kvm->arch.xen.xen_lock);
idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) {
idr_remove(&kvm->arch.xen.evtchn_ports, evtchnfd->send_port);
synchronize_srcu(&kvm->srcu);
@@ -1967,7 +1959,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx);
kfree(evtchnfd);
}
- mutex_unlock(&kvm->lock);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
return 0;
}
@@ -2059,6 +2051,7 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
void kvm_xen_init_vm(struct kvm *kvm)
{
+ mutex_init(&kvm->arch.xen.xen_lock);
idr_init(&kvm->arch.xen.evtchn_ports);
kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
}
--
2.35.3
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists