Message-ID: <42513cb3-3c2e-4aa8-b748-23b6656a5096@redhat.com>
Date: Mon, 22 Dec 2025 15:09:13 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Ankit Soni <Ankit.Soni@....com>, Sean Christopherson <seanjc@...gle.com>,
Marc Zyngier <maz@...nel.org>
Cc: Oliver Upton <oliver.upton@...ux.dev>, Joerg Roedel <joro@...tes.org>,
David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
kvm@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
Sairaj Kodilkar <sarunkod@....com>, Vasant Hegde <vasant.hegde@....com>,
Maxim Levitsky <mlevitsk@...hat.com>,
Joao Martins <joao.m.martins@...cle.com>,
Francesco Lavra <francescolavra.fl@...il.com>,
David Matlack <dmatlack@...gle.com>, Naveen Rao <Naveen.Rao@....com>
Subject: possible deadlock due to irq_set_thread_affinity() calling into the
scheduler (was Re: [PATCH v3 38/62] KVM: SVM: Take and hold ir_list_lock
across IRTE updates in IOMMU)
On 12/22/25 10:16, Ankit Soni wrote:
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.19.0-rc2 #20 Tainted: G E
> ------------------------------------------------------
> CPU 58/KVM/28597 is trying to acquire lock:
> ff12c47d4b1f34c0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
>
> but task is already holding lock:
> ff12c49b28552110 (&svm->ir_list_lock){....}-{2:2}, at: avic_pi_update_irte+0x147/0x270 [kvm_amd]
>
> which lock already depends on the new lock.
>
> Chain exists of:
> &irq_desc_lock_class --> &rq->__lock --> &svm->ir_list_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&svm->ir_list_lock);
>                                lock(&rq->__lock);
>                                lock(&svm->ir_list_lock);
>   lock(&irq_desc_lock_class);
>
> *** DEADLOCK ***
>
> So lockdep sees:
>
> &irq_desc_lock_class -> &rq->__lock -> &svm->ir_list_lock
>
> while avic_pi_update_irte() currently holds svm->ir_list_lock and then
> takes irq_desc_lock via irq_set_vcpu_affinity(), which creates the
> potential inversion.
>
> - Is this lockdep warning expected/benign in this code path, or does it
> indicate a real potential deadlock between svm->ir_list_lock and
> irq_desc_lock with AVIC + irq_bypass + VFIO?
I'd treat it as a potential (if unlikely) deadlock:
(a) irq_set_thread_affinity() triggers the scheduler via wake_up_process()
while irq_desc->lock is held;
(b) the scheduler calls into KVM with rq->__lock held, and KVM takes
ir_list_lock within __avic_vcpu_load()/__avic_vcpu_put();
(c) KVM wants to block scheduling for a while and uses ir_list_lock for
that purpose, but then irq_set_vcpu_affinity() takes irq_desc->lock.
I don't think there's an alternative choice of lock for (c), and there's
no easy way to pull irq_desc->lock out of the IRQ subsystem--in fact
the stickiness of the situation comes from rq->__lock and
irq_desc->lock both being internal and not leaf locks.
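To make the cycle concrete, here is a minimal userspace model (a sketch
only, not kernel code: pthread mutexes stand in for the raw spinlocks,
and the two thread functions are illustrative names for the call paths
in (a)+(b) and (c)):

/* Build with: gcc -pthread deadlock_model.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t irq_desc_lock = PTHREAD_MUTEX_INITIALIZER; /* irq_desc->lock */
static pthread_mutex_t rq_lock       = PTHREAD_MUTEX_INITIALIZER; /* rq->__lock */
static pthread_mutex_t ir_list_lock  = PTHREAD_MUTEX_INITIALIZER; /* svm->ir_list_lock */

/* (a)+(b): irq_set_thread_affinity() runs under irq_desc->lock;
 * wake_up_process() enters the scheduler (rq->__lock), which calls back
 * into KVM's __avic_vcpu_load()/__avic_vcpu_put() (ir_list_lock). */
static void *affinity_change_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&irq_desc_lock);
	pthread_mutex_lock(&rq_lock);
	pthread_mutex_lock(&ir_list_lock);
	puts("affinity path: desc -> rq -> ir_list");
	pthread_mutex_unlock(&ir_list_lock);
	pthread_mutex_unlock(&rq_lock);
	pthread_mutex_unlock(&irq_desc_lock);
	return NULL;
}

/* (c): avic_pi_update_irte() holds ir_list_lock across the IRTE update,
 * and irq_set_vcpu_affinity() then takes irq_desc->lock. */
static void *irte_update_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&ir_list_lock);
	pthread_mutex_lock(&irq_desc_lock);
	puts("IRTE path: ir_list -> desc");
	pthread_mutex_unlock(&irq_desc_lock);
	pthread_mutex_unlock(&ir_list_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* Like the real race, a single run will usually complete; the
	 * cycle desc -> rq -> ir_list -> desc is what lockdep flags. */
	pthread_create(&a, NULL, affinity_change_path, NULL);
	pthread_create(&b, NULL, irte_update_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}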
Of the three, (a) is the sketchiest; notably, __setup_irq() already
calls wake_up_process() outside desc->lock. I would therefore very much
like to treat this as a kernel/irq/ bug, and the simplest (perhaps too
simple...) fix is to drop the wake_up_process() calls. The only cost is
extra latency on the next interrupt after an affinity change.
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8b1b4c8a4f54..fc135bd079a4 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -189,14 +189,10 @@ static void irq_set_thread_affinity(struct irq_desc *desc)
 	struct irqaction *action;
 
 	for_each_action_of_desc(desc, action) {
-		if (action->thread) {
+		if (action->thread)
 			set_bit(IRQTF_AFFINITY, &action->thread_flags);
-			wake_up_process(action->thread);
-		}
-		if (action->secondary && action->secondary->thread) {
+		if (action->secondary && action->secondary->thread)
 			set_bit(IRQTF_AFFINITY, &action->secondary->thread_flags);
-			wake_up_process(action->secondary->thread);
-		}
 	}
 }
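The flag is still consumed lazily: the IRQ thread re-checks
IRQTF_AFFINITY at the top of its loop, so the affinity update happens
on the next interrupt even without the explicit wakeup. Abridged from
kernel/irq/manage.c (exact shape varies by kernel version):

static void irq_thread_check_affinity(struct irq_desc *desc,
				      struct irqaction *action)
{
	if (!test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
		return;
	/* ... move the thread to the new affinity mask ... */
}

static int irq_thread(void *data)
{
	/* ... */
	while (!irq_wait_for_interrupt(action)) {
		irq_thread_check_affinity(desc, action);
		/* ... run the threaded handler ... */
	}
	/* ... */
}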
Marc, what do you think?
Paolo