Message-ID: <874iovu742.ffs@tglx>
Date: Thu, 08 Jan 2026 22:28:29 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: Paolo Bonzini <pbonzini@...hat.com>, Ankit Soni <Ankit.Soni@....com>,
Sean Christopherson <seanjc@...gle.com>, Marc Zyngier <maz@...nel.org>
Cc: Oliver Upton <oliver.upton@...ux.dev>, Joerg Roedel <joro@...tes.org>,
David Woodhouse <dwmw2@...radead.org>, Lu Baolu
<baolu.lu@...ux.intel.com>, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev, kvm@...r.kernel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, Sairaj Kodilkar <sarunkod@....com>, Vasant
Hegde <vasant.hegde@....com>, Maxim Levitsky <mlevitsk@...hat.com>, Joao
Martins <joao.m.martins@...cle.com>, Francesco Lavra
<francescolavra.fl@...il.com>, David Matlack <dmatlack@...gle.com>, Naveen
Rao <Naveen.Rao@....com>, Crystal Wood <crwood@...hat.com>
Subject: Re: possible deadlock due to irq_set_thread_affinity() calling into
the scheduler (was Re: [PATCH v3 38/62] KVM: SVM: Take and hold
ir_list_lock across IRTE updates in IOMMU)
On Mon, Dec 22 2025 at 15:09, Paolo Bonzini wrote:
> On 12/22/25 10:16, Ankit Soni wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.19.0-rc2 #20 Tainted: G E
>> ------------------------------------------------------
>> CPU 58/KVM/28597 is trying to acquire lock:
>> ff12c47d4b1f34c0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
>>
>> but task is already holding lock:
>> ff12c49b28552110 (&svm->ir_list_lock){....}-{2:2}, at: avic_pi_update_irte+0x147/0x270 [kvm_amd]
>>
>> which lock already depends on the new lock.
>>
>> Chain exists of:
>> &irq_desc_lock_class --> &rq->__lock --> &svm->ir_list_lock
>>
>> Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&svm->ir_list_lock);
>>                               lock(&rq->__lock);
>>                               lock(&svm->ir_list_lock);
>>   lock(&irq_desc_lock_class);
>>
>> *** DEADLOCK ***
>>
>> So lockdep sees:
>>
>> &irq_desc_lock_class -> &rq->__lock -> &svm->ir_list_lock
>>
>> while avic_pi_update_irte() currently holds svm->ir_list_lock and then
>> takes irq_desc_lock via irq_set_vcpu_affinity(), which creates the
>> potential inversion.
>>
>> - Is this lockdep warning expected/benign in this code path, or does it
>> indicate a real potential deadlock between svm->ir_list_lock and
>> irq_desc_lock with AVIC + irq_bypass + VFIO?
>
> I'd treat it as a potential (if unlikely) deadlock:
>
> (a) irq_set_thread_affinity triggers the scheduler via wake_up_process,
> while irq_desc->lock is taken
>
> (b) the scheduler calls into KVM with rq_lock taken, and KVM uses
> ir_list_lock within __avic_vcpu_load/__avic_vcpu_put
>
> (c) KVM wants to block scheduling for a while and uses ir_list_lock for
> that purpose, but then irq_set_vcpu_affinity() takes irq_desc->lock.
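Spelled out, the three edges of the cycle look roughly like this (a
sketch only: the lock names and the outer functions are taken from the
splat and the text above, but the intermediate call sites are from my
reading of the code and may differ in detail):

  /* (a) irq_desc->lock --> rq->__lock */
  irq_set_affinity()
    ...                                  /* desc->lock held */
    irq_set_thread_affinity(desc)
      wake_up_process(action->thread)    /* takes rq->__lock */

  /* (b) rq->__lock --> svm->ir_list_lock */
  schedule()                             /* rq->__lock held */
    kvm_sched_out()                      /* KVM preempt notifier */
      ...
      __avic_vcpu_put()                  /* takes svm->ir_list_lock */

  /* (c) svm->ir_list_lock --> irq_desc->lock */
  avic_pi_update_irte()                  /* takes and holds ir_list_lock */
    irq_set_vcpu_affinity()
      __irq_get_desc_lock()              /* takes desc->lock */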
>
> I don't think there's an alternative choice of lock for (c); and there's
> no easy way to pull the irq_desc->lock out of the IRQ subsystem--in fact
Don't even think about that.
> the stickiness of the situation comes from rq->__lock and
> irq_desc->lock both being internal and not leaf locks.
>
> Of the three, the most sketchy is (a); notably, __setup_irq() calls
> wake_up_process outside desc->lock. Therefore I'd very much like to treat
> it as a kernel/irq/ bug; and the simplest (perhaps too simple...) fix is
It's no more sketchy than VIRT assuming that it can do what it wants
under rq->lock. :)
> to drop the wake_up_process(). The only cost is extra latency on the
> next interrupt after an affinity change.
The real problematic cost is that, in an isolation scenario, the wakeup
happens at the next interrupt, which might be far into the isolated
phase. That's why the wakeup is there. See:
c99303a2d2a2 ("genirq: Wake interrupt threads immediately when changing affinity")
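That change makes the affinity update wake the interrupt thread right
away instead of leaving it asleep until the next hard interrupt. The
shape of the path is roughly the following (paraphrased from
kernel/irq/manage.c, not verbatim; secondary threads omitted and
details may have changed since):

  /* Called with desc->lock held from the affinity-change path. */
  void irq_set_thread_affinity(struct irq_desc *desc)
  {
  	struct irqaction *action;

  	for_each_action_of_desc(desc, action) {
  		if (action->thread) {
  			/* Tell the thread to recompute its affinity ... */
  			set_bit(IRQTF_AFFINITY, &action->thread_flags);
  			/*
  			 * ... and wake it now so it can move itself off an
  			 * isolated CPU before the isolated phase starts.
  			 * This wakeup is what pulls rq->__lock in under
  			 * desc->lock.
  			 */
  			wake_up_process(action->thread);
  		}
  	}
  }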
Obviously you did not even bother to look that up, otherwise you would
have CC'ed Crystal and asked her to take a look...
Thanks,
tglx