Message-ID: <f3608ef2-1d9f-406c-92f3-fa69486e1644@google.com>
Date: Thu, 3 Jul 2025 23:31:23 +0800
From: Liangyan <liangyan.peng@...edance.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Yicong Shen
<shenyicong.1023@...edance.com>, ziqianlu@...edance.com,
songmuchun@...edance.com, yuanzhu@...edance.com
Subject: Re: [External] Re: [RFC] genirq: Fix lockup in handle_edge_irq
Hello Thomas,

We are seeing this soft lockup in a guest VM. The affected IRQ belongs
to a virtio-net TX queue, and the interrupt controller is the virtual
PCI MSI-X controller. The components involved are pci_msi_controller,
virtio_pci, virtio_net and QEMU.

According to the QEMU msix.c source code, when a vector is unmasked and
its MSI-X pending bit is set, QEMU fires a new interrupt.

So it seems that this MSI-X controller does not lose an interrupt that
arrives while the vector is masked.

Do you have any suggestions for this virtual MSI-X controller? Thanks.
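
To illustrate the distinction being drawn here, the toy C model below
contrasts a controller that latches a pending bit while a vector is
masked and replays it on unmask (the behaviour attributed to QEMU's
msix.c above) with a controller that simply drops the edge. All names in
it (struct vector_model, raise_edge(), unmask_vector() and so on) are
hypothetical; it is a sketch of the idea, not QEMU or kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a single interrupt vector. Hypothetical, for illustration. */
struct vector_model {
	bool masked;         /* vector is masked at the controller */
	bool pending;        /* latched pending bit (latching controllers only) */
	bool latches;        /* true: remembers events raised while masked */
	unsigned delivered;  /* interrupts that actually reached the CPU */
};

/* The device raises one edge. */
void raise_edge(struct vector_model *v)
{
	if (!v->masked)
		v->delivered++;
	else if (v->latches)
		v->pending = true;  /* remembered until unmask */
	/* otherwise the edge is silently lost */
}

void mask_vector(struct vector_model *v)
{
	v->masked = true;
}

/* Unmask and, if an event was latched, replay it as a new interrupt. */
void unmask_vector(struct vector_model *v)
{
	v->masked = false;
	if (v->pending) {
		v->pending = false;
		v->delivered++;
	}
}

/* Sequence: mask, device raises an edge, unmask. Returns deliveries. */
unsigned interrupts_delivered_across_mask(bool latches)
{
	struct vector_model v = { .latches = latches };

	mask_vector(&v);
	raise_edge(&v);
	unmask_vector(&v);
	return v.delivered;
}

/* Usage: the latching model keeps the interrupt, the lossy one does not. */
void self_check(void)
{
	assert(interrupts_delivered_across_mask(true) == 1);
	assert(interrupts_delivered_across_mask(false) == 0);
}
```

Whether a given delivery chain behaves like the latching case has to be
established for every controller in that chain, not only for the one
closest to the device.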
Regards,
Liangyan
On 2025/7/2 21:17, Thomas Gleixner wrote:
> On Wed, Jul 02 2025 at 00:35, Liangyan wrote:
>>  void handle_edge_irq(struct irq_desc *desc)
>>  {
>> +	bool need_unmask = false;
>> +
>>  	guard(raw_spinlock)(&desc->lock);
>>
>>  	if (!irq_can_handle(desc)) {
>> @@ -791,12 +793,16 @@ void handle_edge_irq(struct irq_desc *desc)
>>  		if (unlikely(desc->istate & IRQS_PENDING)) {
>>  			if (!irqd_irq_disabled(&desc->irq_data) &&
>>  			    irqd_irq_masked(&desc->irq_data))
>> -				unmask_irq(desc);
>> +				need_unmask = true;
>>  		}
>>
>>  		handle_irq_event(desc);
>>
>>  	} while ((desc->istate & IRQS_PENDING) && !irqd_irq_disabled(&desc->irq_data));
>> +
>> +	if (need_unmask && !irqd_irq_disabled(&desc->irq_data) &&
>> +	    irqd_irq_masked(&desc->irq_data))
>> +		unmask_irq(desc);
>
> This might work in your setup by some definition of "works", but it
> breaks the semantics of this handler because of this:
>
>         device interrupt        CPU0                            CPU1
>                                 handle_edge_irq()
>                                 set(INPROGRESS);
>
>                                 do {
>                                    handle_event();
>
>         device interrupt
>                                                                 handle_edge_irq()
>                                                                   if (INPROGRESS) {
>                                                                     set(PENDING);
>                                                                     mask();
>                                                                     return;
>                                                                   }
>
>                                    ...
>                                    if (PENDING) {
>                                       need_unmask = true;
>                                    }
>                                    handle_event();
>
>         device interrupt   << possible FAIL
>
> because there are enough edge type interrupt controllers out there which
> lose an edge when the line is masked at the interrupt controller
> level. As edge type interrupts are fire and forget from the device
> perspective, the interrupt is not retriggered when unmasking later.
>
> That's the reason why this handler is written the way it is and this
> cannot be changed for obvious reasons.
>
> So no, this is not going to happen.
>
> The only possible solution for this is to analyze all interrupt
> controllers, which are involved in the delivery chain, and establish
> whether they are affected by the above problem. If not, then that
> particular delivery chain combination of interrupt controllers can be
> changed to use a different flow handler along with a profound
> explanation why this is correct under all circumstances.
>
> As you failed to provide any information about the involved controllers,
> I cannot even give any hint about a possible solution.
>
> Thanks,
>
> tglx
>
>
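
The race in the quoted diagram can be replayed with a small, purely
sequential C model. It is only a sketch under stated assumptions: three
device edges, the second and third arriving while CPU0 is still inside
the handler, and a software pending flag that is cleared before each
event is handled. The names (enum unmask_policy, struct line_model,
events_handled()) are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum unmask_policy {
	UNMASK_IN_LOOP,     /* unmask before handling, as the existing handler does */
	UNMASK_AFTER_LOOP,  /* defer the unmask, as the quoted RFC patch does */
};

struct line_model {
	bool masked;      /* line masked at the controller */
	bool hw_pending;  /* latched by the controller while masked, if it latches */
	bool latches;     /* false: edges arriving while masked are lost */
	bool sw_pending;  /* stands in for the PENDING flag in the diagram */
};

/* A device edge arriving while CPU0 is still inside the handler. */
static void edge_while_in_progress(struct line_model *l)
{
	if (l->masked) {
		if (l->latches)
			l->hw_pending = true;
		return;  /* lossy controller: the edge vanishes */
	}
	/* Delivered to CPU1, which sees INPROGRESS, sets PENDING and masks. */
	l->sw_pending = true;
	l->masked = true;
}

/* Returns how many of the three device edges end up being handled. */
unsigned events_handled(enum unmask_policy policy, bool latches)
{
	struct line_model l = { .latches = latches };
	bool need_unmask = false;
	unsigned handled = 0;
	int edges_left = 2;  /* edges 2 and 3; edge 1 started the handler */

	do {
		if (l.sw_pending && l.masked) {
			if (policy == UNMASK_IN_LOOP)
				l.masked = false;
			else
				need_unmask = true;
		}
		l.sw_pending = false;  /* model assumption: cleared before handling */

		/* One more edge arrives while this event is being handled. */
		if (edges_left > 0) {
			edges_left--;
			edge_while_in_progress(&l);
		}
		handled++;
	} while (l.sw_pending);

	if (need_unmask && l.masked)
		l.masked = false;

	/* A latching controller replays the latched event once unmasked. */
	if (!l.masked && l.hw_pending) {
		l.hw_pending = false;
		handled++;
	}
	return handled;
}

/* Usage: only the deferred unmask on a lossy controller drops an edge. */
void check_race_model(void)
{
	assert(events_handled(UNMASK_IN_LOOP, false) == 3);
	assert(events_handled(UNMASK_AFTER_LOOP, false) == 2);
	assert(events_handled(UNMASK_AFTER_LOOP, true) == 3);
}
```

In this model the deferred unmask is harmless only when the controller
latches events raised while masked, which matches the condition stated
in the quoted reply for switching a delivery chain to a different flow
handler.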