Message-Id: <20250701163558.2588435-1-liangyan.peng@bytedance.com>
Date: Wed, 2 Jul 2025 00:35:58 +0800
From: Liangyan <liangyan.peng@...edance.com>
To: tglx@...utronix.de
Cc: linux-kernel@...r.kernel.org,
Liangyan <liangyan.peng@...edance.com>,
Yicong Shen <shenyicong.1023@...edance.com>
Subject: [RFC] genirq: Fix lockup in handle_edge_irq
Yicong reported a softlockup in a guest VM, triggered when the irqbalance
service changes the affinity of a NIC IRQ.

When the affinity of a NIC IRQ is changed from cpu 0 to cpu 1 while cpu 0
is still handling the first interrupt of this IRQ in handle_edge_irq, the
second interrupt is delivered to cpu 1, which sets the IRQS_PENDING flag.
cpu 0 then invokes handle_irq_event again after finishing the first
interrupt. If the interval between two interrupts is shorter than the
latency of one iteration of the loop in handle_edge_irq (i.e., unmask_irq +
handle_irq_event), cpu 0 may keep invoking handle_irq_event and never
exit handle_edge_irq, which eventually causes a softlockup (the hardlockup
detector is not enabled in the guest VM).

Our production guest VMs carry heavy network traffic: the NIC raises more
than 1000 interrupts per second, and each NIC mask/unmask_irq traps to
the host and takes more than 1ms, so this softlockup is easy to
reproduce. With bpftrace, we can see cpu 0 invoking handle_irq_event more
than 5000 times in handle_edge_irq when the softlockup occurs.

To fix this, we can limit the number of times handle_irq_event is called
repeatedly.
        cpu 0                           cpu 1
  handle_edge_irq
  spin_lock
  do {
    unmask_irq if IRQS_PENDING
                                    handle_edge_irq
    handle_irq_event
      istate &= ~IRQS_PENDING
      spin_unlock
                                    spin_lock
                                    istate |= IRQS_PENDING
      handle_irq_event_percpu       mask_ack_irq
                                    spin_unlock
      spin_lock
  } while(istate & IRQS_PENDING)
  spin_unlock
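
The timing condition described above can be illustrated with a rough
userspace sketch. It is not kernel code: count_iterations() and its
parameters are hypothetical names, and the model assumes strictly periodic
interrupts and a fixed cost per loop pass. It only shows that once the
per-pass latency reaches the interrupt interval, every pass sees a new
pending interrupt and the loop runs until an artificial cap.

```c
#include <assert.h>

/*
 * Simplified model of the do/while loop in handle_edge_irq (assumption:
 * interrupts arrive every interval_us, one loop pass costs latency_us).
 * Returns the number of passes, stopping at cap so the model terminates.
 */
static unsigned long count_iterations(unsigned long interval_us,
				      unsigned long latency_us,
				      unsigned long cap)
{
	unsigned long now = 0;			/* model time in microseconds */
	unsigned long next_irq = interval_us;	/* first irq arrived at t = 0 */
	unsigned long iterations = 0;
	int pending;

	assert(interval_us > 0);	/* a zero interval would never advance */

	do {
		now += latency_us;	/* one pass: unmask_irq + handle_irq_event */
		iterations++;
		pending = 0;
		/* Every interrupt that arrived during this pass sets the flag. */
		while (next_irq <= now) {
			pending = 1;
			next_irq += interval_us;
		}
	} while (pending && iterations < cap);

	return iterations;
}
```

With a 1000us interval, count_iterations(1000, 1200, 5000) runs into the
cap of 5000 passes, while count_iterations(1000, 300, 5000) leaves the loop
after a single pass.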
The softlockup traces look something like this:
-----
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G L
Hardware name: ByteDance Inc. OpenStack Nova, BIOS
RIP: 0010:__do_softirq+0x78/0x2ac
RSP: 0018:ffffa02a00134f98 EFLAGS: 00000246
RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 00000000000000c1 RSI: ffffffff9e801040 RDI: 0000000000000016
RBP: ffffa02a000c7dd8 R08: 000002ea2320b76b R09: 7fffffffffffffff
R10: 000002ea3a1c0080 R11: 00000000002fefff R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000080
FS: 0000000000000000(0000) GS:ffff89323e840000(0000)
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2e5957c000 CR3: 0000000167a9a005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
__irq_exit_rcu+0xb9/0xf0
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:cpuidle_enter_state+0xd2/0x400
RSP: 0018:ffffa02a000c7e80 EFLAGS: 00000202
RAX: ffff89323e870bc0 RBX: 0000000000000001 RCX: 00000000ffffffff
RDX: 0000000000000016 RSI: ffffffff9e801040 RDI: 0000000000000000
RBP: ffff89323e87c700 R08: 000002ea22ebdf87 R09: 0000000000000018
R10: 000000000000010d R11: 000000000000020a R12: ffffffff9dab58e0
R13: 000002ea22ebdf87 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter+0x29/0x40
cpuidle_idle_call+0xfa/0x160
do_idle+0x7b/0xe0
cpu_startup_entry+0x19/0x20
start_secondary+0x116/0x140
secondary_startup_64_no_verify+0xe5/0xeb
</TASK>
Signed-off-by: Liangyan <liangyan.peng@...edance.com>
Reported-by: Yicong Shen <shenyicong.1023@...edance.com>
---
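A minimal userspace sketch of why deferring the unmask bounds the loop is
given here for illustration only. struct model_desc, other_cpu_irq() and
run_loop() are hypothetical stand-ins, not kernel APIs. The model assumes
the worst case where a new edge arrives on the other CPU during every pass,
and it assumes a masked line delivers nothing (any latching of a masked
interrupt by the hardware is ignored).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant irq_desc state. */
struct model_desc {
	bool pending;	/* models IRQS_PENDING */
	bool masked;	/* models irqd_irq_masked() */
};

/* Other CPU sees the IRQ in progress: mark it pending and mask it. */
static void other_cpu_irq(struct model_desc *d)
{
	if (d->masked)
		return;		/* assumption: masked line delivers nothing */
	d->pending = true;
	d->masked = true;
}

/* Returns the number of passes through the modelled loop (capped). */
static unsigned int run_loop(struct model_desc *d, bool defer_unmask,
			     unsigned int cap)
{
	unsigned int iterations = 0;
	bool need_unmask = false;

	assert(d);

	do {
		if (d->pending && d->masked) {
			if (defer_unmask)
				need_unmask = true;	/* patched behaviour */
			else
				d->masked = false;	/* unmask inside the loop */
		}
		d->pending = false;	/* handle_irq_event clears the flag */
		iterations++;
		other_cpu_irq(d);	/* worst case: an edge on every pass */
	} while (d->pending && iterations < cap);

	if (need_unmask && d->masked)
		d->masked = false;	/* single unmask after the loop */

	return iterations;
}
```

Starting from an unmasked, non-pending descriptor and a cap of 5000,
the in-loop unmask variant runs all 5000 passes, while the deferred
variant stops after 2 passes and leaves the line unmasked.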
kernel/irq/chip.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2b274007e8ba..9f5c50e75e6b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -764,6 +764,8 @@ EXPORT_SYMBOL_GPL(handle_fasteoi_nmi);
  */
 void handle_edge_irq(struct irq_desc *desc)
 {
+	bool need_unmask = false;
+
 	guard(raw_spinlock)(&desc->lock);
 
 	if (!irq_can_handle(desc)) {
@@ -791,12 +793,16 @@ void handle_edge_irq(struct irq_desc *desc)
 		if (unlikely(desc->istate & IRQS_PENDING)) {
 			if (!irqd_irq_disabled(&desc->irq_data) &&
 			    irqd_irq_masked(&desc->irq_data))
-				unmask_irq(desc);
+				need_unmask = true;
 		}
 
 		handle_irq_event(desc);
 
 	} while ((desc->istate & IRQS_PENDING) && !irqd_irq_disabled(&desc->irq_data));
+
+	if (need_unmask && !irqd_irq_disabled(&desc->irq_data) &&
+	    irqd_irq_masked(&desc->irq_data))
+		unmask_irq(desc);
 }
 EXPORT_SYMBOL(handle_edge_irq);
 
--
2.20.1