Message-Id: <20250701163558.2588435-1-liangyan.peng@bytedance.com>
Date: Wed,  2 Jul 2025 00:35:58 +0800
From: Liangyan <liangyan.peng@...edance.com>
To: tglx@...utronix.de
Cc: linux-kernel@...r.kernel.org,
	Liangyan <liangyan.peng@...edance.com>,
	Yicong Shen <shenyicong.1023@...edance.com>
Subject: [RFC] genirq: Fix lockup in handle_edge_irq

Yicong reported a softlockup in a guest VM, triggered when the irqbalance
service changes the affinity of a NIC IRQ.

When the affinity of a NIC IRQ is changed from cpu 0 to cpu 1 while cpu 0
is still handling the first interrupt of this IRQ in handle_edge_irq, the
second interrupt is delivered to cpu 1, which sets the IRQS_PENDING flag,
so cpu 0 invokes handle_irq_event again after finishing the first
interrupt. If the interval between two interrupts is shorter than the time
needed for one iteration of the loop in handle_edge_irq (i.e., unmask_irq +
handle_irq_event), cpu 0 may keep invoking handle_irq_event and never exit
handle_edge_irq, which eventually causes a softlockup (the hardlockup
detector is not enabled in the guest VM).

Our production guest VMs carry heavy network traffic: the NIC raises more
than 1000 interrupts per second, and the NIC mask/unmask_irq traps to the
host and takes more than 1ms, so this softlockup is easy to reproduce.
With bpftrace, we can see cpu 0 invoke handle_irq_event more than 5000
times in handle_edge_irq when the softlockup occurs.

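To fix this, limit how often handle_irq_event is repeated by deferring
unmask_irq until the loop in handle_edge_irq has finished, so the IRQ stays
masked while cpu 0 is still looping.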

       cpu 0                                        cpu 1

  handle_edge_irq
    spin_lock
    do {
        unmask_irq if IRQS_PENDING
                                                handle_edge_irq
        handle_irq_event
          istate &= ~IRQS_PENDING
          spin_unlock
                                                  spin_lock
                                                  istate |= IRQS_PENDING
          handle_irq_event_percpu                 mask_ack_irq
                                                  spin_unlock
          spin_lock
    } while(istate & IRQS_PENDING)
    spin_unlock
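
As a rough illustration only (not part of the patch), the interleaving above
can be modelled in user space. The sketch below assumes a strictly periodic
interrupt source and that the second interrupt arrives while the first one is
still being handled. struct model, run_loop and all cost values are
hypothetical names and numbers chosen for the example, and interrupts that
arrive while the line is masked are simply dropped by the model rather than
replayed after the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timing model; all values are in microseconds. */
struct model {
	long interval_us;	/* time between two NIC interrupts */
	long unmask_us;		/* cost of unmask_irq (traps to the host) */
	long handler_us;	/* cost of one handle_irq_event call */
	long max_iters;		/* safety cap so the simulation terminates */
};

/*
 * Simulate one invocation of the handle_edge_irq loop on cpu 0 and return
 * how many times handle_irq_event would be called.
 * defer_unmask == false models unmasking inside the loop,
 * defer_unmask == true models unmasking only after the loop.
 */
static long run_loop(const struct model *m, bool defer_unmask)
{
	long now = 0;
	long next_irq = 0;	/* second interrupt lands during the first handler */
	bool masked = false;
	bool pending = false;
	long iters = 0;

	assert(m->interval_us > 0);

	do {
		/* unmask_irq if IRQS_PENDING (only when unmasking in the loop) */
		if (pending && masked && !defer_unmask) {
			now += m->unmask_us;
			masked = false;
		}

		/* handle_irq_event: clear IRQS_PENDING, run handler unlocked */
		pending = false;
		long end = now + m->handler_us;

		/* cpu 1 takes an interrupt: set IRQS_PENDING and mask_ack_irq */
		if (!masked && next_irq <= end) {
			pending = true;
			masked = true;
		}

		/* skip every arrival that happened before the handler finished */
		while (next_irq <= end)
			next_irq += m->interval_us;

		now = end;
		iters++;
	} while (pending && iters < m->max_iters);

	return iters;
}
```

With an interval of 1000us, an unmask cost of 1000us and a handler cost of
100us, run_loop() with defer_unmask == false only stops at max_iters, while
defer_unmask == true returns after two iterations. When the interval is
larger than unmask_us + handler_us, the in-loop variant also exits after two
iterations, which matches the condition described in the commit message.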

The softlockup traces look something like this:
-----
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L
Hardware name: ByteDance Inc. OpenStack Nova, BIOS
RIP: 0010:__do_softirq+0x78/0x2ac
RSP: 0018:ffffa02a00134f98 EFLAGS: 00000246
RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 00000000000000c1 RSI: ffffffff9e801040 RDI: 0000000000000016
RBP: ffffa02a000c7dd8 R08: 000002ea2320b76b R09: 7fffffffffffffff
R10: 000002ea3a1c0080 R11: 00000000002fefff R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000080
FS:  0000000000000000(0000) GS:ffff89323e840000(0000)
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2e5957c000 CR3: 0000000167a9a005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 __irq_exit_rcu+0xb9/0xf0
 sysvec_apic_timer_interrupt+0x72/0x90
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:cpuidle_enter_state+0xd2/0x400
RSP: 0018:ffffa02a000c7e80 EFLAGS: 00000202
RAX: ffff89323e870bc0 RBX: 0000000000000001 RCX: 00000000ffffffff
RDX: 0000000000000016 RSI: ffffffff9e801040 RDI: 0000000000000000
RBP: ffff89323e87c700 R08: 000002ea22ebdf87 R09: 0000000000000018
R10: 000000000000010d R11: 000000000000020a R12: ffffffff9dab58e0
R13: 000002ea22ebdf87 R14: 0000000000000001 R15: 0000000000000000
 cpuidle_enter+0x29/0x40
 cpuidle_idle_call+0xfa/0x160
 do_idle+0x7b/0xe0
 cpu_startup_entry+0x19/0x20
 start_secondary+0x116/0x140
 secondary_startup_64_no_verify+0xe5/0xeb
 </TASK>

Signed-off-by: Liangyan <liangyan.peng@...edance.com>
Reported-by: Yicong Shen <shenyicong.1023@...edance.com>
---
 kernel/irq/chip.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2b274007e8ba..9f5c50e75e6b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -764,6 +764,8 @@ EXPORT_SYMBOL_GPL(handle_fasteoi_nmi);
  */
 void handle_edge_irq(struct irq_desc *desc)
 {
+	bool need_unmask = false;
+
 	guard(raw_spinlock)(&desc->lock);
 
 	if (!irq_can_handle(desc)) {
@@ -791,12 +793,16 @@ void handle_edge_irq(struct irq_desc *desc)
 		if (unlikely(desc->istate & IRQS_PENDING)) {
 			if (!irqd_irq_disabled(&desc->irq_data) &&
 			    irqd_irq_masked(&desc->irq_data))
-				unmask_irq(desc);
+				need_unmask = true;
 		}
 
 		handle_irq_event(desc);
 
 	} while ((desc->istate & IRQS_PENDING) && !irqd_irq_disabled(&desc->irq_data));
+
+	if (need_unmask && !irqd_irq_disabled(&desc->irq_data) &&
+	    irqd_irq_masked(&desc->irq_data))
+		unmask_irq(desc);
 }
 EXPORT_SYMBOL(handle_edge_irq);
 
-- 
2.20.1

