Message-ID: <20250718184935.232983339@linutronix.de>
Date: Fri, 18 Jul 2025 20:54:04 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Liangyan <liangyan.peng@...edance.com>,
Yicong Shen <shenyicong.1023@...edance.com>,
Jiri Slaby <jirislaby@...nel.org>
Subject: [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq()
Yicong reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.
If the affinity of an edge type interrupt is moved to a new target CPU
while the interrupt is being handled on the previous target CPU, the
handler might get stuck on the previous target:

   CPU 0 (previous target)          CPU 1 (new target)

   handle_edge_irq()
   repeat:
        handle_event()              handle_edge_irq()
                                    if (INPROGRESS) {
                                      set(PENDING);
                                      mask();
                                      return;
                                    }
        if (PENDING) {
          clear(PENDING);
          unmask();
          goto repeat;
        }

The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts at
a high rate, so that each new interrupt arrives before handle_event()
completes and before the CPU0 handler can clear INPROGRESS. As a result
CPU1 sets the PENDING flag over and over. This has been observed in
virtual machines.
The following series addresses this by making the new target CPU wait
for the handler to complete on CPU0, which completes the software
migration.
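
To illustrate the difference, here is a minimal userspace sketch of the
flag protocol from the diagram above. It is not the code from this series
and not kernel code: struct irq_model, simulate() and the 'burst' and
'new_target_waits' parameters are hypothetical names invented for this
model, and the wait on CPU1 is reduced to a flag instead of a real spin on
INPROGRESS.

```c
#include <stdbool.h>

/* Hypothetical model of the descriptor state, not struct irq_desc. */
struct irq_model {
	bool inprogress;
	bool pending;
	bool masked;
	int handled_cpu0;	/* events handled on the previous target */
	int handled_cpu1;	/* events handled on the new target */
};

/*
 * Model one interrupt on CPU0 (previous target) while the device raises
 * 'burst' further interrupts on CPU1 (new target). Each of them arrives
 * before handle_event() on CPU0 has returned.
 */
static void simulate(struct irq_model *m, int burst, bool new_target_waits)
{
	int left = burst;
	bool cpu1_waiting = false;

	m->inprogress = true;
	do {
		m->pending = false;		/* clear(PENDING) */
		m->masked = false;		/* unmask() */
		m->handled_cpu0++;		/* handle_event() on CPU0 */

		if (left > 0 && !cpu1_waiting) {
			/* CPU1 enters handle_edge_irq() and sees INPROGRESS */
			if (new_target_waits) {
				/* Assumed fix: wait, do not set PENDING */
				cpu1_waiting = true;
			} else {
				left--;
				m->pending = true;	/* set(PENDING) */
				m->masked = true;	/* mask() */
			}
		}
	} while (m->pending);
	m->inprogress = false;

	/* CPU0 is released, a waiting CPU1 handles the events itself. */
	if (cpu1_waiting)
		m->handled_cpu1 += left;
}
```

In this model, simulate(&m, 5, false) leaves all six events on CPU0, and
the count grows with the burst length, which is the live lock when the
burst never ends. With simulate(&m, 5, true) CPU0 handles only the event
it started with and the remaining five end up on CPU1.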
A draft combo patch of this has been tested by Liangyan:
https://lore.kernel.org/all/87o6u0rpaa.ffs@tglx
The series splits up the draft patch and has proper changelogs.
Thanks,
tglx
---
 chip.c      |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 internals.h |    6 ++---
 pm.c        |   16 +++++---------
 spurious.c  |   37 --------------------------------
 4 files changed, 69 insertions(+), 58 deletions(-)