Message-ID: <aEcWTM3Y1roOf4Ph@google.com>
Date: Mon, 9 Jun 2025 10:13:48 -0700
From: Brian Norris <briannorris@...omium.org>
To: Aleksandrs Vinarskis <alex.vinarskis@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Tsai Sung-Fu <danielsftsai@...gle.com>,
Douglas Anderson <dianders@...omium.org>,
linux-kernel@...r.kernel.org, Johan Hovold <johan@...nel.org>
Subject: Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU
hotplug
Hi Alex,
On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> On 5/14/25 22:13, Brian Norris wrote:
> > Affinity-managed IRQs may be shut down and restarted during CPU
> > hotunplug/plug, and the IRQ may be left in an unexpected state.
> > Specifically:
[...]
> It appears that this commit introduces a critical bug observed on at least
> some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> unusable.
>
> With this change in place, after successful suspend the device either:
> 1. Cannot wake up at all. Screen stays black, even though the PM has exited
> suspend (observed by external LEDs controlled by the PM)
>
> 2. Wakes up eventually after minutes (instead of seconds) with SSD-related
> errors in dmesg. System still exhibits errors, e.g. UI icons are not properly
> loaded and WiFi does not (always) connect.
I'm sorry to hear this has caused regressions. I don't yet know why your
particular problems have occurred, but I did notice last week that there
were some issues with the patch in question. I wrote a patch which I'll
append, and have started (but not completely finished) testing it.
Perhaps you could try it out and let me know how it goes?
> Is it possible to have this addressed/patched up/reverted before 6.16-rc1
> goes live and introduces the regression?
> It also appears this series was selected for backporting to 6.6, 6.12, 6.14,
> 6.15: perhaps this should be postponed/aborted until a better solution is
> found?
Regarding stable backports: yes, please. It looks like Johan requested
holding this back on stable here:
https://lore.kernel.org/all/aELf3QmuEJOlR7Dv@hovoldconsulting.com/
Hopefully we can figure out a mainline solution promptly enough, but a
revert is also OK if it comes down to it.
Below is a patch I'm working with so far. I can submit it as a separate
patch if that helps you.
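For reference, here's roughly what irq_startup_managed() ends up looking
like with the diff applied (reconstructed from the diff itself, so treat
it as a sketch rather than the exact tree contents):

void irq_startup_managed(struct irq_desc *desc)
{
	struct irq_data *d = irq_desc_get_irq_data(desc);

	/*
	 * Only start it up when the disable depth is 1, so that a disable,
	 * hotunplug, hotplug sequence does not end up enabling it during
	 * hotplug unconditionally.
	 */
	desc->depth--;
	/* Clear the flag on every managed restart, not only the final one. */
	irqd_clr_managed_shutdown(d);
	if (!desc->depth)
		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
}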
Brian
---
Subject: [PATCH] genirq: Rebalance managed interrupts across multi-CPU hotplug
Commit 788019eb559f ("genirq: Retain disable depth for managed
interrupts across CPU hotplug") intended to only decrement the disable
depth once per managed shutdown, but instead it decrements it once for
every CPU in the affinity mask that comes back online, until the depth
reaches zero and the interrupt gets restarted behind the driver's back.
For example, consider:
1. Interrupt is affine to CPU {M,N}
2. disable_irq() -> depth is 1
3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
4. CPU N goes offline -> irq_shutdown() / depth is 2
5. CPU N goes online
-> irq_restore_affinity_of_irq()
-> irqd_is_managed_and_shutdown()==true
-> irq_startup_managed() -> depth is 1
6. CPU M goes online
-> irq_restore_affinity_of_irq()
-> irqd_is_managed_and_shutdown()==true
-> irq_startup_managed() -> depth is 0
*** BUG: driver expects the interrupt to still be disabled ***
-> irq_startup() -> irqd_clr_managed_shutdown()
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
The managed-shutdown flag should instead be cleared at step 5, on the
first managed restart, so that further hotplugs (step 6) don't decrement
the depth again and cause further imbalance.
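To make the depth arithmetic above easy to follow, here is a tiny
userspace model of the bookkeeping. It is purely illustrative: the
struct and helpers are invented for this example and only mirror the
kernel names, they are not kernel code.

/* Toy model of the relevant irq_desc state; build with: cc -o model model.c */
#include <stdbool.h>
#include <stdio.h>

struct model {
	int depth;		/* irq_desc::depth */
	bool managed_shutdown;	/* IRQD_MANAGED_SHUTDOWN */
};

/* Step 2: driver calls disable_irq() */
static void driver_disable(struct model *m)
{
	m->depth++;
}

/* Step 4: last CPU in the affinity mask goes offline -> managed shutdown */
static void last_cpu_offline(struct model *m)
{
	m->depth++;
	m->managed_shutdown = true;
}

/* Steps 5/6: a CPU from the affinity mask comes back online */
static void cpu_online(struct model *m, bool clear_flag_on_first_restart)
{
	if (!m->managed_shutdown)
		return;
	m->depth--;
	/*
	 * Pre-fix: the flag is only cleared once the depth hits zero and
	 * the interrupt actually starts. With the fix: cleared on the
	 * first restart, so later onlines are no-ops.
	 */
	if (clear_flag_on_first_restart || m->depth == 0)
		m->managed_shutdown = false;
}

static int replug_both_cpus(bool fixed)
{
	struct model m = { .depth = 0, .managed_shutdown = false };

	driver_disable(&m);	/* step 2: depth 1 */
	/* step 3: CPU M goes offline, interrupt merely migrates */
	last_cpu_offline(&m);	/* step 4: depth 2 */
	cpu_online(&m, fixed);	/* step 5: depth 1 */
	cpu_online(&m, fixed);	/* step 6 */
	return m.depth;
}

int main(void)
{
	printf("pre-fix depth:  %d (re-enabled behind the driver's back)\n",
	       replug_both_cpus(false));
	printf("with fix depth: %d (still disabled, as the driver expects)\n",
	       replug_both_cpus(true));
	return 0;
}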
Fixes: 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Signed-off-by: Brian Norris <briannorris@...omium.org>
---
kernel/irq/chip.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b0e0a7332993..1af5fe14f3e0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -175,8 +175,6 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 	if (!irqd_affinity_is_managed(d))
 		return IRQ_STARTUP_NORMAL;
 
-	irqd_clr_managed_shutdown(d);
-
 	if (!cpumask_intersects(aff, cpu_online_mask)) {
 		/*
 		 * Catch code which fiddles with enable_irq() on a managed
@@ -205,12 +203,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
  */
 void irq_startup_managed(struct irq_desc *desc)
 {
+	struct irq_data *d = irq_desc_get_irq_data(desc);
+
 	/*
 	 * Only start it up when the disable depth is 1, so that a disable,
 	 * hotunplug, hotplug sequence does not end up enabling it during
 	 * hotplug unconditionally.
 	 */
 	desc->depth--;
+	irqd_clr_managed_shutdown(d);
 	if (!desc->depth)
 		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
 }
--
2.50.0.rc0.642.g800a2b2222-goog