lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEcWTM3Y1roOf4Ph@google.com>
Date: Mon, 9 Jun 2025 10:13:48 -0700
From: Brian Norris <briannorris@...omium.org>
To: Aleksandrs Vinarskis <alex.vinarskis@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
	Tsai Sung-Fu <danielsftsai@...gle.com>,
	Douglas Anderson <dianders@...omium.org>,
	linux-kernel@...r.kernel.org, Johan Hovold <johan@...nel.org>
Subject: Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU
 hotplug

Hi Alex,

On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> On 5/14/25 22:13, Brian Norris wrote:
> > Affinity-managed IRQs may be shut down and restarted during CPU
> > hotunplug/plug, and the IRQ may be left in an unexpected state.
> > Specifically:

[...]

> It appears that this commit introduces a critical bug observed on at least
> some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> unusable.
> 
> With this change in place, after successful suspend the device either:
> 1. Cannot wake up at all. Screen stays black, even though PM has existed
> suspend (observed by external LEDs controlled by PM)
> 
> 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> errors in dmesg. System still exhibits errors eg. UI icons are not properly
> loaded, WiFi does not (always) connect.

I'm sorry to hear this has caused regressions. I don't yet know why your
particular problems have occurred, but I did notice last week that there
were some issues with the patch in question. I wrote a patch which I'll
append, and have started (but not completely finished) testing it.
Perhaps you could try it out and let me know how it goes?

> Is it possible to have this addressed/patched up/reverted before 6.16-rc1
> goes live and introduces the regression?
> It also appears this series was selected for backporting to 6.6, 6.12, 6.14,
> 6.15:  perhaps this should be postponed/aborted until better solution is
> found?

Regarding stable backports: yes, please. It looks like Johan requested
holding this back on stable here:

https://lore.kernel.org/all/aELf3QmuEJOlR7Dv@hovoldconsulting.com/

Hopefully we can figure out a mainline solution promptly enough, but
revert is also OK if it comes down to it.

Below is a patch I'm working with so far. I can submit it as a separate
patch if that helps you.

Brian

---

Subject: [PATCH] genirq: Rebalance managed interrupts across multi-CPU hotplug

Commit 788019eb559f ("genirq: Retain disable depth for managed
interrupts across CPU hotplug") intended to only decrement the disable
depth once per managed shutdown, but instead it decrements for each CPU
hotplug in the affinity mask, until its depth reaches a point where it
finally gets re-started.

For example, consider:

1. Interrupt is affine to CPU {M,N}
2. disable_irq() -> depth is 1
3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
4. CPU N goes offline -> irq_shutdown() / depth is 2
5. CPU N goes online
    -> irq_restore_affinity_of_irq()
       -> irqd_is_managed_and_shutdown()==true
          -> irq_startup_managed() -> depth is 1
6. CPU M goes online
    -> irq_restore_affinity_of_irq()
       -> irqd_is_managed_and_shutdown()==true
          -> irq_startup_managed() -> depth is 0
          *** BUG: driver expects the interrupt is still disabled ***
             -> irq_startup() -> irqd_clr_managed_shutdown()
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning

We should clear the managed-shutdown flag at step 6, so that further
hotplugs don't cause further imbalance.

Fixes: commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Signed-off-by: Brian Norris <briannorris@...omium.org>
---
 kernel/irq/chip.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b0e0a7332993..1af5fe14f3e0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -175,8 +175,6 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 	if (!irqd_affinity_is_managed(d))
 		return IRQ_STARTUP_NORMAL;
 
-	irqd_clr_managed_shutdown(d);
-
 	if (!cpumask_intersects(aff, cpu_online_mask)) {
 		/*
 		 * Catch code which fiddles with enable_irq() on a managed
@@ -205,12 +203,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 
 void irq_startup_managed(struct irq_desc *desc)
 {
+	struct irq_data *d = irq_desc_get_irq_data(desc);
+
 	/*
 	 * Only start it up when the disable depth is 1, so that a disable,
 	 * hotunplug, hotplug sequence does not end up enabling it during
 	 * hotplug unconditionally.
 	 */
 	desc->depth--;
+	irqd_clr_managed_shutdown(d);
 	if (!desc->depth)
 		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
 }
-- 
2.50.0.rc0.642.g800a2b2222-goog


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ