[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEiQitCsXq9XSBcZ@google.com>
Date: Tue, 10 Jun 2025 13:07:38 -0700
From: Brian Norris <briannorris@...omium.org>
To: Aleksandrs Vinarskis <alex.vinarskis@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Tsai Sung-Fu <danielsftsai@...gle.com>,
Douglas Anderson <dianders@...omium.org>,
linux-kernel@...r.kernel.org, Johan Hovold <johan@...nel.org>
Subject: Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU
hotplug
Hi Alex,
On Mon, Jun 09, 2025 at 08:19:58PM +0200, Aleksandrs Vinarskis wrote:
> On Mon, 9 Jun 2025 at 19:13, Brian Norris <briannorris@...omium.org> wrote:
> > On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> > > It appears that this commit introduces a critical bug observed on at least
> > > some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> > > unusable.
For my reference, are these laptops represented by the
arch/arm64/boot/dts/qcom/x1e80100.dtsi family of device trees? I'm just
trying to reason through what sorts of driver(s) are in use here, in
case there's something I'm overlooking, as I don't have the laptop in
question to test.
> > > With this change in place, after successful suspend the device either:
> > > 1. Cannot wake up at all. Screen stays black, even though PM has existed
> > > suspend (observed by external LEDs controlled by PM)
> > >
> > > 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> > > errors in dmesg. System still exhibits errors eg. UI icons are not properly
> > > loaded, WiFi does not (always) connect.
FYI, my assumption here based on the log snippets and the patch in
question is that "only" the NVMe driver's IRQs are getting b0rked by my
change. I could imagine that would produce the above symptoms in most
laptop configurations, because failing disk I/O will likely block most
wakeup-related activity, and cause all sorts of UI and system daemon
(e.g., WiFi supplicant) misbehavior.
> > I'm sorry to hear this has caused regressions. I don't yet know why your
> > particular problems have occurred, but I did notice last week that there
> > were some issues with the patch in question. I wrote a patch which I'll
> > append, and have started (but not completely finished) testing it.
> > Perhaps you could try it out and let me know how it goes?
>
> Hi Brian,
>
> I have tested your attached patch in addition to the original one, and
> unfortunately it did not resolve the problem on either of the two
> laptops: neither managed to wake up, just like before.
> Will be happy to promptly test other proposed solutions.
Thanks for the testing. I've found a few problems with my proposed
patch, and I've come up with the appended alternative that solves them.
Could you give it a try?
Also, if it's not too much trouble (and especially if my patch still
doesn't help you), could you also provide a more complete kernel log and
kernel .config file? (Attachment is fine with me. Or a direct email, if
somehow the lists don't like it.) It's possible that would give me more
hints as to what's going wrong for you.
Thanks,
Brian
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b0e0a7332993..3e873c5ce623 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -205,12 +205,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
void irq_startup_managed(struct irq_desc *desc)
{
+ struct irq_data *d = irq_desc_get_irq_data(desc);
+
/*
* Only start it up when the disable depth is 1, so that a disable,
* hotunplug, hotplug sequence does not end up enabling it during
* hotplug unconditionally.
*/
desc->depth--;
+ irqd_clr_managed_shutdown(d);
if (!desc->depth)
irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
}
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index f07529ae4895..755346ea9819 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -210,13 +210,6 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
!irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
return;
- /*
- * Don't restore suspended interrupts here when a system comes back
- * from S3. They are reenabled via resume_device_irqs().
- */
- if (desc->istate & IRQS_SUSPENDED)
- return;
-
if (irqd_is_managed_and_shutdown(data))
irq_startup_managed(desc);
Powered by blists - more mailing lists