[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMcHhXqq9DHgip3rr0=24Y-LEBq5n4rDrE6AsWyjyBmsS7s+-A@mail.gmail.com>
Date: Mon, 9 Jun 2025 20:19:58 +0200
From: Aleksandrs Vinarskis <alex.vinarskis@...il.com>
To: Brian Norris <briannorris@...omium.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Tsai Sung-Fu <danielsftsai@...gle.com>,
Douglas Anderson <dianders@...omium.org>, linux-kernel@...r.kernel.org,
Johan Hovold <johan@...nel.org>
Subject: Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
On Mon, 9 Jun 2025 at 19:13, Brian Norris <briannorris@...omium.org> wrote:
>
> Hi Alex,
>
> On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> > On 5/14/25 22:13, Brian Norris wrote:
> > > Affinity-managed IRQs may be shut down and restarted during CPU
> > > hotunplug/plug, and the IRQ may be left in an unexpected state.
> > > Specifically:
>
> [...]
>
> > It appears that this commit introduces a critical bug observed on at least
> > some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> > unusable.
> >
> > With this change in place, after successful suspend the device either:
> > 1. Cannot wake up at all. Screen stays black, even though PM has existed
> > suspend (observed by external LEDs controlled by PM)
> >
> > 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> > errors in dmesg. System still exhibits errors eg. UI icons are not properly
> > loaded, WiFi does not (always) connect.
>
> I'm sorry to hear this has caused regressions. I don't yet know why your
> particular problems have occurred, but I did notice last week that there
> were some issues with the patch in question. I wrote a patch which I'll
> append, and have started (but not completely finished) testing it.
> Perhaps you could try it out and let me know how it goes?
Hi Brian,
I have tested your attached patch in addition to the original one, and
unfortunately it did not resolve the problem on either of the two
laptops: neither managed to wake up, just like before.
Will be happy to promptly test other proposed solutions.
Thanks,
Alex
>
> > Is it possible to have this addressed/patched up/reverted before 6.16-rc1
> > goes live and introduces the regression?
> > It also appears this series was selected for backporting to 6.6, 6.12, 6.14,
> > 6.15: perhaps this should be postponed/aborted until better solution is
> > found?
>
> Regarding stable backports: yes, please. It looks like Johan requested
> holding this back on stable here:
>
> https://lore.kernel.org/all/aELf3QmuEJOlR7Dv@hovoldconsulting.com/
>
> Hopefully we can figure out a mainline solution promptly enough, but
> revert is also OK if it comes down to it.
>
> Below is a patch I'm working with so far. I can submit it as a separate
> patch if that helps you.
>
> Brian
>
> ---
>
> Subject: [PATCH] genirq: Rebalance managed interrupts across multi-CPU hotplug
>
> Commit 788019eb559f ("genirq: Retain disable depth for managed
> interrupts across CPU hotplug") intended to only decrement the disable
> depth once per managed shutdown, but instead it decrements for each CPU
> hotplug in the affinity mask, until its depth reaches a point where it
> finally gets re-started.
>
> For example, consider:
>
> 1. Interrupt is affine to CPU {M,N}
> 2. disable_irq() -> depth is 1
> 3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
> 4. CPU N goes offline -> irq_shutdown() / depth is 2
> 5. CPU N goes online
> -> irq_restore_affinity_of_irq()
> -> irqd_is_managed_and_shutdown()==true
> -> irq_startup_managed() -> depth is 1
> 6. CPU M goes online
> -> irq_restore_affinity_of_irq()
> -> irqd_is_managed_and_shutdown()==true
> -> irq_startup_managed() -> depth is 0
> *** BUG: driver expects the interrupt is still disabled ***
> -> irq_startup() -> irqd_clr_managed_shutdown()
> 7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
>
> We should clear the managed-shutdown flag at step 6, so that further
> hotplugs don't cause further imbalance.
>
> Fixes: commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
> Signed-off-by: Brian Norris <briannorris@...omium.org>
> ---
> kernel/irq/chip.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index b0e0a7332993..1af5fe14f3e0 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -175,8 +175,6 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
> if (!irqd_affinity_is_managed(d))
> return IRQ_STARTUP_NORMAL;
>
> - irqd_clr_managed_shutdown(d);
> -
> if (!cpumask_intersects(aff, cpu_online_mask)) {
> /*
> * Catch code which fiddles with enable_irq() on a managed
> @@ -205,12 +203,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
>
> void irq_startup_managed(struct irq_desc *desc)
> {
> + struct irq_data *d = irq_desc_get_irq_data(desc);
> +
> /*
> * Only start it up when the disable depth is 1, so that a disable,
> * hotunplug, hotplug sequence does not end up enabling it during
> * hotplug unconditionally.
> */
> desc->depth--;
> + irqd_clr_managed_shutdown(d);
> if (!desc->depth)
> irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
> }
> --
> 2.50.0.rc0.642.g800a2b2222-goog
>
Powered by blists - more mailing lists