Message-ID: <qluelhof4piilyqbyanflp3qdljxak73kt2yvahkaby6vmyzzu@qgvqej7kdio5>
Date: Mon, 1 Dec 2025 09:04:07 -0800
From: Breno Leitao <leitao@...ian.org>
To: Petr Mladek <pmladek@...e.com>
Cc: john.ogness@...utronix.de, linux@...linux.org.uk, paulmck@...nel.org,
usamaarif642@...il.com, leo.yan@....com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com, rmikey@...a.com
Subject: Re: CSD lockup during kexec due to unbounded busy-wait in
pl011_console_write_atomic (arm64)
Hello Petr,
On Fri, Nov 28, 2025 at 05:08:17PM +0100, Petr Mladek wrote:
> On Tue 2025-11-25 08:02:16, Breno Leitao wrote:
>
> I do _not_ think that the CPU was waiting in pl011_console_write_atomic() in
> the following loop for the entire 11 secs:
>
> while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
> cpu_relax();
>
> A more likely scenario was that pl011_console_write_atomic() was
> called several times during this period because there were more
> pending messages.
Probably. Most of the messages come from CPUs being powered off:
[ 44.119433] psci: CPU1 killed (polled 0 ms)
[ 44.146057] psci: CPU2 killed (polled 0 ms)
[ 44.182058] psci: CPU3 killed (polled 0 ms)
[ 44.218031] psci: CPU4 killed (polled 0 ms)
[ 44.252962] psci: CPU5 killed (polled 0 ms)
[ 44.276939] psci: CPU6 killed (polled 0 ms)
[ 44.296152] psci: CPU7 killed (polled 1 ms)
....
And this only happens on "large" machines, where the host flushes
a lot of messages during kexec teardown.
> > printk_kthreads_shutdown (kernel/printk/printk.c:?)
>
> But the function seems to be called with IRQs enabled. So it might
> help to restore IRQs after each flushed message.
Agreed. This would make the irq-disabled sections much smaller, with
a higher chance of IPIs and NMIs (on arm64 hosts without FEAT_NMI)
being delivered.
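To make the effect concrete, here is a minimal userspace sketch. It is
only a model, not kernel code: struct irq_model, model_irq_save(),
model_irq_restore(), model_emit() and the two flush helpers are all
hypothetical names, and the per-record "cost" is an arbitrary time unit.
It assumes the flush previously ran inside a single irq-disabled section
and compares that with restoring IRQs after each emitted record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct irq_model {
	int irqs_off;           /* 1 while "interrupts" are masked */
	unsigned long cur_off;  /* time spent in the current masked section */
	unsigned long max_off;  /* longest masked section seen so far */
};

static void model_irq_save(struct irq_model *m)
{
	assert(!m->irqs_off);
	m->irqs_off = 1;
	m->cur_off = 0;
}

static void model_irq_restore(struct irq_model *m)
{
	assert(m->irqs_off);
	if (m->cur_off > m->max_off)
		m->max_off = m->cur_off;
	m->irqs_off = 0;
}

/* Emitting one record costs `cost` time units with IRQs masked. */
static void model_emit(struct irq_model *m, unsigned long cost)
{
	assert(m->irqs_off);
	m->cur_off += cost;
}

/* Assumed old behaviour: every pending record in one masked section. */
static unsigned long flush_single_section(const unsigned long *cost, size_t n)
{
	struct irq_model m = { 0, 0, 0 };

	model_irq_save(&m);
	for (size_t i = 0; i < n; i++)
		model_emit(&m, cost[i]);
	model_irq_restore(&m);
	return m.max_off;
}

/* Proposed behaviour: restore IRQs after each emitted record. */
static unsigned long flush_per_record(const unsigned long *cost, size_t n)
{
	struct irq_model m = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		model_irq_save(&m);
		model_emit(&m, cost[i]);
		model_irq_restore(&m); /* a pending IPI could be serviced here */
	}
	return m.max_off;
}
```

With costs {30, 27, 36, 36}, flush_single_section() reports a longest
masked section of 129 units, while flush_per_record() reports 36. The
worst case is then bounded by the slowest single record rather than by
the size of the backlog.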
> But we could extend the existing commit d5d399efff6577 ("printk/nbcon:
> Release nbcon consoles ownership in atomic flush after each emitted
> record") and restore IRQs after each emitted record.
>
> I wonder if the following patch would help in this scenario.
> It is made on top of "for-next" branch in printk/linux.git.
> But the most important pre-requisite is the above mentioned commit
> in the branch "rework/atomic-flush-hardlockup".
>
> Note that the patch is only compile tested.
I've tested the patch and I don't see the CSD lockups anymore.
Thanks for the quick fix.
> Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
> Signed-off-by: Petr Mladek <pmladek@...e.com>
Tested-by: Breno Leitao <leitao@...ian.org>
Thanks to everyone involved here. With this last patch (which makes
the irq-disabled section smaller), and kfence no longer sending IPIs
during kexec, I consider this issue closed.
--breno