Message-ID: <84y0u95e0j.fsf@jogness.linutronix.de>
Date: Tue, 03 Jun 2025 12:19:32 +0206
From: John Ogness <john.ogness@...utronix.de>
To: "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com>, 'Michael Kelley'
<mhklinux@...look.com>
Cc: "pmladek@...e.com" <pmladek@...e.com>, 'Ryo Takakura'
<ryotkkr98@...il.com>, Russell King <linux@...linux.org.uk>, Greg
Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby
<jirislaby@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-serial@...r.kernel.org"
<linux-serial@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "Toshiyuki Sato (Fujitsu)"
<fj6611ie@...itsu.com>
Subject: RE: Problem with nbcon console and amba-pl011 serial port
Hi Toshiyuki,
On 2025-06-03, "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com> wrote:
>> 4. pr_emerg() has a high logging level, and it effectively steals the console
>> from the "pr/ttyAMA0" task, which I believe is intentional in the nbcon design.
>> Down in pl011_console_write_thread(), the "pr/ttyAMA0" task is doing
>> nbcon_enter_unsafe() and nbcon_exit_unsafe() around each character
>> that it outputs. When pr_emerg() steals the console, nbcon_exit_unsafe()
>> returns 0, so the "for" loop exits. pl011_console_write_thread() then
>> enters a busy "while" loop waiting to reclaim the console. It's doing this
>> busy "while" loop with interrupts disabled, and because of the panic,
>> it never succeeds. Whatever CPU is running "pr/ttyAMA0" is effectively
>> stuck at this point.
>>
>> 5. Meanwhile panic() continues, calling panic_other_cpus_shutdown(). On
>> ARM64, other CPUs are stopped by sending them an IPI. Each CPU receives
>> the IPI and calls the PSCI function to stop itself. But the CPU running
>> "pr/ttyAMA0" is looping forever with interrupts disabled, so it never
>> processes the IPI and it never stops. ARM64 doesn't have a true NMI that
>> can override the looping with interrupts disabled, so there's no way to
>> stop that CPU.
>>
>> 6. The failure to stop the "pr/ttyAMA0" CPU then causes downstream
>> problems, such as when loading and running a kdump kernel.
[...]
> After reproducing the issue,
> I plan to try a workaround that forcibly terminates the nbcon_reacquire_nobuf
> loop in pl011_console_write_thread if other_cpu_in_panic is true.
> Please comment if you have any other ideas.
For panic, if it is OK to leave uap->clk enabled and not restore REG_CR,
then it should be fine to just return. But only for panic.
So something like:
	while (!nbcon_enter_unsafe(wctxt)) {
		if (other_cpu_in_panic())
			return;
		nbcon_reacquire_nobuf(wctxt);
	}
(And other_cpu_in_panic() will need to be made generally available.)
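
To illustrate the control flow of the loop above, here is a minimal
user-space sketch. Everything in it is a hypothetical stand-in: struct
fake_wctxt, the fake_*() helpers and finish_write() are not kernel
interfaces, and the reacquire stub makes exactly one attempt per call
only so that the sketch terminates. It shows nothing more than the early
return when another CPU is in panic, under those assumptions.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the write context (not the kernel struct). */
struct fake_wctxt {
	bool owned;		/* does this context currently own the console? */
	bool panic_elsewhere;	/* stand-in for other_cpu_in_panic() */
	int reacquire_calls;	/* number of reacquire attempts made so far */
	int calls_until_owned;	/* attempt on which ownership comes back */
};

enum write_result { RESTORED, BAILED_FOR_PANIC };

/* Stand-in for nbcon_enter_unsafe(): succeeds only while the console is owned. */
static bool fake_enter_unsafe(struct fake_wctxt *w)
{
	return w->owned;
}

/*
 * Stand-in for nbcon_reacquire_nobuf(). Assumption for illustration only:
 * one attempt per call, succeeding on attempt number calls_until_owned.
 */
static void fake_reacquire_nobuf(struct fake_wctxt *w)
{
	w->reacquire_calls++;
	if (w->reacquire_calls >= w->calls_until_owned)
		w->owned = true;
}

/* Models the tail of the write callback after the console was taken over. */
static enum write_result finish_write(struct fake_wctxt *w)
{
	while (!fake_enter_unsafe(w)) {
		/* Panic on another CPU: give up instead of spinning forever. */
		if (w->panic_elsewhere)
			return BAILED_FOR_PANIC;
		fake_reacquire_nobuf(w);
	}
	/* The real driver would restore REG_CR and disable uap->clk here. */
	return RESTORED;
}
```

With panic_elsewhere set and owned clear, finish_write() returns
immediately without a single reacquire attempt, which corresponds to the
CPU no longer spinning with interrupts disabled. In the non-panic case it
keeps retrying until ownership comes back and then reaches the restore
step.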
John Ogness