[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEBNLMYVUOGzusuR@pathway.suse.cz>
Date: Wed, 4 Jun 2025 15:42:04 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com>,
'Michael Kelley' <mhklinux@...look.com>,
'Ryo Takakura' <ryotkkr98@...il.com>,
Russell King <linux@...linux.org.uk>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-serial@...r.kernel.org" <linux-serial@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: Problem with nbcon console and amba-pl011 serial port
On Wed 2025-06-04 13:56:34, John Ogness wrote:
> On 2025-06-04, Petr Mladek <pmladek@...e.com> wrote:
> > On Wed 2025-06-04 04:11:10, Toshiyuki Sato (Fujitsu) wrote:
> >> > On 2025-06-03, John Ogness <john.ogness@...utronix.de> wrote:
> >> > > On 2025-06-03, "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com> wrote:
> >> > >>> 4. pr_emerg() has a high logging level, and it effectively steals the console
> >> > >>> from the "pr/ttyAMA0" task, which I believe is intentional in the nbcon
> >> > design.
> >> > >>> Down in pl011_console_write_thread(), the "pr/ttyAMA0" task is doing
> >> > >>> nbcon_enter_unsafe() and nbcon_exit_unsafe() around each character
> >> > >>> that it outputs. When pr_emerg() steals the console, nbcon_exit_unsafe()
> >> > >>> returns 0, so the "for" loop exits. pl011_console_write_thread() then
> >> > >>> enters a busy "while" loop waiting to reclaim the console. It's doing this
> >> > >>> busy "while" loop with interrupts disabled, and because of the panic,
> >> > >>> it never succeeds.
> >
> > I am a bit surprised that it never succeeds. The panic CPU takes over
> > the ownership but it releases it when the messages are flushed. And
> > the original owner should be able to reacquire it in this case.
>
> The problem is that other_cpu_in_panic() will return true forever, which
> will cause _all_ acquires to fail forever. Originally we did allow
> non-panic to take over again after panic releases ownership. But IIRC we
> removed that capability because it allowed us to reduce a lot of
> complexity. And now nbcon_waiter_matches() relies on "Lower priorities
> are ignored during panic() until reboot."
Great catch! I forgot it. And it explains everything.
It would be nice to mention this in the commit message or
in the comment above nbcon_reacquire_nobuf().
My updated prosal of the comment is:
* Return: True when the context reacquired the owner ship. The caller
* might try entering the unsafe state and restore the original
* console device setting. It must not access the output buffer
* anymore.
*
* False when another CPU is in panic(). nbcon_try_acquire()
* would never succeed and the infinite loop would prevent
* stopping this CPU on architectures without proper NMI.
* The caller should bail out immediately without
* touching the console device or the output buffer.
Best Regards,
Petr
Powered by blists - more mailing lists