linux-kernel - RE: Problem with nbcon console and amba-pl011 serial port

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID:
 <TY4PR01MB13777CC92C858572B9C19394FD76FA@TY4PR01MB13777.jpnprd01.prod.outlook.com>
Date: Thu, 5 Jun 2025 05:27:56 +0000
From: "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com>
To: 'Petr Mladek' <pmladek@...e.com>, John Ogness <john.ogness@...utronix.de>
CC: 'Michael Kelley' <mhklinux@...look.com>, 'Ryo Takakura'
	<ryotkkr98@...il.com>, Russell King <linux@...linux.org.uk>, Greg
 Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby
	<jirislaby@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-serial@...r.kernel.org"
	<linux-serial@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "Toshiyuki Sato (Fujitsu)"
	<fj6611ie@...itsu.com>
Subject: RE: Problem with nbcon console and amba-pl011 serial port

Hi John and Petr,

> On Wed 2025-06-04 13:56:34, John Ogness wrote:
> > On 2025-06-04, Petr Mladek <pmladek@...e.com> wrote:
> > > On Wed 2025-06-04 04:11:10, Toshiyuki Sato (Fujitsu) wrote:
> > >> > On 2025-06-03, John Ogness <john.ogness@...utronix.de> wrote:
> > >> > > On 2025-06-03, "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com> wrote:
> > >> > >>> 4. pr_emerg() has a high logging level, and it effectively steals the console
> > >> > >>> from the "pr/ttyAMA0" task, which I believe is intentional in the nbcon
> > >> > design.
> > >> > >>> Down in pl011_console_write_thread(), the "pr/ttyAMA0" task is doing
> > >> > >>> nbcon_enter_unsafe() and nbcon_exit_unsafe() around each character
> > >> > >>> that it outputs.  When pr_emerg() steals the console, nbcon_exit_unsafe()
> > >> > >>> returns 0, so the "for" loop exits. pl011_console_write_thread() then
> > >> > >>> enters a busy "while" loop waiting to reclaim the console. It's doing this
> > >> > >>> busy "while" loop with interrupts disabled, and because of the panic,
> > >> > >>> it never succeeds.
> > >
> > > I am a bit surprised that it never succeeds. The panic CPU takes over
> > > the ownership but it releases it when the messages are flushed. And
> > > the original owner should be able to reacquire it in this case.
> >
> > The problem is that other_cpu_in_panic() will return true forever, which
> > will cause _all_ acquires to fail forever. Originally we did allow
> > non-panic to take over again after panic releases ownership. But IIRC we
> > removed that capability because it allowed us to reduce a lot of
> > complexity. And now nbcon_waiter_matches() relies on "Lower priorities
> > are ignored during panic() until reboot."
> 
> Great catch! I forgot it. And it explains everything.
> 
> It would be nice to mention this in the commit message or
> in the comment above nbcon_reacquire_nobuf().
> 
> My updated prosal of the comment is:
> 
>  * Return:	True when the context reacquired the owner ship. The caller
>  *		might try entering the unsafe state and restore the original
>  *		console device setting. It must not access the output buffer
>  *		anymore.
>  *
>  *		False when another CPU is in panic(). nbcon_try_acquire()
>  *		would never succeed and the infinite loop would	prevent
>  *		stopping this CPU on architectures without proper NMI.
>  *		The caller should bail out immediately without
>  *		touching the console device or the output buffer.
> 
> Best Regards,
> Petr

Thank you for your comments and suggestions.

After consideration, 
I believe that there is no problem with forcibly terminating the process when 
nbcon_reacquire_nobuf returns false at the pl011 driver level, 
as in the test patch.

It feels a bit harsh that a thread which started processing before the panic
and then transferred ownership to an atomic operation isn't allowed to perform
cleanup during panic handling or the grace period before the CPU halts.

I would like to hear your opinion on this.
If nbcon_reacquire_nobuf() could acquire ownership even after the panic, 
then driver-side modifications might not be necessary. 
(The responsibility to complete the process without hindering the panic process
would still remain.)

Would it be difficult to make an exception to the rule,
  "Lower priorities are ignored during panic() until reboot,"
depending on the situation?

This is just my personal opinion.
I think it would be good to get feedback from other UART driver developers as well.

Regards,
Toshiyuki Sato