Message-ID:
<TY4PR01MB1377757DD5E7F27A41F0B4723D76FA@TY4PR01MB13777.jpnprd01.prod.outlook.com>
Date: Thu, 5 Jun 2025 06:22:32 +0000
From: "Toshiyuki Sato (Fujitsu)" <fj6611ie@...itsu.com>
To: 'Michael Kelley' <mhklinux@...look.com>
CC: 'John Ogness' <john.ogness@...utronix.de>, "pmladek@...e.com"
<pmladek@...e.com>, 'Ryo Takakura' <ryotkkr98@...il.com>, Russell King
<linux@...linux.org.uk>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-serial@...r.kernel.org"
<linux-serial@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "Toshiyuki Sato (Fujitsu)"
<fj6611ie@...itsu.com>
Subject: RE: Problem with nbcon console and amba-pl011 serial port

Hi Michael,

> From: Michael Kelley <mhklinux@...look.com>
> Sent: Thursday, June 5, 2025 11:49 AM
> > Hi Michael, John,
> >
>
> [snip]
>
> >
> > This is a proposed fix to force termination by returning false from
> > nbcon_reacquire_nobuf when a panic occurs within pl011_console_write_thread.
> > (I believe this is similar to what John suggested in his previous
> > reply.)
> >
> > While I couldn't reproduce the issue using sysrq-trigger in my
> > environment (it seemed that the panic was executed before the
> > thread processing), I did observe nbcon_reacquire_nobuf failing to
> > complete when injecting an NMI (SError) during pl011_console_write_thread.
> > Applying this fix seems to have resolved the "SMP: failed to stop
> > secondary CPUs" issue.
> >
> > This patch is for testing only.
> > Modifications to imx and other drivers, as well as adding
> > __must_check, will likely be required.
> >
> > Michael, could you please test this fix in your environment?
>
> I've tested the fix in my primary environment (ARM64 VM in the Azure cloud), and I've seen no failures to stop a CPU. I kept my
> custom logging in place, so I could confirm that the problem path is still happening, and the fix recovers from the problem path.
> So the good results are not due to just a timing change. The "pr/ttyAMA0" task is still looping forever trying to get ownership
> of the console, but it is doing so at a higher level in nbcon_kthread_func() and in calling nbcon_emit_one(), and interrupts are
> enabled for part of the loop.
>
> Full disclosure: I have a secondary environment, also an ARM64 VM in the Azure cloud, but running on an older version of
> Hyper-V. In this environment I see the same custom logging results, and the "pr/ttyAMA0" task is indeed looping with
> interrupts enabled. But for some reason, the CPU doesn't stop in response to IPI_CPU_STOP. I don't see any evidence that this
> failure to stop is due to the Linux pl011 driver or nbcon. This older version of Hyper-V has a known problem in pl011 UART
> emulation, and I have a theory on how that problem may be causing the failure to stop. It will take me some time to investigate
> further, but based on what I know now, that investigation should not hold up this fix.
>
> Michael

Thank you for testing the patch.
I'm concerned about the thread looping...
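
To make the behavior under discussion concrete, here is a minimal userspace sketch of the idea behind the test patch: the reacquire step reports failure once a panic is in progress, and the write callback returns early instead of spinning. This is not the actual patch. All names prefixed with model_ are hypothetical stand-ins, not kernel APIs, and the two flags only imitate the panic state and console ownership.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; these are NOT kernel APIs. */
static atomic_bool model_panic_in_progress; /* set once another CPU has panicked */
static atomic_bool model_console_taken;     /* set while another context owns the console */

/*
 * Models a reacquire that can fail: spin until the console is free,
 * but give up and return false as soon as a panic is observed.
 */
static bool model_reacquire_nobuf(void)
{
	while (atomic_load(&model_console_taken)) {
		if (atomic_load(&model_panic_in_progress))
			return false; /* panic context keeps the console; stop waiting */
	}
	return true; /* console is free again; ownership regained */
}

/*
 * Models the tail of a write callback that lost ownership mid-write.
 * Returns false when it had to abandon the console.
 */
static bool model_console_write_thread(void)
{
	if (!model_reacquire_nobuf())
		return false; /* skip hardware restore and return immediately */

	/* Ownership regained: restoring the hardware state would go here. */
	return true;
}
```

With both flags set, model_console_write_thread() returns false right away. With model_console_taken clear, it returns true. The caller would still need to handle the false case, which is where the remaining looping at the higher level comes from.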

Regards,
Toshiyuki Sato