Message-ID:
 <SN6PR02MB4157A4C5E8CB219A75263A17D46DA@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Tue, 3 Jun 2025 03:18:23 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Toshiyuki Sato <fj6611ie@...jp.fujitsu.com>, Russell King
	<linux@...linux.org.uk>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-serial@...r.kernel.org" <linux-serial@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "fj6611ie@...jp.fujitsu.com"
	<fj6611ie@...jp.fujitsu.com>
Subject: Problem with nbcon console and amba-pl011 serial port

I'm seeing a problem in the panic path of an ARM64 VM on Hyper-V
in the Azure public cloud.  Here's the output from the VM's serial port:

# taskset -c 4 /bin/echo c >/proc/sysrq-trigger

[
** replaying previous printk message **
[   51.616656] sysrq: Trigger a crash
[   51.616689] Kernel panic - not syncing: sysrq triggered crash
[   51.624212] CPU: 4 UID: 0 PID: 2278 Comm: echo Tainted: G            E       6.15.0-rc7-next-20250521 #1 VOLUNTARY 
[   51.630165] Tainted: [E]=UNSIGNED_MODULE
[   51.632331] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
[   51.638771] Call trace:
[   51.640179]  show_stack+0x20/0x38 (C)
[   51.642488]  dump_stack_lvl+0xc8/0xf8
[   51.644638]  dump_stack+0x18/0x28
[   51.646654]  panic+0x384/0x478
[   51.648371]  sysrq_handle_crash+0x20/0x28
[   51.650815]  __handle_sysrq+0xdc/0x2b8
[   51.653080]  write_sysrq_trigger+0x124/0x240
[   51.655508]  proc_reg_write+0xa4/0x100
[   51.657917]  vfs_write+0xd8/0x3e0
[   51.659836]  ksys_write+0x74/0x110
[   51.661735]  __arm64_sys_write+0x24/0x38
[   51.663967]  invoke_syscall+0x6c/0xf8
[   51.666025]  el0_svc_common.constprop.0+0xc8/0xf0
[   51.668771]  do_el0_svc+0x24/0x38
[   51.670713]  el0_svc+0x40/0x198
[   51.672509]  el0t_64_sync_handler+0xc8/0xd0
[   51.675170]  el0t_64_sync+0x1b0/0x1b8
[   51.677351] SMP: stopping secondary CPUs
[   52.728175] SMP: failed to stop secondary CPUs 2
[   52.731229] Kernel Offset: 0x5706ebce0000 from 0xffff800080000000
[   52.734528] PHYS_OFFSET: 0x0
[   52.736115] CPU features: 0x2000,400007c0,02110ca1,5401faab
[   52.739275] Memory Limit: none
[   52.803615] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---

The problem is the failure to stop secondary CPU 2.  (The CPU # that fails
to stop varies from run to run.) The failure is mostly, but not always,
reproducible. I bisected it to commit 2eb2608618ce ("serial: amba-pl011:
Implement nbcon console") in the 6.15 kernel. Additional custom logging
shows why the problem can happen. Here are the steps:

1. The "echo" command ends up in __handle_sysrq(), which outputs the
"sysrq: Trigger a crash" message using pr_info().  I always ran the "echo"
command on CPU 4 just for consistency in my testing/debugging -- there's
nothing special about CPU 4.

2. The "pr/ttyAMA0" kernel thread handles the outputting of the message.
nbcon_kthread_func() calls nbcon_emit_one() with the "use_atomic" parameter
set to false. nbcon_emit_one() in turn calls nbcon_emit_next_record() with
the console spin lock held and interrupts disabled. nbcon_emit_next_record()
then calls pl011_console_write_thread(). The latter has a "for" loop to output
each character of the message, and my custom logging shows that it is indeed
outputting the string "[   51.616656] sysrq: Trigger a crash".
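
For reference, the character loop in pl011_console_write_thread() looks
roughly like this -- paraphrased from the 6.15 driver and simplified, with
the clock handling and CR register save/restore omitted:

  /* Simplified sketch of the per-character loop; not the verbatim driver code */
  if (nbcon_exit_unsafe(wctxt)) {
          int i;

          for (i = 0; i < wctxt->len; i++) {
                  /* Re-enter the unsafe section for each character */
                  if (!nbcon_enter_unsafe(wctxt))
                          break;
                  uart_console_write(&uap->port, wctxt->outbuf + i, 1,
                                     pl011_console_putchar);
                  /* Returns false if console ownership was lost */
                  if (!nbcon_exit_unsafe(wctxt))
                          break;
          }
  }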

3. While "pr/ttyAMA0" is doing its thing, __handle_sysrq() calls
sysrq_handle_crash(), which calls panic(). After some preliminaries, panic()
outputs the message "Kernel panic - not syncing: sysrq triggered crash"
using pr_emerg().

4. pr_emerg() has a high logging level, and it effectively steals the console
from the "pr/ttyAMA0" task, which I believe is intentional in the nbcon design.
Down in pl011_console_write_thread(), the "pr/ttyAMA0" task is doing
nbcon_enter_unsafe() and nbcon_exit_unsafe() around each character
that it outputs.  When pr_emerg() steals the console, nbcon_exit_unsafe()
returns 0, so the "for" loop exits. pl011_console_write_thread() then
enters a busy "while" loop waiting to reclaim the console. It's doing this
busy "while" loop with interrupts disabled, and because of the panic,
it never succeeds. Whatever CPU is running "pr/ttyAMA0" is effectively
stuck at this point.
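
For completeness, the busy wait is the tail of pl011_console_write_thread(),
which is roughly (again paraphrased, not verbatim):

  /*
   * Roughly the tail of pl011_console_write_thread() (paraphrased).
   * After the character loop exits, normally or because ownership was
   * stolen, the function must own the console again before it can
   * safely restore the saved register state.  With the port lock held
   * and interrupts disabled, and a panic in progress that never hands
   * the console back, this spins forever.
   */
  while (!nbcon_enter_unsafe(wctxt))
          nbcon_reacquire_nobuf(wctxt);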

5. Meanwhile panic() continues, calling panic_other_cpus_shutdown(). On
ARM64, other CPUs are stopped by sending them an IPI. Each CPU receives
the IPI and calls the PSCI function to stop itself. But the CPU running
"pr/ttyAMA0" is looping forever with interrupts disabled, so it never
processes the IPI and it never stops. ARM64 doesn't have a true NMI that
can override the looping with interrupts disabled, so there's no way to
stop that CPU.
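
For context, the roughly one second between "SMP: stopping secondary CPUs"
and "SMP: failed to stop secondary CPUs 2" in the log above matches the
bounded wait in the ARM64 stop path, which (paraphrased from
arch/arm64/kernel/smp.c, with the exact function and cpumask handling
simplified) looks something like:

  /*
   * Paraphrased sketch of the arm64 stop path; not verbatim.  Send the
   * stop IPI, then wait up to one second for the other CPUs to go
   * offline.  A CPU spinning with interrupts disabled never handles
   * the IPI, so the wait times out and the warning is printed.
   */
  smp_cross_call(&mask, IPI_CPU_STOP);

  timeout = USEC_PER_SEC;
  while (num_other_online_cpus() && timeout--)
          udelay(1);

  if (num_other_online_cpus())
          pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
                  cpumask_pr_args(cpu_online_mask));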

6. The failure to stop the "pr/ttyAMA0" CPU then causes downstream
problems, such as when loading and running a kdump kernel.

The problem is timing dependent.  In some cases, the "pr/ttyAMA0"
thread is evidently able to get the message out before panic() calls
pr_emerg(). In my case running as a guest on Hyper-V, the pl011 device
in the guest VM is emulated by Hyper-V. Each pl011 register access
traps to the hypervisor, which slows things down and probably makes
the problem more likely. But from what I can see, the underlying
race condition exists regardless.

At this point, I just wanted to surface the problem. I don't have any
specific ideas on how to fix it, as my knowledge of nbcon is limited
to what I've learned in figuring out the failure path.

Toshiyuki -- what are your thoughts on how to fix this?

Michael Kelley
