[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <aGVn/SnOvwWewkOW@gmail.com>
Date: Wed, 2 Jul 2025 10:10:21 -0700
From: Breno Leitao <leitao@...ian.org>
To: cov@...eaurora.org, rmk+kernel@...linux.org.uk, mark.rutland@....com,
catalin.marinas@....com, linux-serial@...r.kernel.org
Cc: rmikey@...a.com, linux-arm-kernel@...ts.infradead.org,
usamaarif642@...il.com, leo.yan@....com,
linux-kernel@...r.kernel.org, paulmck@...nel.org
Subject: arm64: csdlock at early boot due to slow serial (?)
Hello,
I'm observing two unusual behaviors during the boot process on my SBSA
ARM machine, with upstream kernel (6.16-rc4):
1) A 9-second pause during early boot:
[ 0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200
[ 0.420120] Serial: AMBA PL011 UART driver
[ 0.875263] printk: console [ttyAMA0] enabled
[ 9.848263] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
2) Occasional CSD lock during early boot:
Intermittently, I encounter a CSD lock. Diagnosing this was challenging, but
after enabling PSEUDO NMI, I was able to capture the stack trace:
printk: console [ttyAMA0] enabled
smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#02 do_nothing (kernel/smp.c:1058)
smp: csd: CSD lock (#1) unresponsive.
Sending NMI from CPU 0 to CPUs 2:
....
pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
nbcon_emit_next_record (kernel/printk/nbcon.c:1030)
__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
vprintk_emit (kernel/printk/printk.c:2429)
On reviewing the amba-pl011.c code, I noticed that each message being flushed
causes the following loop to iterate roughly 20,000 times:
while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
cpu_relax();
Tracing this, I found that flushing early boot messages is taking a significant
amount of time. For example, trace_printk() output from that function shows:
swapper/0-1 [000] dN... 9.695941: pl011_console_write_atomic: "[ 0.928995] printk: console [ttyAMA0] enabled"
|
-> This is trace_printk of wctxt->outbuf
At timestamp 9.69 seconds, the serial console is still flushing messages from
0.92 seconds, indicating that the initial 9-second gap is spent looping in
cpu_relax()—about 20,000 times per message, which is clearly suboptimal.
Further debugging revealed the following sequence with the pl011 registers:
1) uart_console_write()
2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
Michael has reported a hardware issue where the BUSY bit could get
stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
If I am having the same hardware issue, I suppose I need to change that logic
to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
instead of BUSY.
Anyway, any one familar with this weird behaviour?
Thanks
--breno
Powered by blists - more mailing lists