[<prev] [next>] [day] [month] [year] [list]
Message-Id: <20220119013140.383261-1-stephen.s.brennan@oracle.com>
Date: Tue, 18 Jan 2022 17:31:40 -0800
From: Stephen Brennan <stephen.s.brennan@...cle.com>
To: Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
John Ogness <john.ogness@...utronix.de>
Cc: Stephen Brennan <stephen.s.brennan@...cle.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH v2] printk: disable optimistic spin during panic
A CPU executing with console lock spinning enabled might be halted
during a panic. Before the panicking CPU calls console_flush_on_panic(),
it may call console_trylock(), which attempts to optimistically spin,
deadlocking the panic CPU:
CPU 0 (panic CPU) CPU 1
----------------- ------
printk() {
vprintk_func() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_lock_spinning_enable();
... printing to console ...
panic() {
crash_smp_send_stop() {
NMI -------------------> HALT
}
atomic_notifier_call_chain() {
printk() {
...
console_trylock_spinnning() {
// optimistic spin infinitely
This hang during panic can be induced when a kdump kernel is loaded, and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script which concurrently writes to /dev/kmsg, and triggers a
panic, can result in this hang:
#!/bin/bash
date
# 991 chars (based on log buffer size):
chars="$(printf 'a%.0s' {1..991})"
while :; do
echo $chars > /dev/kmsg
done &
echo c > /proc/sysrq-trigger &
date
exit
To avoid this deadlock, ensure that console_trylock_spinning() does not
allow spinning once a panic has begun. Below are the hang rates for the
above test case, based on v5.16-rc8 before and after this patch:
before: 776 hangs / 1484 trials - 52.3%
after : 0 hangs / 102 trials - 0.0%
Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
Signed-off-by: Stephen Brennan <stephen.s.brennan@...cle.com>
---
Note: I continue to run my tests past 100 trials, but at this point I'm
confident enough in the fix to send the patch.
As Petr suggested I will send another patch making the non-panic CPUs
bail out of console_unlock(), but I wanted to get this one first, since
it is implemented and ready now.
kernel/printk/printk.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 57b132b658e1..612ed895b967 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1843,6 +1843,16 @@ static int console_trylock_spinning(void)
if (console_trylock())
return 1;
+ /*
+ * It's unsafe for the panic CPU to spin, since the CPU holding the
+ * console_sem may have already been halted. However, if a panic has
+ * started, but we're not the panic CPU, we still spin. This protects
+ * the panic CPU from needing to write out messages buffered by other
+ * CPUs in console_unlock().
+ */
+ if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+ return 0;
+
printk_safe_enter_irqsave(flags);
raw_spin_lock(&console_owner_lock);
--
2.30.2
Powered by blists - more mailing lists