[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200603141915.38739-1-cj.chengjian@huawei.com>
Date: Wed, 3 Jun 2020 14:19:15 +0000
From: Cheng Jian <cj.chengjian@...wei.com>
To: <linux-kernel@...r.kernel.org>
CC: <cj.chengjian@...wei.com>, <chenwandun@...wei.com>,
<xiexiuqi@...wei.com>, <bobo.shaobowang@...wei.com>,
<huawei.libin@...wei.com>, <pmladek@...e.com>,
<sergey.senozhatsky@...il.com>, <rostedt@...dmis.org>
Subject: [RFC PATCH] panic: fix deadlock in panic()
A deadlock caused by logbuf_lock occurs when panic:
a) Panic CPU is running in non-NMI context
b) Panic CPU sends out shutdown IPI via NMI vector
c) One of the CPUs that we bring down via NMI vector holded logbuf_lock
d) Panic CPU try to hold logbuf_lock, then deadlock occurs.
we try to re-init the logbuf_lock in printk_safe_flush_on_panic()
to avoid deadlock, but it does not work here, because :
Firstly, it is inappropriate to check num_online_cpus() here.
When the CPU bring down via NMI vector, the panic CPU willn't
wait too long for other cores to stop, so when this problem
occurs, num_online_cpus() may be greater than 1.
Secondly, printk_safe_flush_on_panic() is called after panic
notifier callback, so if printk() is called in panic notifier
callback, deadlock will still occurs. Eg, if ftrace_dump_on_oops
is set, we print some debug information, it will try to hold the
logbuf_lock.
To avoid this deadlock, drop the num_online_cpus() check and call
the printk_safe_flush_on_panic() before panic_notifier_list callback,
attempt to re-init logbuf_lock from panic CPU.
Signed-off-by: Cheng Jian <cj.chengjian@...wei.com>
---
kernel/panic.c | 3 +++
kernel/printk/printk_safe.c | 3 ---
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/panic.c b/kernel/panic.c
index b69ee9e76cb2..8dbcb2227b60 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -255,6 +255,9 @@ void panic(const char *fmt, ...)
crash_smp_send_stop();
}
+ /* Call flush even twice. It tries harder with a single online CPU */
+ printk_safe_flush_on_panic();
+
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..9ebc1723e1a4 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -269,9 +269,6 @@ void printk_safe_flush_on_panic(void)
* Do not risk a double release when more CPUs are up.
*/
if (raw_spin_is_locked(&logbuf_lock)) {
- if (num_online_cpus() > 1)
- return;
-
debug_locks_off();
raw_spin_lock_init(&logbuf_lock);
}
--
2.17.1
Powered by blists - more mailing lists