Message-ID: <20250825022947.1596226-10-wangjinchao600@gmail.com>
Date: Mon, 25 Aug 2025 10:29:37 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Baoquan He <bhe@...hat.com>,
Yury Norov <yury.norov@...il.com>,
Qianqiang Liu <qianqiang.liu@....com>,
Simona Vetter <simona@...ll.ch>,
Helge Deller <deller@....de>,
Petr Mladek <pmladek@...e.com>,
Steven Rostedt <rostedt@...dmis.org>,
John Ogness <john.ogness@...utronix.de>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Vivek Goyal <vgoyal@...hat.com>,
Dave Young <dyoung@...hat.com>,
Kees Cook <kees@...nel.org>,
Tony Luck <tony.luck@...el.com>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
Thomas Zimmermann <tzimmermann@...e.de>,
Ville Syrjälä <ville.syrjala@...ux.intel.com>,
Shixiong Ou <oushixiong@...inos.cn>,
Jinchao Wang <wangjinchao600@...il.com>,
Zsolt Kajtar <soci@....rulez.org>,
Ingo Molnar <mingo@...nel.org>,
Nam Cao <namcao@...utronix.de>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Joel Fernandes <joelagnelf@...dia.com>,
Joel Granados <joel.granados@...nel.org>,
Jason Gunthorpe <jgg@...pe.ca>,
Sohil Mehta <sohil.mehta@...el.com>,
Feng Tang <feng.tang@...ux.alibaba.com>,
Sravan Kumar Gundu <sravankumarlpu@...il.com>,
Douglas Anderson <dianders@...omium.org>,
Thomas Gleixner <tglx@...utronix.de>,
Anna Schumaker <anna.schumaker@...cle.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Max Kellermann <max.kellermann@...os.com>,
Yunhui Cui <cuiyunhui@...edance.com>,
Tejun Heo <tj@...nel.org>,
Luo Gengkun <luogengkun@...weicloud.com>,
Li Huafei <lihuafei1@...wei.com>,
Thorsten Blum <thorsten.blum@...ux.dev>,
Yicong Yang <yangyicong@...ilicon.com>,
linux-fbdev@...r.kernel.org,
dri-devel@...ts.freedesktop.org,
kexec@...ts.infradead.org,
linux-hardening@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH v2 9/9] watchdog: skip checks when panic is in progress
This issue was found when EFI pstore was configured for kdump
logging with the NMI hard lockup detector enabled. The efi-pstore
write operation is slow, so with a large number of logs the
pstore dump callback invoked from kmsg_dump() took a long time.
The delay triggered the NMI watchdog, which led to a nested panic.
The call flow below shows how the secondary panic triggered
emergency_restart() before the initial pstore operation could
finish, so the logs were never dumped:
  real panic() {
      kmsg_dump() {
          ...
          pstore_dump() {
              start_dump();
              ... // long time operation triggers NMI watchdog
              nmi panic() {
                  ...
                  emergency_restart(); // pstore unfinished
              }
              ...
              finish_dump(); // never reached
          }
      }
  }
Both watchdog_buddy_check_hardlockup() and watchdog_overflow_callback() may
trigger during a panic. This can lead to recursive panic handling.
Add panic_in_progress() checks so watchdog activity is skipped once a panic
has begun.
This prevents recursive panics and makes the panic path more reliable.
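For illustration only, the guard can be modelled in userspace as below.
This is a minimal sketch, not the kernel implementation: it assumes
panic_in_progress() amounts to testing whether some CPU has claimed the
panic, and model_panic_begin(), model_watchdog_check() and lockup_reports
are hypothetical names that do not exist in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Userspace stand-in for the kernel's panic CPU tracking (assumption). */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/* Modelled predicate: true once any CPU has claimed the panic. */
static bool panic_in_progress(void)
{
	return atomic_load(&panic_cpu) != PANIC_CPU_INVALID;
}

/* Hypothetical helper: the first caller claims the panic and gets true. */
static bool model_panic_begin(int cpu)
{
	int expected = PANIC_CPU_INVALID;

	return atomic_compare_exchange_strong(&panic_cpu, &expected, cpu);
}

/* Counts how many lockups the modelled watchdog reported. */
static int lockup_reports;

/*
 * Hypothetical stand-in for a watchdog check: a stuck CPU is reported
 * only while no panic is in progress, mirroring the early return that
 * the patch adds.
 */
static void model_watchdog_check(bool cpu_looks_stuck)
{
	if (panic_in_progress())
		return;

	if (cpu_looks_stuck)
		lockup_reports++;
}
```

In this model, a check made before model_panic_begin() still reports a
stuck CPU, while any check made afterwards returns early, so a slow
operation on the panic path can no longer start a second panic.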
Signed-off-by: Jinchao Wang <wangjinchao600@...il.com>
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@...il.com>
---
kernel/watchdog.c | 6 ++++++
kernel/watchdog_perf.c | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 80b56c002c7f..597c0d947c93 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -740,6 +740,12 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
if (!watchdog_enabled)
return HRTIMER_NORESTART;
+ /*
+ * Skip the buddy check if a panic is in progress
+ */
+ if (panic_in_progress())
+ return HRTIMER_NORESTART;
+
watchdog_hardlockup_kick();
/* kick the softlockup detector */
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 9c58f5b4381d..d3ca70e3c256 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -12,6 +12,7 @@
#define pr_fmt(fmt) "NMI watchdog: " fmt
+#include <linux/panic.h>
#include <linux/nmi.h>
#include <linux/atomic.h>
#include <linux/module.h>
@@ -108,6 +109,9 @@ static void watchdog_overflow_callback(struct perf_event *event,
/* Ensure the watchdog never gets throttled */
event->hw.interrupts = 0;
+ if (panic_in_progress())
+ return;
+
if (!watchdog_check_timestamp())
return;
--
2.43.0