Message-ID: <20250923033740.2696-1-lirongqing@baidu.com>
Date: Tue, 23 Sep 2025 11:37:40 +0800
From: lirongqing <lirongqing@...du.com>
To: <corbet@....net>, <akpm@...ux-foundation.org>, <lance.yang@...ux.dev>,
<mhiramat@...nel.org>, <paulmck@...nel.org>,
<pawan.kumar.gupta@...ux.intel.com>, <mingo@...nel.org>,
<dave.hansen@...ux.intel.com>, <rostedt@...dmis.org>, <kees@...nel.org>,
<arnd@...db.de>, <lirongqing@...du.com>, <feng.tang@...ux.alibaba.com>,
<pauld@...hat.com>, <joel.granados@...nel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: [PATCH][RFC] hung_task: Support panicking when the maximum number of hung task warnings is reached
From: Li RongQing <lirongqing@...du.com>

Currently, the hung task detector either panics as soon as a hung task is
detected or continues operation. Some scenarios call for a middle ground:

- Panicking as soon as a few hung tasks are detected is undesirable,
  because the system may still be able to recover.
- Letting the system stall indefinitely with multiple hung tasks is
  undesirable too.

Introduce a new value, 2, for the hung task panic behavior. When
hung_task_panic is set to 2, the kernel panics only after the maximum
number of hung task warnings (hung_task_warnings) has been reached, that
is, once the warning counter has been decremented to 0.

This allows a vmcore to be generated automatically after a reasonable
number of hung task incidents, instead of panicking on the first one or
stalling potentially forever.

Signed-off-by: Li RongQing <lirongqing@...du.com>
---
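For reviewers, here is a rough userspace model of the decision described
above. It is only a sketch, not kernel code: should_panic() and
detections_until_panic() are hypothetical helper names, and the model
assumes that the panic check runs before the warning counter is
decremented on each detection, and that a negative hung_task_warnings
value is never decremented.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the patched condition in check_hung_task():
 *   mode 0: never panic
 *   mode 1: panic on any detected hung task
 *   mode 2: panic only once the warning counter has reached 0
 */
static bool should_panic(int panic_mode, int warnings_left)
{
	return panic_mode == 1 || (panic_mode == 2 && warnings_left == 0);
}

/*
 * Simulate up to max_events hung task detections.
 * Returns the 1-based index of the detection that would trigger a panic,
 * or 0 if no panic happens within max_events detections.
 * Assumption: the counter is only decremented while it is positive, so a
 * negative value (unlimited warnings) never reaches 0.
 */
static int detections_until_panic(int panic_mode, int warnings_left,
				  int max_events)
{
	for (int event = 1; event <= max_events; event++) {
		if (should_panic(panic_mode, warnings_left))
			return event;
		if (warnings_left > 0)
			warnings_left--;
	}
	return 0;
}
```

Under these assumptions, mode 2 with hung_task_warnings=3 reports three
warnings and panics on the fourth detection, while a negative
hung_task_warnings never triggers the mode 2 panic.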
Documentation/admin-guide/kernel-parameters.txt | 15 ++++++++-------
Documentation/admin-guide/sysctl/kernel.rst | 1 +
kernel/hung_task.c | 5 +++--
lib/Kconfig.debug | 4 ++--
4 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5a7a83c..f2a9876 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1993,13 +1993,14 @@
hung_task_panic=
[KNL] Should the hung task detector generate panics.
- Format: 0 | 1
-
- A value of 1 instructs the kernel to panic when a
- hung task is detected. The default value is controlled
- by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
- option. The value selected by this boot parameter can
- be changed later by the kernel.hung_task_panic sysctl.
+ Format: 0 | 1 | 2
+
+ A value of 1 instructs the kernel to panic when a hung task is detected.
+ A value of 2 instructs the kernel to panic when hung_task_warnings is
+ decreased to 0. The default value is controlled by the
+ CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value selected
+ by this boot parameter can be changed later by the kernel.hung_task_panic
+ sysctl.
hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC)
terminal devices. Valid values: 0..8
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 8b49eab..6f77241 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -403,6 +403,7 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
= =================================================
0 Continue operation. This is the default behavior.
1 Panic immediately.
+2 Panic when hung_task_warnings is decreased to 0.
= =================================================
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8708a12..b052ec7 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -219,7 +219,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
trace_sched_process_hang(t);
- if (sysctl_hung_task_panic) {
+ if ((sysctl_hung_task_panic == 1) ||
+ (!sysctl_hung_task_warnings && sysctl_hung_task_panic == 2)) {
console_verbose();
hung_task_show_lock = true;
hung_task_call_panic = true;
@@ -385,7 +386,7 @@ static const struct ctl_table hung_task_sysctls[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_TWO,
},
{
.procname = "hung_task_check_count",
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dc0e0c6..e7cf166 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1264,10 +1264,10 @@ config DEFAULT_HUNG_TASK_TIMEOUT
Keeping the default should be fine in most cases.
config BOOTPARAM_HUNG_TASK_PANIC
- bool "Panic (Reboot) On Hung Tasks"
+ int "Panic (Reboot) On Hung Tasks"
depends on DETECT_HUNG_TASK
help
- Say Y here to enable the kernel to panic on "hung tasks",
+ Say 1|2 here to enable the kernel to panic on "hung tasks",
which are bugs that cause the kernel to leave a task stuck
in uninterruptible "D" state.
--
2.9.4