Message-ID: <d0577123-59c8-438e-b646-27e70795c17d@linux.dev>
Date: Sun, 16 Nov 2025 15:58:32 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Feng Tang <feng.tang@...ux.alibaba.com>
Cc: Petr Mladek <pmladek@...e.com>, Andrew Morton
<akpm@...ux-foundation.org>, Steven Rostedt <rostedt@...dmis.org>,
Lance Yang <ioworker0@...il.com>, linux-kernel@...r.kernel.org,
Jonathan Corbet <corbet@....net>, paulmck@...nel.org, lirongqing@...du.com,
leonylgao@...cent.com
Subject: Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump
sys info on task-hung
On 2025/11/13 19:10, Feng Tang wrote:
> When task-hung happens, developers may need different kinds of system
> information (call-stacks, memory info, locks, etc.) to help debugging.
>
> Add 'hung_task_sys_info' sysctl knob to take human readable string like
> "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> requested information will be dumped. (refer kernel/sys_info.c for more
> details).
>
> Meanwhile, the newly introduced sys_info() call is used to unify some
> existing info-dumping knobs.
>
> Suggested-by: Petr Mladek <pmladek@...e.com>
> Signed-off-by: Feng Tang <feng.tang@...ux.alibaba.com>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 5 ++
> kernel/hung_task.c | 62 +++++++++++++--------
> 2 files changed, 43 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index a397eeccaea7..45b4408dad31 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
[...]
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 5ac0e66a1361..5b3a7785d3a2 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -24,6 +24,7 @@
> #include <linux/sched/sysctl.h>
> #include <linux/hung_task.h>
> #include <linux/rwsem.h>
> +#include <linux/sys_info.h>
>
> #include <trace/events/sched.h>
>
> @@ -59,12 +60,17 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
> static int __read_mostly sysctl_hung_task_warnings = 10;
>
> static int __read_mostly did_panic;
> -static bool hung_task_show_lock;
> static bool hung_task_call_panic;
> -static bool hung_task_show_all_bt;
>
> static struct task_struct *watchdog_task;
>
> +/*
> + * A bitmask to control what kinds of system info to be printed when
> + * a hung task is detected, it could be task, memory, lock etc. Refer
> + * include/linux/sys_info.h for detailed bit definition.
> + */
> +static unsigned long hung_task_si_mask;
> +
> #ifdef CONFIG_SMP
> /*
> * Should we dump all CPUs backtraces in a hung task event?
> @@ -217,11 +223,8 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
> }
> #endif
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> - unsigned long prev_detect_count)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout)
> {
> - unsigned long total_hung_task;
> -
> if (!task_is_hung(t, timeout))
> return;
>
> @@ -231,20 +234,13 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
> */
> sysctl_hung_task_detect_count++;
>
> - total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> trace_sched_process_hang(t);
>
> - if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> - console_verbose();
> - hung_task_show_lock = true;
> - hung_task_call_panic = true;
> - }
> -
> /*
> * Ok, the task did not get scheduled for more than 2 minutes,
> * complain:
> */
> - if (sysctl_hung_task_warnings || hung_task_call_panic) {
> + if (sysctl_hung_task_warnings) {

It seems the behavior changes when sysctl_hung_task_warnings is 0 but a
panic is about to be triggered ...

Looking at the history:

1) Commit ("hung_task: ignore hung_task_warnings when hung_task_panic
   is enabled")[1] ensured that hung task information is always dumped
   when a panic is configured, even if the warning counter is exhausted.

2) Later, commit ("hung_task: panic when there are more than N hung
   tasks at the same time")[2] refined the logic to trigger a panic based
   on the number of hung tasks found in a single scan.

To stay consistent with the established behavior, I think we should
continue to dump the information for hung tasks as long as
sysctl_hung_task_panic is enabled :)

[1] https://lore.kernel.org/all/20240613033159.3446265-1-leonylgao@gmail.com
[2] https://lore.kernel.org/all/20251015063615.2632-1-lirongqing@baidu.com
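
To make the difference concrete, here is a small userspace model of the
reporting condition. It is only a sketch, not kernel code: the struct and
the three helper names are hypothetical, and the "suggested" variant is
just one possible reading of the request above (report whenever
sysctl_hung_task_panic is non-zero), not a tested patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two sysctl values; not kernel code. */
struct hung_task_cfg {
	int warnings;		/* models sysctl_hung_task_warnings, 0 = exhausted */
	unsigned int panic;	/* models sysctl_hung_task_panic, 0 = disabled */
};

/*
 * Condition before this patch: report if warnings remain, or if the
 * number of hung tasks found in this scan reached the panic threshold.
 */
static inline bool report_before_patch(const struct hung_task_cfg *cfg,
				       unsigned long hung_in_scan)
{
	bool call_panic = cfg->panic && hung_in_scan >= cfg->panic;

	return cfg->warnings || call_panic;
}

/* Condition as posted in v2: only the warning counter gates the report. */
static inline bool report_v2(const struct hung_task_cfg *cfg)
{
	return cfg->warnings != 0;
}

/*
 * Assumed reading of the suggestion: keep reporting as long as a panic
 * is configured, even when the warning counter is exhausted.
 */
static inline bool report_suggested(const struct hung_task_cfg *cfg)
{
	return cfg->warnings || cfg->panic;
}
```

With warnings == 0 and panic == 1, a scan that finds one hung task is
reported by the pre-patch condition and by the suggested one, but not by
the v2 condition.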
[...]
Cheers,
Lance