Message-ID: <aRyOxD5kytnneUQY@pathway.suse.cz>
Date: Tue, 18 Nov 2025 16:20:36 +0100
From: Petr Mladek <pmladek@...e.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Feng Tang <feng.tang@...ux.alibaba.com>,
Lance Yang <lance.yang@...ux.dev>,
Steven Rostedt <rostedt@...dmis.org>,
Lance Yang <ioworker0@...il.com>, linux-kernel@...r.kernel.org,
Jonathan Corbet <corbet@....net>, paulmck@...nel.org,
lirongqing@...du.com, leonylgao@...cent.com
Subject: Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump
sys info on task-hung
On Mon 2025-11-17 09:53:52, Andrew Morton wrote:
> On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@...ux.alibaba.com> wrote:
>
> > > > if (need_warning || hung_task_call_panic) {
> > > > si_mask |= SYS_INFO_LOCKS;
> > >
> > > Looks good to me now! I assume v3 would be expected, can you
> > > post a new version?
> >
> > Andrew has taken the patchset to -mm tree.
> >
> > Andrew, which way do you prefer? Should I send a v3 patch for hung-task, or
> > will you pick up the fixup patch and squash it into the original 0002 patch?
> >
> > Anyway, I made a squashed v3 patch below.
>
> I prefer little fixup patches, generally. So people can see what
> changed and don't feel they should re-review everything.
>
> I queued the below, thanks.
>
> From: Feng Tang <feng.tang@...ux.alibaba.com>
> Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
> Date: Wed, 5 Nov 2025 19:30:36 +0800
>
> maintain consistecy established behavior, per Lance and Petr
>
> Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
> Suggested-by: Petr Mladek <pmladek@...e.com>
> Signed-off-by: Feng Tang <feng.tang@...ux.alibaba.com>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Lance Yang <ioworker0@...il.com>
> Cc: "Paul E . McKenney" <paulmck@...nel.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
Thanks a lot for catching and fixing the regression caused
by this patchset. The patch looks good.
See a comment below.
> --- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
> +++ a/kernel/hung_task.c
> @@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
> }
> #endif
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout,
> + unsigned long prev_detect_count)
> {
> + unsigned long total_hung_task;
> +
> if (!task_is_hung(t, timeout))
> return;
>
> @@ -234,13 +237,19 @@ static void check_hung_task(struct task_
> */
> sysctl_hung_task_detect_count++;
>
> + total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> trace_sched_process_hang(t);
>
> + if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> + console_verbose();
> + hung_task_call_panic = true;
> + }
> +
> /*
> * Ok, the task did not get scheduled for more than 2 minutes,
> * complain:
> */
> - if (sysctl_hung_task_warnings) {
> + if (sysctl_hung_task_warnings || hung_task_call_panic) {
> if (sysctl_hung_task_warnings > 0)
> sysctl_hung_task_warnings--;
> pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
This restores the behavior that was in place after commit
9544f9e6947f6508 ("hung_task: panic when there are more than N hung
tasks at the same time"). It is better than nothing.
Well, the behavior is still not ideal. It would be better if
we printed backtraces from _all_ "hung" tasks before panicking.
But once sysctl_hung_task_warnings is used up, it prints the
backtraces only after the sysctl_hung_task_panic limit is reached.
For example, suppose we have:
+ sysctl_hung_task_warnings = 2;
+ sysctl_hung_task_panic = 5;
+ and detect 6 hung tasks.
The code will report the 1st and 2nd hung tasks. It will skip the 3rd
and 4th because sysctl_hung_task_warnings has reached 0. It will report
the 5th and 6th tasks because (total_hung_task >= 5).
It is better than nothing. But it might be confusing.
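
The scenario above can be modelled in userspace. This is only a sketch of
the reporting decision as it appears in the quoted diff, not kernel code:
simulate_reports() and its parameters are hypothetical names, and a negative
warnings value is assumed to mean "unlimited".

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the reporting decision in check_hung_task() after
 * the fixup. simulate_reports() is a hypothetical helper, not a kernel
 * function.
 *
 * warnings:    initial sysctl_hung_task_warnings (negative is assumed to
 *              mean unlimited)
 * panic_limit: sysctl_hung_task_panic (0 disables the panic)
 * reported[i]: set to true when the (i+1)-th hung task would be printed
 *
 * Returns the number of hung tasks that would be reported.
 */
static size_t simulate_reports(int warnings, unsigned long panic_limit,
			       size_t nr_hung, bool *reported)
{
	bool call_panic = false;
	size_t nr_reported = 0;

	for (size_t i = 0; i < nr_hung; i++) {
		/* Number of hung tasks detected so far in this scan. */
		unsigned long total_hung_task = i + 1;

		if (panic_limit && total_hung_task >= panic_limit)
			call_panic = true;

		reported[i] = false;
		if (warnings || call_panic) {
			if (warnings > 0)
				warnings--;
			reported[i] = true;
			nr_reported++;
		}
	}

	return nr_reported;
}
```

Calling simulate_reports(2, 5, 6, reported) marks tasks 1, 2, 5 and 6 as
reported and leaves 3 and 4 unreported, which is the gap described above.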
I am not sure how to fix it. A minimalist solution would be to print
a warning. Something like:
	if (sysctl_hung_task_panic > 1 &&
	    total_hung_task == sysctl_hung_task_panic &&
	    !sysctl_hung_task_warnings) {
		pr_err("INFO: %d blocked tasks might have been skipped because the hung_task_warnings limit was reached\n",
		       sysctl_hung_task_panic - 1);
	}
Or we could print the "total_hung_task" counter somewhere, for
example:
	pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
	       total_hung_task, ...
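
To show how the counter prefix would look in the log, here is a small
userspace sketch. format_hung_report() is a hypothetical helper that only
mirrors the format string above with snprintf(); the argument names (comm,
pid, secs) are assumptions for illustration, since the original argument
list is elided.

```c
#include <stdio.h>

/*
 * Hypothetical userspace helper, not a kernel function. It formats one
 * hung-task report line with the per-scan counter in the prefix.
 * Returns the value returned by snprintf().
 */
static int format_hung_report(char *buf, size_t len,
			      unsigned long total_hung_task,
			      const char *comm, int pid, long secs)
{
	return snprintf(buf, len,
			"INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
			total_hung_task, comm, pid, secs);
}
```

With such a prefix, a log that jumps from INFO[2] straight to INFO[5] would
at least hint that reports 3 and 4 were suppressed.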
Or we could restart the for_each_process_thread() cycle and make sure
that all hung tasks will get reported.
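
One possible reading of the rescan idea, again as a userspace sketch rather
than kernel code: count the hung tasks in a first pass, then report in a
second pass, ignoring the warnings budget when a panic is going to happen.
report_with_rescan() and the hung[] array are hypothetical stand-ins for the
for_each_process_thread() walk and task_is_hung().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical two-pass model. hung[i] says whether task i is hung and
 * reported[i] is set when that task would be printed.
 * Returns the number of reported tasks.
 */
static size_t report_with_rescan(const bool *hung, size_t nr_tasks,
				 int warnings, unsigned long panic_limit,
				 bool *reported)
{
	size_t total_hung = 0, nr_reported = 0;

	/* Pass 1: only count the hung tasks. */
	for (size_t i = 0; i < nr_tasks; i++)
		if (hung[i])
			total_hung++;

	/* The panic decision is known before anything is printed. */
	bool will_panic = panic_limit && total_hung >= panic_limit;

	/* Pass 2: report, bypassing the warnings budget when panicking. */
	for (size_t i = 0; i < nr_tasks; i++) {
		reported[i] = false;
		if (!hung[i])
			continue;
		if (warnings || will_panic) {
			if (warnings > 0)
				warnings--;
			reported[i] = true;
			nr_reported++;
		}
	}

	return nr_reported;
}
```

In this model, 6 hung tasks with warnings = 2 and a panic limit of 5 are all
reported. The cost is a second walk over the task list, and the set of hung
tasks could change between the two passes in a real system.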
Or we could ignore it until anyone complains.
Best Regards,
Petr