Message-ID: <aR24iloIoSjb6X1t@pathway.suse.cz>
Date: Wed, 19 Nov 2025 13:31:06 +0100
From: Petr Mladek <pmladek@...e.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Feng Tang <feng.tang@...ux.alibaba.com>,
Steven Rostedt <rostedt@...dmis.org>,
Lance Yang <ioworker0@...il.com>, linux-kernel@...r.kernel.org,
Jonathan Corbet <corbet@....net>, paulmck@...nel.org,
lirongqing@...du.com, leonylgao@...cent.com
Subject: Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump
sys info on task-hung
On Wed 2025-11-19 01:57:36, Lance Yang wrote:
> On 2025/11/18 23:20, Petr Mladek wrote:
> > Well, the behavior is still not ideal. It would be better when
> > we printed backtraces from _all_ "hung" tasks before panicking.
> > But it prints the backtraces only when sysctl_hung_task_panic
> > limit is reached.
> >
> > I mean, for example, let's have:
> >
> > + sysctl_hung_task_warnings = 2;
> > + sysctl_hung_task_panic = 5;
> > + and detect 6 hung tasks.
> >
> > The code will report 1st and 2nd hung tasks. It will skip 3rd and 4th
> > because sysctl_hung_task_warnings reached 0. It will report 5th and
> > 6th tasks because (total_hung_task >= 5).
> >
> > It is better than nothing. But it might be confusing.
>
> Right, I can see how it might be confusing.
>
> IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
> It makes sense that reports are suppressed after the limit is exhausted,
> except when the sysctl_hung_task_panic threshold is reached ;)
>
> > I am not sure how to fix it. A minimalist solution would be to print
> > a warning. Something like:
> >
> > if (sysctl_hung_task_panic > 1 &&
> > (total_hung_task == sysctl_hung_task_panic) &&
> > !sysctl_hung_task_warnings) {
> > pr_err("INFO: %d blocked tasks might have been skipped because reached hung_task_warnings limit\n",
> > sysctl_hung_task_panic - 1);
> >
> > Or we could print the "total_hung_task" counter somewhere, for
> > example,
> >
> > pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
> > total_hung_task, ...
> >
> > Or we could restart the for_each_process_thread() cycle and make sure
> > that all hung tasks will get reported.
> >
> > Or we could ignore it until anyone complains.
>
> It looks like we already inform the user when that happens. When
> sysctl_hung_task_warnings is finally decremented to zero, the code prints:
>
> ```
> if (!sysctl_hung_task_warnings)
>     pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> ```
>
> Given that this explicit warning is already in place, perhaps the current
> behavior is sufficient and clear enough?
The warning might get lost, or it might be printed a long time before
the critical stall, so people might miss it.

But you are right. There is a warning, and my worries are rather
theoretical. Let's keep the code simple until anyone complains.
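
For reference, the scenario discussed above (hung_task_warnings = 2,
hung_task_panic = 5, six hung tasks detected) can be modelled in user
space. This is only a rough sketch of the reporting decision as it is
described in this thread, not the kernel code: simulate_hung_reports()
and its parameters are hypothetical names, and the treatment of a
negative warnings value as "unlimited" is an assumption of the model.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the reporting decision.
 *
 * warnings    models sysctl_hung_task_warnings (negative: never exhausted,
 *             an assumption of this sketch)
 * panic_limit models sysctl_hung_task_panic (0: panic disabled)
 * n_hung      number of hung tasks detected, in detection order
 * reported    output array of n_hung entries; reported[i] is set to true
 *             when the (i+1)-th hung task would get a report
 *
 * Returns the number of reported tasks.
 */
static int simulate_hung_reports(int warnings, int panic_limit,
                                 int n_hung, bool *reported)
{
    int total_hung_task = 0;
    int count = 0;

    for (int i = 0; i < n_hung; i++) {
        bool at_panic;

        total_hung_task++;
        at_panic = panic_limit && total_hung_task >= panic_limit;

        /* Report while warnings remain, or once the panic limit is hit. */
        if (warnings != 0 || at_panic) {
            if (warnings > 0)
                warnings--;
            reported[i] = true;
            count++;
        } else {
            reported[i] = false;
        }
    }

    return count;
}
```

Calling simulate_hung_reports(2, 5, 6, reported) marks tasks 1, 2, 5
and 6 as reported and leaves 3 and 4 unreported, which is the gap
described above.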
Best Regards,
Petr