linux-kernel - Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung

Message-ID: <aR24iloIoSjb6X1t@pathway.suse.cz>
Date: Wed, 19 Nov 2025 13:31:06 +0100
From: Petr Mladek <pmladek@...e.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Feng Tang <feng.tang@...ux.alibaba.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Lance Yang <ioworker0@...il.com>, linux-kernel@...r.kernel.org,
	Jonathan Corbet <corbet@....net>, paulmck@...nel.org,
	lirongqing@...du.com, leonylgao@...cent.com
Subject: Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump
 sys info on task-hung

On Wed 2025-11-19 01:57:36, Lance Yang wrote:
> On 2025/11/18 23:20, Petr Mladek wrote:
> > Well, the behavior is still not ideal. It would be better when
> > we printed backtraces from _all_ "hung" tasks before panicking.
> > But it prints the backtraces only when sysctl_hung_task_panic
> > limit is reached.
> > 
> > I mean, for example, let's have:
> > 
> >    + sysctl_hung_task_warnings = 2;
> >    + sysctl_hung_task_panic = 5;
> >    + and detect 6 hung tasks.
> > 
> > The code will report 1st and 2nd hung tasks. It will skip 3rd and 4th
> > because sysctl_hung_task_warnings reached 0. It will report 5th and
> > 6th tasks because (total_hung_task >= 5).
> > 
> > It is better than nothing. But it might be confusing.
> 
> Right, I can see how it might be confusing.
> 
> IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
> It makes sense that reports are suppressed after the limit is exhausted,
> except when the sysctl_hung_task_panic threshold is reached ;)
> 
> > I am not sure how to fix it. A minimalist solution would be to print
> > a warning. Something like:
> > 
> > 	if (sysctl_hung_task_panic > 1 &&
> > 	    (total_hung_task == sysctl_hung_task_panic) &&
> > 	    !sysctl_hung_task_warnings) {
> > 		pr_err("INFO: %d blocked tasks might have been skipped because reached hung_task_warnings limit\n",
> > 			sysctl_hung_task_panic - 1);
> > 	}
> > 
> > Or we could print the "total_hung_task" counter somewhere, for
> > example,
> > 
> > 		pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
> > 			total_hung_task, ...
> > 
> > Or we could restart the for_each_process_thread() cycle and make sure
> > that all hung tasks will get reported.
> > 
> > Or we could ignore it until anyone complains.
> 
> It looks like we already inform the user when that happens. When
> sysctl_hung_task_warnings is finally decremented to zero, the code prints:
> 
> ```
> if (!sysctl_hung_task_warnings)
> 	pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> ```
> 
> Given that this explicit warning is already in place, perhaps the current
> behavior is sufficient and clear enough?

The warning might get lost, or it might be printed a long time before
the critical stall, so people might miss it.

But you are right. There is a warning. And my worries are rather
theoretical. Let's keep the code simple until anyone complains.

Best Regards,
Petr
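
For reference, the example discussed above (hung_task_warnings = 2,
hung_task_panic = 5, six hung tasks detected) can be modelled in
userspace. This is only a minimal sketch, not kernel code:
model_hung_task_reports() and its parameters are hypothetical names,
and the reporting condition is assumed from the behavior described in
this thread rather than copied from kernel/hung_task.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the reporting decision.
 * Returns a bitmask: bit (n - 1) is set when the n-th detected
 * hung task would be reported.
 *
 * warnings:        models sysctl_hung_task_warnings (negative = unlimited)
 * panic_threshold: models sysctl_hung_task_panic (0 = never panic)
 * nr_hung:         number of hung tasks detected in one scan (max 32)
 */
uint32_t model_hung_task_reports(int warnings, int panic_threshold, int nr_hung)
{
	uint32_t reported = 0;
	bool call_panic = false;

	assert(nr_hung >= 0 && nr_hung <= 32);

	for (int total_hung_task = 1; total_hung_task <= nr_hung; total_hung_task++) {
		/* Assumed: once the panic threshold is reached, always report. */
		if (panic_threshold && total_hung_task >= panic_threshold)
			call_panic = true;

		if (warnings || call_panic) {
			/* Only a positive budget is consumed; negative means unlimited. */
			if (warnings > 0)
				warnings--;
			reported |= UINT32_C(1) << (total_hung_task - 1);
		}
	}

	return reported;
}
```

Under these assumptions, model_hung_task_reports(2, 5, 6) returns 0x33:
tasks 1 and 2 are reported, 3 and 4 are skipped because the warning
budget is exhausted, and 5 and 6 are reported because the panic
threshold has been reached.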