Message-ID: <aR24iloIoSjb6X1t@pathway.suse.cz>
Date: Wed, 19 Nov 2025 13:31:06 +0100
From: Petr Mladek <pmladek@...e.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Feng Tang <feng.tang@...ux.alibaba.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Lance Yang <ioworker0@...il.com>, linux-kernel@...r.kernel.org,
	Jonathan Corbet <corbet@....net>, paulmck@...nel.org,
	lirongqing@...du.com, leonylgao@...cent.com
Subject: Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump
 sys info on task-hung

On Wed 2025-11-19 01:57:36, Lance Yang wrote:
> On 2025/11/18 23:20, Petr Mladek wrote:
> > Well, the behavior is still not ideal. It would be better when
> > we printed backtraces from _all_ "hung" tasks before panicking.
> > But it prints the backtraces only when sysctl_hung_task_panic
> > limit is reached.
> > 
> > I mean, for example, let's have:
> > 
> >    + sysctl_hung_task_warnings = 2;
> >    + sysctl_hung_task_panic = 5;
> >    + and detect 6 hung tasks.
> > 
> > The code will report 1st and 2nd hung tasks. It will skip 3rd and 4th
> > because sysctl_hung_task_warnings reached 0. It will report 5th and
> > 6th tasks because (total_hung_task >= 5).
> > 
> > It is better than nothing. But it might be confusing.
> 
> Right, I can see how it might be confusing.
> 
> IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
> It makes sense that reports are suppressed after the limit is exhausted,
> except when the sysctl_hung_task_panic threshold is reached ;)
> 
> > I am not sure how to fix it. A minimalist solution would be to print
> > a warning. Something like:
> > 
> > 	if (sysctl_hung_task_panic > 1 &&
> > 	    (total_hung_task == sysctl_hung_task_panic) &&
> > 	    !sysctl_hung_task_warnings) {
> > 		pr_err("INFO: %d blocked tasks might have been skipped because the hung_task_warnings limit was reached\n",
> > 			sysctl_hung_task_panic - 1);
> > 	}
> > 
> > Or we could print the "total_hung_task" counter somewhere, for
> > example,
> > 
> > 		pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
> > 			total_hung_task, ...
> > 
> > Or we could restart the for_each_process_thread() cycle and make sure
> > that all hung tasks will get reported.
> > 
> > Or we could ignore it until anyone complains.
> 
> It looks like we already inform the user when that happens. When
> sysctl_hung_task_warnings is finally decremented to zero, the code prints:
> 
> ```
> if (!sysctl_hung_task_warnings)
> 	pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> ```
> 
> Given that this explicit warning is already in place, perhaps the current
> behavior is sufficient and clear enough?

The warning might get lost, or it might be printed a long time before
the critical stall, so people might miss it.

But you are right. There is a warning. And my worries are rather
theoretical. Let's keep the code simple until someone complains.

Best Regards,
Petr
