linux-kernel - Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <e0d12460-3ed8-43d4-8c0b-a7aa544d946e@linux.dev>
Date: Wed, 19 Nov 2025 01:57:36 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Petr Mladek <pmladek@...e.com>, Andrew Morton <akpm@...ux-foundation.org>
Cc: Feng Tang <feng.tang@...ux.alibaba.com>,
 Steven Rostedt <rostedt@...dmis.org>, Lance Yang <ioworker0@...il.com>,
 linux-kernel@...r.kernel.org, Jonathan Corbet <corbet@....net>,
 paulmck@...nel.org, lirongqing@...du.com, leonylgao@...cent.com
Subject: Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump
 sys info on task-hung



On 2025/11/18 23:20, Petr Mladek wrote:
> On Mon 2025-11-17 09:53:52, Andrew Morton wrote:
>> On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@...ux.alibaba.com> wrote:
>>
>>>>>    	if (need_warning || hung_task_call_panic) {
>>>>>    		si_mask |= SYS_INFO_LOCKS;
>>>>
>>>> Looks good to me now! I assume v3 would be expected, can you
>>>> post a new version?
>>>
>>> Andrew has taken the patchset to -mm tree.
>>>
>>> Andrew, which way do you prefer? Should I send a v3 patch for hung-task, or
>>> will you pick up the fixup patch and squash it into the original 0002 patch?
>>>
>>> Anyway, I made a squashed v3 patch below.
>>
>> I prefer little fixup patches, generally.  So people can see what
>> changed and don't feel they should re-review everything.
>>
>> I queued the below, thanks.
>>
>> From: Feng Tang <feng.tang@...ux.alibaba.com>
>> Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
>> Date: Wed, 5 Nov 2025 19:30:36 +0800
>>
>> maintain consistency with established behavior, per Lance and Petr
>>
>> Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
>> Suggested-by: Petr Mladek <pmladek@...e.com>
>> Signed-off-by: Feng Tang <feng.tang@...ux.alibaba.com>
>> Cc: Jonathan Corbet <corbet@....net>
>> Cc: Lance Yang <ioworker0@...il.com>
>> Cc: "Paul E . McKenney" <paulmck@...nel.org>
>> Cc: Steven Rostedt <rostedt@...dmis.org>
>> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> 
> Thanks a lot for catching and fixing the regression caused
> by this patchset. The patch looks good.
> 
> See a comment below.
> 
>> --- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
>> +++ a/kernel/hung_task.c
>> @@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
>>   }
>>   #endif
>>   
>> -static void check_hung_task(struct task_struct *t, unsigned long timeout)
>> +static void check_hung_task(struct task_struct *t, unsigned long timeout,
>> +		unsigned long prev_detect_count)
>>   {
>> +	unsigned long total_hung_task;
>> +
>>   	if (!task_is_hung(t, timeout))
>>   		return;
>>   
>> @@ -234,13 +237,19 @@ static void check_hung_task(struct task_
>>   	 */
>>   	sysctl_hung_task_detect_count++;
>>   
>> +	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>>   	trace_sched_process_hang(t);
>>   
>> +	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
>> +		console_verbose();
>> +		hung_task_call_panic = true;
>> +	}
>> +
>>   	/*
>>   	 * Ok, the task did not get scheduled for more than 2 minutes,
>>   	 * complain:
>>   	 */
>> -	if (sysctl_hung_task_warnings) {
>> +	if (sysctl_hung_task_warnings || hung_task_call_panic) {
>>   		if (sysctl_hung_task_warnings > 0)
>>   			sysctl_hung_task_warnings--;
>>   		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> 
> This restores the behavior after the commit 9544f9e6947f6508
> ("hung_task: panic when there are more than N hung tasks at
> the same time"). It is better than nothing.
> 
> Well, the behavior is still not ideal. It would be better if
> we printed backtraces from _all_ "hung" tasks before panicking.
> But it prints the backtraces only when sysctl_hung_task_panic
> limit is reached.
> 
> I mean, for example, let's have:
> 
>    + sysctl_hung_task_warnings = 2;
>    + sysctl_hung_task_panic = 5;
>    + and detect 6 hung tasks.
> 
> The code will report the 1st and 2nd hung tasks. It will skip the 3rd
> and 4th because sysctl_hung_task_warnings has reached 0. It will report
> the 5th and 6th tasks because (total_hung_task >= 5).
> 
> It is better than nothing. But it might be confusing.

Right, I can see how it might be confusing.

IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
It makes sense that reports are suppressed after the limit is exhausted,
except when the sysctl_hung_task_panic threshold is reached ;)

> 
> I am not sure how to fix it. A minimalist solution would be to print
> a warning. Something like:
> 
> 	if (sysctl_hung_task_panic > 1 &&
> 	    (total_hung_task == sysctl_hung_task_panic) &&
> 	    !sysctl_hung_task_warnings) {
> 		pr_err("INFO: %d blocked tasks might have been skipped because the hung_task_warnings limit was reached\n",
> 			sysctl_hung_task_panic - 1);
> 	}
> 
> Or we could print the "total_hung_task" counter somewhere, for
> example,
> 
> 		pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
> 			total_hung_task, ...
> 
> Or we could restart the for_each_process_thread() cycle and make sure
> that all hung tasks will get reported.
> 
> Or we could ignore it until anyone complains.

It looks like we already inform the user when that happens. When
sysctl_hung_task_warnings is finally decremented to zero, the code prints:


```
if (!sysctl_hung_task_warnings)
	pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
```

Given that this explicit warning is already in place, perhaps the current
behavior is sufficient and clear enough?

Thanks,
Lance