Message-ID: <3773726f-f060-47a7-b99d-66ab0aa0d8b5@linux.dev>
Date: Wed, 17 Dec 2025 21:36:56 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Petr Mladek <pmladek@...e.com>
Cc: Aaron Tomlin <atomlin@...mlin.com>, sean@...e.io,
 linux-kernel@...r.kernel.org, mhiramat@...nel.org,
 akpm@...ux-foundation.org, gregkh@...uxfoundation.org
Subject: Re: [PATCH v3 2/2] hung_task: Enable runtime reset of
 hung_task_detect_count



On 2025/12/17 21:09, Petr Mladek wrote:
> On Wed 2025-12-17 15:31:25, Lance Yang wrote:
>> On 2025/12/16 11:00, Aaron Tomlin wrote:
>>> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
>>>
>>> Writing any value to this file atomically resets the counter of detected
>>> hung tasks to zero. This grants system administrators the ability to clear
>>> the cumulative diagnostic history after resolving an incident, simplifying
>>> monitoring without requiring a system restart.
>>
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -418,7 +418,7 @@ hung_task_detect_count
>>>    ======================
>>>    Indicates the total number of tasks that have been detected as hung since
>>> -the system boot.
>>> +the system boot. The counter can be reset to zero when written to.
>>>    This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>> index 5902573200c0..01ce46a107b0 100644
>>> --- a/kernel/hung_task.c
>>> +++ b/kernel/hung_task.c
>>> @@ -375,6 +375,31 @@ static long hung_timeout_jiffies(unsigned long last_checked,
>>>    }
>>>    #ifdef CONFIG_SYSCTL
>>> +
>>> +/**
>>> + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
>>> + * @table: Pointer to the struct ctl_table definition for this proc entry
>>> + * @write: Flag indicating the operation
>>> + * @buffer: User space buffer for data transfer
>>> + * @lenp: Pointer to the length of the data being transferred
>>> + * @ppos: Pointer to the current file offset
>>> + *
>>> + * This handler is used for reading the current hung task detection count
>>> + * and for resetting it to zero when a write operation is performed.
>>> + * Returns 0 on success or a negative error code on failure.
>>> + */
>>> +static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
>>> +					 void *buffer, size_t *lenp, loff_t *ppos)
>>> +{
>>> +	if (!write)
>>> +		return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
>>> +
>>> +	WRITE_ONCE(sysctl_hung_task_detect_count, 0);
>>
>> The reset uses WRITE_ONCE(), but the increment in check_hung_task()
>> is a plain ++. It's just a stat counter, so that should be fine though ;)
> 
> I was about to wave this away as well. But I am afraid that
> it is even more complicated.

Good catch!

> 
> The counter is used to decide how many hung tasks were found
> by a single check, see:
> 
> static void check_hung_uninterruptible_tasks(unsigned long timeout)
> {
> [...]
> 	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> [...]
> 	for_each_process_thread(g, t) {
> [...]
> 		check_hung_task(t, timeout, prev_detect_count);
> 	}
> [...]
> 	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> 		return;
> 
> 	if (need_warning || hung_task_call_panic) {
> 		si_mask |= SYS_INFO_LOCKS;
> 
> 		if (sysctl_hung_task_all_cpu_backtrace)
> 			si_mask |= SYS_INFO_ALL_BT;
> 	}
> 
> 	sys_info(si_mask);
> 
> 	if (hung_task_call_panic)
> 		panic("hung_task: blocked tasks");
> }
> 
> Any race might cause false positives and even panic() !!!
> And the race window is rather big (checking all processes).

Right, I completely missed the dependency on prev_detect_count.
Messing with that counter while the loop is running is really
dangerous ...

> 
> This race can't be prevented by an atomic read/write.
> IMHO, the only solution would be to add some locking,
> e.g. mutex and take it around the entire
> check_hung_uninterruptible_tasks().
> 
> On one hand, it might work. The code is called from
> a task context...
> 
> But adding a lock into a lockup-detector code triggers
> some warning bells in my head.
> 
> Honestly, I am not sure if it is worth it. I understand
> the motivation for the reset. But IMHO, even more important
> is to make sure that the watchdog works as expected.

Given that adding a lock to the detector path is undesirable,
the risk clearly outweighs the benefit :)

Thanks for saving me from this pitfall, Petr!
