Message-ID: <6e1b7a63-967b-4004-ba14-ad46f1b3b271@linux.dev>
Date: Mon, 12 Jan 2026 22:43:35 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Petr Mladek <pmladek@...e.com>, Aaron Tomlin <atomlin@...mlin.com>
Cc: akpm@...ux-foundation.org, mhiramat@...nel.org,
gregkh@...uxfoundation.org, joel.granados@...nel.org, sean@...e.io,
linux-kernel@...r.kernel.org
Subject: Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of
hung_task_detect_count
On 2026/1/12 21:13, Petr Mladek wrote:
> On Fri 2026-01-09 21:50:20, Lance Yang wrote:
>>
>>
>> On 2026/1/8 22:41, Petr Mladek wrote:
>>> On Tue 2025-12-30 19:41:25, Aaron Tomlin wrote:
>>>> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
>>>>
>>>> Writing a value of zero to this file atomically resets the counter of
>>>> detected hung tasks. This grants system administrators the ability to
>>>> clear the cumulative diagnostic history after resolving an incident,
>>>> simplifying monitoring without requiring a system restart.
>>>
>>>> --- a/kernel/hung_task.c
>>>> +++ b/kernel/hung_task.c
>>>> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>>> unlock:
>>>> rcu_read_unlock();
>>>> - if (!(sysctl_hung_task_detect_count - prev_detect_count))
>>>> + /* Ensures we see all hang details recorded during the scan. */
>>>> + cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>>>
>>> This value is read at the end of the scan => _release
>>> semantic/barrier should be here.
>>
>> Seems like _acquire is still correct here, because it is a load.
>>
>> _release semantics apply to stores, while _acquire on a load
>> ensures subsequent memory accesses are not reordered before it.
>
> Right!
>
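To spell out the load/store distinction, here is a quick userspace C11
analogue (purely illustrative; the names are mine and this is not the
kernel code):

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic long flag;
static long payload;	/* plain data published via the flag */

static void writer(void)
{
	payload = 42;
	/* release: the payload store cannot be reordered after this store */
	atomic_store_explicit(&flag, 1, memory_order_release);
}

static bool reader(long *out)
{
	/* acquire: later loads cannot be reordered before this load */
	if (atomic_load_explicit(&flag, memory_order_acquire) == 1) {
		*out = payload;	/* guaranteed to observe 42 */
		return true;
	}
	return false;
}
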
>> Or smp_mb()?
>>
>> In the same thread, atomic operations on the same variable are not
>> reordered with respect to each other; even the _relaxed variant
>> preserves program order for that variable, IIRC.
>>
>> So the increment will always complete before the final read in
>> program order, and the read will see the updated value (unless
>> another CPU resets it concurrently, which is a logical race, not
>> a reordering issue).
>>
>> So, it would be:
>>
>> prev = atomic_long_read_acquire(&counter); // scan start
>> ...
>> cur = atomic_long_inc_return_relaxed(&counter); // during scan
>> ...
>> cur = atomic_long_read_acquire(&counter); // scan end
>>
>> The first _acquire ensures no task-checking code is reordered
>> before the start read, the middle increment is just atomic
>> without extra barriers, and the final _acquire makes sure we
>> observe all hang details before computing the delta.
>
> The acquire/relaxed/acquire semantic looks weird. The problem is
> that we do not use the counter as a lock.
>
> I thought about a sane approach and the following came to my
> mind:
>
> From c28b74c35d653f527aa9017c32630ad08180fb4e Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@...e.com>
> Date: Mon, 12 Jan 2026 14:00:52 +0100
> Subject: [POC] hung_task: Update the global counter using a proper
> acquire/release semantic
>
> The global counter of hung tasks might get reset when the check
> is in progress. Also the number of hung tasks detected in the current
> round is important to decide whether panic() is needed or not.
>
> Handle races by:
>
> 1. Remember the total counter at the beginning of the check.
> 2. Count the current round in a local variable.
> 3. Update the total counter only when the value has not been modified
> during the check.
Cool!
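
If I follow the three steps correctly, the flow boils down to roughly
(a sketch only, using the helpers from the patch below):

total = atomic_long_read_acquire(&counter);	// 1. snapshot at scan start
this_round = ...;				// 2. count hung tasks in a local variable
atomic_long_cmpxchg_release(&counter, total,
			    total + this_round);	// 3. publish only if unchanged
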
>
> Note that this is only compile tested.
>
> Signed-off-by: Petr Mladek <pmladek@...e.com>
> ---
> kernel/hung_task.c | 53 +++++++++++++++++-----------------------------
> 1 file changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 3bc72a4e4032..c939cd3d8a2c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -246,30 +246,12 @@ static inline void hung_task_diagnostics(struct task_struct *t)
> pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
> }
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> - unsigned long prev_detect_count)
> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
> + unsigned long this_round_count)
> {
> - unsigned long total_hung_task, cur_detect_count;
> -
> - if (!task_is_hung(t, timeout))
> - return;
> -
> - /*
> - * This counter tracks the total number of tasks detected as hung
> - * since boot. If a reset occurred during the scan, we treat the
> - * current count as the new delta to avoid an underflow error.
> - * Ensure hang details are globally visible before the counter
> - * update.
> - */
> - cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
> - if (cur_detect_count >= prev_detect_count)
> - total_hung_task = cur_detect_count - prev_detect_count;
> - else
> - total_hung_task = cur_detect_count;
> -
> trace_sched_process_hang(t);
>
> - if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> + if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
> console_verbose();
> hung_task_call_panic = true;
> }
> @@ -325,12 +307,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> int max_count = sysctl_hung_task_check_count;
> unsigned long last_break = jiffies;
> struct task_struct *g, *t;
> - unsigned long cur_detect_count, prev_detect_count, delta;
> + unsigned long total_count, this_round_count;
> int need_warning = sysctl_hung_task_warnings;
> unsigned long si_mask = hung_task_si_mask;
>
> - /* Acquire prevents reordering task checks before this point. */
> - prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> + /* The counter might get reset. Remember the initial value. */
> + total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> +
> /*
> * If the system crashed already then all bets are off,
> * do not report extra hung tasks:
> @@ -339,6 +322,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> return;
>
>
> + this_round_count = 0UL;
> rcu_read_lock();
> for_each_process_thread(g, t) {
>
> @@ -350,21 +334,24 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> last_break = jiffies;
> }
>
> - check_hung_task(t, timeout, prev_detect_count);
> + if (task_is_hung(t, timeout)) {
> + this_round_count++;
> + hung_task_info(t, timeout, this_round_count);
> + }
> }
> unlock:
> rcu_read_unlock();
>
> - /* Ensures we see all hang details recorded during the scan. */
> - cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> - if (cur_detect_count < prev_detect_count)
> - delta = cur_detect_count;
> - else
> - delta = cur_detect_count - prev_detect_count;
> -
> - if (!delta)
> + if (!this_round_count)
> return;
>
> + /*
> + * Do not count this round when the global counter has been reset
> + * during this check.
> + */
> + atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count, total_count,
> + total_count + this_round_count);
> +
> if (need_warning || hung_task_call_panic) {
> si_mask |= SYS_INFO_LOCKS;
>
In general, the POC makes a lot of sense ;)
The acquire/release pairing is now straightforward:
- _read_acquire at start
- local counting (no atomics)
- _cmpxchg_release at end
And yeah, if cmpxchg fails, we simply do not count this round,
which is reasonable behavior.
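
Just to convince myself, here is a minimal userspace C11 analogue of the
pattern (my own toy code, not the kernel implementation), showing that a
concurrent reset makes the cmpxchg fail and the round is simply dropped:

#include <stdatomic.h>
#include <stdio.h>

static _Atomic long detect_count;

/* One scan: snapshot, count locally, publish only if nobody reset us. */
static void scan(long hung_this_round, int racing_reset)
{
	long total = atomic_load_explicit(&detect_count, memory_order_acquire);

	if (racing_reset)	/* simulate a write of 0 to the sysctl mid-scan */
		atomic_store_explicit(&detect_count, 0, memory_order_release);

	/* Fails if the counter changed under us, so this round is not counted. */
	atomic_compare_exchange_strong_explicit(&detect_count, &total,
						total + hung_this_round,
						memory_order_release,
						memory_order_relaxed);
}

int main(void)
{
	scan(3, 0);	/* no race: counter becomes 3 */
	scan(2, 1);	/* reset during scan: cmpxchg fails, round dropped */
	printf("detect_count = %ld\n",
	       atomic_load_explicit(&detect_count, memory_order_acquire));
	return 0;	/* prints 0: the reset won */
}
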
So, Aaron, can you take care of refactoring/testing based on
Petr's approach?
Cheers,
Lance