linux-kernel - Re: [PATCH v3 2/2] hung_task: Enable runtime reset of hung_task_detect

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aUKresftPnbndSBo@pathway.suse.cz>
Date: Wed, 17 Dec 2025 14:09:14 +0100
From: Petr Mladek <pmladek@...e.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Aaron Tomlin <atomlin@...mlin.com>, sean@...e.io,
	linux-kernel@...r.kernel.org, mhiramat@...nel.org,
	akpm@...ux-foundation.org, gregkh@...uxfoundation.org
Subject: Re: [PATCH v3 2/2] hung_task: Enable runtime reset of
 hung_task_detect_count

On Wed 2025-12-17 15:31:25, Lance Yang wrote:
> On 2025/12/16 11:00, Aaron Tomlin wrote:
> > Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> > 
> > Writing any value to this file atomically resets the counter of detected
> > hung tasks to zero. This grants system administrators the ability to clear
> > the cumulative diagnostic history after resolving an incident, simplifying
> > monitoring without requiring a system restart.
> 
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -418,7 +418,7 @@ hung_task_detect_count
> >   ======================
> >   Indicates the total number of tasks that have been detected as hung since
> > -the system boot.
> > +the system boot. The counter can be reset to zero when written to.
> >   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 5902573200c0..01ce46a107b0 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -375,6 +375,31 @@ static long hung_timeout_jiffies(unsigned long last_checked,
> >   }
> >   #ifdef CONFIG_SYSCTL
> > +
> > +/**
> > + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> > + * @table: Pointer to the struct ctl_table definition for this proc entry
> > + * @write: Flag indicating the operation
> > + * @buffer: User space buffer for data transfer
> > + * @lenp: Pointer to the length of the data being transferred
> > + * @ppos: Pointer to the current file offset
> > + *
> > + * This handler is used for reading the current hung task detection count
> > + * and for resetting it to zero when a write operation is performed.
> > + * Returns 0 on success or a negative error code on failure.
> > + */
> > +static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
> > +					 void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	if (!write)
> > +		return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
> > +
> > +	WRITE_ONCE(sysctl_hung_task_detect_count, 0);
> 
> The reset uses WRITE_ONCE() but the increment in check_hung_task()
> is plain ++, it's just a stat cunter so fine though ;)

I was about to wave this away as well. But I am afraid that
it is even more complicated.

The counter is used to decide how many hung tasks were found in
by a single check, see:

static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
[...]
	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
[...]
	for_each_process_thread(g, t) {
[...]
		check_hung_task(t, timeout, prev_detect_count);
	}
[...]
	if (!(sysctl_hung_task_detect_count - prev_detect_count))
		return;

	if (need_warning || hung_task_call_panic) {
		si_mask |= SYS_INFO_LOCKS;

		if (sysctl_hung_task_all_cpu_backtrace)
			si_mask |= SYS_INFO_ALL_BT;
	}

	sys_info(si_mask);

	if (hung_task_call_panic)
		panic("hung_task: blocked tasks");
}

Any race might cause false positives and even panic() !!!
And the race window is rather big (checking all processes).

This race can't be prevented by an atomic read/write.
IMHO, the only solution would be to add some locking,
e.g. mutex and take it around the entire
check_hung_uninterruptible_tasks().

On one hand, it might work. The code is called from
a task context...

But adding a lock into a lockup-detector code triggers
some warning bells in my head.

Honestly, I am not sure if it is worth it. I understand
the motivation for the reset. But IMHO, even more important
is to make sure that the watchdog works as expected.

Best Regards,
Petr