[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aUKresftPnbndSBo@pathway.suse.cz>
Date: Wed, 17 Dec 2025 14:09:14 +0100
From: Petr Mladek <pmladek@...e.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Aaron Tomlin <atomlin@...mlin.com>, sean@...e.io,
linux-kernel@...r.kernel.org, mhiramat@...nel.org,
akpm@...ux-foundation.org, gregkh@...uxfoundation.org
Subject: Re: [PATCH v3 2/2] hung_task: Enable runtime reset of
hung_task_detect_count
On Wed 2025-12-17 15:31:25, Lance Yang wrote:
> On 2025/12/16 11:00, Aaron Tomlin wrote:
> > Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> >
> > Writing any value to this file atomically resets the counter of detected
> > hung tasks to zero. This grants system administrators the ability to clear
> > the cumulative diagnostic history after resolving an incident, simplifying
> > monitoring without requiring a system restart.
>
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -418,7 +418,7 @@ hung_task_detect_count
> > ======================
> > Indicates the total number of tasks that have been detected as hung since
> > -the system boot.
> > +the system boot. The counter can be reset to zero when written to.
> > This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 5902573200c0..01ce46a107b0 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -375,6 +375,31 @@ static long hung_timeout_jiffies(unsigned long last_checked,
> > }
> > #ifdef CONFIG_SYSCTL
> > +
> > +/**
> > + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> > + * @table: Pointer to the struct ctl_table definition for this proc entry
> > + * @write: Flag indicating the operation
> > + * @buffer: User space buffer for data transfer
> > + * @lenp: Pointer to the length of the data being transferred
> > + * @ppos: Pointer to the current file offset
> > + *
> > + * This handler is used for reading the current hung task detection count
> > + * and for resetting it to zero when a write operation is performed.
> > + * Returns 0 on success or a negative error code on failure.
> > + */
> > +static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
> > + void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > + if (!write)
> > + return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
> > +
> > + WRITE_ONCE(sysctl_hung_task_detect_count, 0);
>
> The reset uses WRITE_ONCE() but the increment in check_hung_task()
> is plain ++, it's just a stat cunter so fine though ;)
I was about to wave this away as well. But I am afraid that
it is even more complicated.
The counter is used to decide how many hung tasks were found in
by a single check, see:
static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
[...]
unsigned long prev_detect_count = sysctl_hung_task_detect_count;
[...]
for_each_process_thread(g, t) {
[...]
check_hung_task(t, timeout, prev_detect_count);
}
[...]
if (!(sysctl_hung_task_detect_count - prev_detect_count))
return;
if (need_warning || hung_task_call_panic) {
si_mask |= SYS_INFO_LOCKS;
if (sysctl_hung_task_all_cpu_backtrace)
si_mask |= SYS_INFO_ALL_BT;
}
sys_info(si_mask);
if (hung_task_call_panic)
panic("hung_task: blocked tasks");
}
Any race might cause false positives and even panic() !!!
And the race window is rather big (checking all processes).
This race can't be prevented by an atomic read/write.
IMHO, the only solution would be to add some locking,
e.g. mutex and take it around the entire
check_hung_uninterruptible_tasks().
On one hand, it might work. The code is called from
a task context...
But adding a lock into a lockup-detector code triggers
some warning bells in my head.
Honestly, I am not sure if it is worth it. I understand
the motivation for the reset. But IMHO, even more important
is to make sure that the watchdog works as expected.
Best Regards,
Petr
Powered by blists - more mailing lists