Message-ID: <c4846caa-3979-4207-96bf-76d2b4ae20cc@linux.dev>
Date: Tue, 3 Feb 2026 19:01:01 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Petr Mladek <pmladek@...e.com>
Cc: Aaron Tomlin <atomlin@...mlin.com>, neelx@...e.com, sean@...e.io,
akpm@...ux-foundation.org, mproche@...il.com, chjohnst@...il.com,
nick.lange@...il.com, linux-kernel@...r.kernel.org, mhiramat@...nel.org,
joel.granados@...nel.org, gregkh@...uxfoundation.org
Subject: Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise
detection count
On 2026/2/3 17:03, Petr Mladek wrote:
> On Tue 2026-02-03 11:08:33, Lance Yang wrote:
>> On 2026/2/3 11:05, Lance Yang wrote:
>>> On 2026/1/25 21:58, Aaron Tomlin wrote:
>>>> The check_hung_task() function currently conflates two distinct
>>>> responsibilities: validating whether a task is hung and handling the
>>>> subsequent reporting (printing warnings, triggering panics, or
>>>> tracepoints).
>>>>
>>>> This patch refactors the logic by introducing hung_task_info(), a
>>>> function dedicated solely to reporting. The actual detection check,
>>>> task_is_hung(), is hoisted into the primary loop within
>>>> check_hung_uninterruptible_tasks(). This separation clearly decouples
>>>> the mechanism of detection from the policy of reporting.
>>>>
>>>> Furthermore, to facilitate future support for concurrent hung task
>>>> detection, the global sysctl_hung_task_detect_count variable is
>>>> converted from unsigned long to atomic_long_t. Consequently, the
>>>> counting logic is updated to accumulate the number of hung tasks locally
>>>> (this_round_count) during the iteration. The global counter is then
>>>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
>>>> concludes, rather than incrementally during the scan.
>>>>
>>>> These changes are strictly preparatory and introduce no functional
>>>> change to the system's runtime behaviour.
>>>>
>>>> Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
>>>> ---
>>>> kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>>>> 1 file changed, 33 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>>> index d2254c91450b..df10830ed9ef 100644
>>>> --- a/kernel/hung_task.c
>>>> +++ b/kernel/hung_task.c
>>>> @@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>>>>  /*
>>>>   * Total number of tasks detected as hung since boot:
>>>>   */
>>>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>>>> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>>>>  /*
>>>>   * Limit number of tasks checked in a batch.
>>>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>>>>  }
>>>>  #endif
>>>> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
>>>> -			    unsigned long prev_detect_count)
>>>> +/**
>>>> + * hung_task_info - Print diagnostic details for a hung task
>>>> + * @t: Pointer to the detected hung task.
>>>> + * @timeout: Timeout threshold for detecting hung tasks
>>>> + * @this_round_count: Count of hung tasks detected in the current iteration
>>>> + *
>>>> + * Print structured information about the specified hung task, if warnings
>>>> + * are enabled or if the panic batch threshold is exceeded.
>>>> + */
>>>> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
>>>> +			   unsigned long this_round_count)
>>>>  {
>>>> -	unsigned long total_hung_task;
>>>> -
>>>> -	if (!task_is_hung(t, timeout))
>>>> -		return;
>>>> -
>>>> -	/*
>>>> -	 * This counter tracks the total number of tasks detected as hung
>>>> -	 * since boot.
>>>> -	 */
>>>> -	sysctl_hung_task_detect_count++;
>>>
>>> Previously, the global detect count was updated immediately when a hung
>>> task was found. But now, it only updates after the full scan finishes ...
>>>
>>> Ideally, the count should update as soon as possible, so that userspace
>>> can react in time :)
>>>
>>> For example, by migrating critical containers away from the node before
>>> the situation gets worse - something we already do.
>>
>> Sorry, I should have said that earlier - just realized it ...
>
> Better late than sorry ;-)
;P
>
> That said, is the delay really critical? I guess that userspace checks
> the counter at regular intervals (seconds or tens of seconds).
> Or is there any way to get a notification immediately?
We just rely on polling the counter every 0.x seconds.
I don't think the full scan itself would take many seconds, but the
reporting (e.g. pr_err) could be slow in some cases ...
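FWIW, here is a minimal sketch of such a poller (simplified; the path
assumes the counter stays exposed as
/proc/sys/kernel/hung_task_detect_count, and our real watcher kicks off
container migration rather than just logging):

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		long prev = -1;

		for (;;) {
			FILE *f = fopen("/proc/sys/kernel/hung_task_detect_count", "r");
			long cur;

			if (!f || fscanf(f, "%ld", &cur) != 1) {
				if (f)
					fclose(f);
				return EXIT_FAILURE;
			}
			fclose(f);

			/* React as soon as the counter moves. */
			if (prev >= 0 && cur > prev)
				fprintf(stderr, "hung tasks: +%ld (total %ld)\n",
					cur - prev, cur);
			prev = cur;

			usleep(500 * 1000);	/* poll every 0.5s */
		}
	}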
>
> Anyway, I thought how the counting and barriers might work when
> we update the global counter immediately. And I came up with
> the following:
Nice! That should do the right thing.
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..8bc043fbe89c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long total_count, this_round_count;
> +	unsigned long this_round_count;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>
> -	/*
> -	 * The counter might get reset. Remember the initial value.
> -	 * Acquire prevents reordering task checks before this point.
> -	 */
> -	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		}
>
>  		if (task_is_hung(t, timeout)) {
> +			/*
> +			 * Increment the global counter so that userspace could
> +			 * start migrating tasks ASAP. But count the current
> +			 * round separately because userspace could reset
> +			 * the global counter at any time.
> +			 */
> +			atomic_long_inc(&sysctl_hung_task_detect_count);
An atomic increment with relaxed ordering, which should be good enough
here, IIUC. (See the sketch at the end of this mail.)
>  			this_round_count++;
>  			hung_task_info(t, timeout, this_round_count);
>  		}
> @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (!this_round_count)
>  		return;
>
> -	/*
> -	 * Do not count this round when the global counter has been reset
> -	 * during this check. Release ensures we see all hang details
> -	 * recorded during the scan.
> -	 */
> -	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
> -				    total_count, total_count + this_round_count);
> -
>  	if (need_warning || hung_task_call_panic) {
>  		si_mask |= SYS_INFO_LOCKS;
>
>
> I am not sure if the comment above the increment is needed.
> Well, it might help people understand the motivation without
> digging into the git history.
Looks good to me. Could you send it as a follow-up patch?
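
BTW, just to spell out why I think the plain atomic_long_inc() is fine
(a sketch of my reasoning, not code from the patch; the toy counter
below stands in for sysctl_hung_task_detect_count):

	#include <linux/atomic.h>

	static atomic_long_t detect_count = ATOMIC_LONG_INIT(0);

	static void note_hung_task(void)
	{
		/*
		 * Non-value-returning RMW ops like atomic_long_inc()
		 * imply no ordering (see Documentation/atomic_t.txt).
		 * That is fine for a statistics counter that userspace
		 * only polls as a monotonic hint. If prior stores
		 * (e.g. the hang report) ever had to be visible before
		 * the bump, the documented pattern would be:
		 *
		 *	smp_mb__before_atomic();
		 *	atomic_long_inc(&detect_count);
		 */
		atomic_long_inc(&detect_count);
	}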
Cheers,
Lance