Message-ID: <45c17b93-a6f1-45e0-8b25-20665a281949@linux.dev>
Date: Tue, 3 Feb 2026 11:08:33 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Aaron Tomlin <atomlin@...mlin.com>
Cc: neelx@...e.com, sean@...e.io, akpm@...ux-foundation.org,
mproche@...il.com, chjohnst@...il.com, nick.lange@...il.com,
linux-kernel@...r.kernel.org, mhiramat@...nel.org, joel.granados@...nel.org,
pmladek@...e.com, gregkh@...uxfoundation.org
Subject: Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise
detection count
On 2026/2/3 11:05, Lance Yang wrote:
>
>
> On 2026/1/25 21:58, Aaron Tomlin wrote:
>> The check_hung_task() function currently conflates two distinct
>> responsibilities: validating whether a task is hung and handling the
>> subsequent reporting (printing warnings, triggering panics, or
>> tracepoints).
>>
>> This patch refactors the logic by introducing hung_task_info(), a
>> function dedicated solely to reporting. The actual detection check,
>> task_is_hung(), is hoisted into the primary loop within
>> check_hung_uninterruptible_tasks(). This separation clearly decouples
>> the mechanism of detection from the policy of reporting.
>>
>> Furthermore, to facilitate future support for concurrent hung task
>> detection, the global sysctl_hung_task_detect_count variable is
>> converted from unsigned long to atomic_long_t. Consequently, the
>> counting logic is updated to accumulate the number of hung tasks locally
>> (this_round_count) during the iteration. The global counter is then
>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
>> concludes, rather than incrementally during the scan.
>>
>> These changes are strictly preparatory and introduce no functional
>> change to the system's runtime behaviour.
>>
>> Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
>> ---
>> kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>> 1 file changed, 33 insertions(+), 25 deletions(-)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index d2254c91450b..df10830ed9ef 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>> /*
>> * Total number of tasks detected as hung since boot:
>> */
>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>> /*
>> * Limit number of tasks checked in a batch.
>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>> }
>> #endif
>> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
>> - unsigned long prev_detect_count)
>> +/**
>> + * hung_task_info - Print diagnostic details for a hung task
>> + * @t: Pointer to the detected hung task.
>> + * @timeout: Timeout threshold for detecting hung tasks
>> + * @this_round_count: Count of hung tasks detected in the current iteration
>> + *
>> + * Print structured information about the specified hung task, if warnings
>> + * are enabled or if the panic batch threshold is exceeded.
>> + */
>> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
>> + unsigned long this_round_count)
>> {
>> - unsigned long total_hung_task;
>> -
>> - if (!task_is_hung(t, timeout))
>> - return;
>> -
>> - /*
>> - * This counter tracks the total number of tasks detected as hung
>> - * since boot.
>> - */
>> - sysctl_hung_task_detect_count++;
>
> Previously, the global detect count was updated immediately when a hung task
> was found. But now, it only updates after the full scan finishes ...
>
> Ideally, the count should update as soon as possible, so that userspace
> can react in time :)
>
> For example, by migrating critical containers away from the node before
> the situation gets worse - something we already do.
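
For illustration only (a hypothetical alternative, not something from the
patch): keeping a per-detection bump would preserve the old visibility,
roughly like this:

	if (task_is_hung(t, timeout)) {
		/* Make the new total visible to sysctl readers right away. */
		atomic_long_inc(&sysctl_hung_task_detect_count);
		this_round_count++;
		hung_task_info(t, timeout, this_round_count);
	}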
Sorry, I should have said that earlier - just realized it ...