Message-ID: <45c17b93-a6f1-45e0-8b25-20665a281949@linux.dev>
Date: Tue, 3 Feb 2026 11:08:33 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Aaron Tomlin <atomlin@...mlin.com>
Cc: neelx@...e.com, sean@...e.io, akpm@...ux-foundation.org,
 mproche@...il.com, chjohnst@...il.com, nick.lange@...il.com,
 linux-kernel@...r.kernel.org, mhiramat@...nel.org, joel.granados@...nel.org,
 pmladek@...e.com, gregkh@...uxfoundation.org
Subject: Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise
 detection count



On 2026/2/3 11:05, Lance Yang wrote:
> 
> 
> On 2026/1/25 21:58, Aaron Tomlin wrote:
>> The check_hung_task() function currently conflates two distinct
>> responsibilities: validating whether a task is hung and handling the
>> subsequent reporting (printing warnings, triggering panics, or
>> tracepoints).
>>
>> This patch refactors the logic by introducing hung_task_info(), a
>> function dedicated solely to reporting. The actual detection check,
>> task_is_hung(), is hoisted into the primary loop within
>> check_hung_uninterruptible_tasks(). This separation clearly decouples
>> the mechanism of detection from the policy of reporting.
>>
>> Furthermore, to facilitate future support for concurrent hung task
>> detection, the global sysctl_hung_task_detect_count variable is
>> converted from unsigned long to atomic_long_t. Consequently, the
>> counting logic is updated to accumulate the number of hung tasks locally
>> (this_round_count) during the iteration. The global counter is then
>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
>> concludes, rather than incrementally during the scan.
>>
>> These changes are strictly preparatory and introduce no functional
>> change to the system's runtime behaviour.
>>
>> Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
>> ---
>>   kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>>   1 file changed, 33 insertions(+), 25 deletions(-)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index d2254c91450b..df10830ed9ef 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>>   /*
>>    * Total number of tasks detected as hung since boot:
>>    */
>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>>   /*
>>    * Limit number of tasks checked in a batch.
>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>>   }
>>   #endif
>> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
>> -        unsigned long prev_detect_count)
>> +/**
>> + * hung_task_info - Print diagnostic details for a hung task
>> + * @t: Pointer to the detected hung task.
>> + * @timeout: Timeout threshold for detecting hung tasks
>> + * @this_round_count: Count of hung tasks detected in the current iteration
>> + *
>> + * Print structured information about the specified hung task, if warnings
>> + * are enabled or if the panic batch threshold is exceeded.
>> + */
>> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
>> +               unsigned long this_round_count)
>>   {
>> -    unsigned long total_hung_task;
>> -
>> -    if (!task_is_hung(t, timeout))
>> -        return;
>> -
>> -    /*
>> -     * This counter tracks the total number of tasks detected as hung
>> -     * since boot.
>> -     */
>> -    sysctl_hung_task_detect_count++;
> 
> Previously, the global detect count was updated immediately when a hung
> task was found. Now it is only updated after the full scan finishes ...
> 
> Ideally, the count should update as soon as possible, so that userspace
> can react in time :)
> 
> For example, by migrating critical containers away from the node before
> the situation gets worse - something we already do.

Sorry, I should have said that earlier - just realized it ...
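
To illustrate the difference, here is a rough userspace sketch, not the
kernel code. It uses C11 atomics as a stand-in for atomic_long_t, and the
names detect_count, scan_immediate() and scan_batched() are hypothetical.
In the kernel, the per-detection variant would presumably be something
along the lines of atomic_long_inc() at the point of detection.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
atomic_long detect_count;

/*
 * Per-detection update: a concurrent reader sees the total grow as
 * soon as each hung task is found, even while the scan is running.
 */
void scan_immediate(const int *hung, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (hung[i])
			atomic_fetch_add_explicit(&detect_count, 1,
						  memory_order_relaxed);
	}
}

/*
 * Batched update: detections are accumulated locally and published
 * once, so a reader sees nothing new until the whole scan is done.
 */
void scan_batched(const int *hung, int n)
{
	long this_round_count = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (hung[i])
			this_round_count++;
	}
	if (this_round_count)
		atomic_fetch_add_explicit(&detect_count, this_round_count,
					  memory_order_relaxed);
}
```

Both variants end up with the same total; they only differ in when the
new value becomes visible to a reader during a long scan.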

