Message-ID: <9067a88d-f5df-4d6e-b3b3-2e266ebcf3d0@linux.dev>
Date: Tue, 23 Sep 2025 11:59:30 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: lirongqing <lirongqing@...du.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: corbet@....net, mhiramat@...nel.org, paulmck@...nel.org,
pawan.kumar.gupta@...ux.intel.com, mingo@...nel.org,
dave.hansen@...ux.intel.com, rostedt@...dmis.org, kees@...nel.org,
arnd@...db.de, feng.tang@...ux.alibaba.com, pauld@...hat.com,
joel.granados@...nel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH][RFC] hung_task: Support to panic when the maximum number
of hung task warnings is reached
On 2025/9/23 11:45, Andrew Morton wrote:
> On Tue, 23 Sep 2025 11:37:40 +0800 lirongqing <lirongqing@...du.com> wrote:
>
>> Currently the hung task detector can either panic immediately or continue
>> operation when hung tasks are detected. However, there are scenarios
>> where we want a more balanced approach:
>>
>> - We don't want the system to panic immediately when a few hung tasks
>> are detected, as the system may be able to recover
>> - And we also don't want the system to stall indefinitely with multiple
>> hung tasks
>>
>> This commit introduces a new mode (value 2) for the hung task panic behavior.
>> When set to 2, the system will panic only after the maximum number of hung
>> task warnings (hung_task_warnings) has been reached.
>>
>> This provides a middle ground between immediate panic and potentially
>> infinite stall, allowing for automated vmcore generation after a reasonable
>
> I assume the same argument applies to the NMI watchdog, to the
> softlockup detector and to the RCU stall detector?
>
> A general framework to handle all of these might be better. But why do
> it in kernel at all? What about a userspace detector which parses
> kernel logs (or new procfs counters) and makes such decisions?
+1. I agree that a userspace detector seems more appropriate for this.
We already have the hung_task_detect_count counter, so a userspace
detector could easily use that to implement custom policies ;)
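
As a rough illustration of that idea, here is a minimal sketch of such a
userspace detector. It polls hung_task_detect_count and forces a crash once
enough new hung tasks have been seen since it started. The threshold, the poll
interval and all function names are hypothetical, chosen only for this example.
The crash is forced through /proc/sysrq-trigger, which is one possible way to
get a vmcore and assumes kdump is already configured. This is not a tested
tool.

```python
#!/usr/bin/env python3
"""Sketch of a userspace hung-task policy loop (illustrative only)."""
import time

# Counter mentioned above; present on kernels built with
# CONFIG_DETECT_HUNG_TASK that expose this sysctl.
COUNT_PATH = "/proc/sys/kernel/hung_task_detect_count"
SYSRQ_PATH = "/proc/sysrq-trigger"

# Hypothetical policy knobs, not kernel defaults.
MAX_NEW_HUNG_TASKS = 10
POLL_SECONDS = 30


def parse_count(text):
    """Convert the sysctl file contents (e.g. "3\\n") to an int."""
    return int(text.strip())


def read_count(path=COUNT_PATH):
    """Read the current number of hung tasks detected since boot."""
    with open(path) as f:
        return parse_count(f.read())


def should_escalate(baseline, current, limit=MAX_NEW_HUNG_TASKS):
    """True once at least `limit` new hung tasks appeared since `baseline`."""
    return current - baseline >= limit


def trigger_crash(path=SYSRQ_PATH):
    """Force a kernel crash via sysrq 'c'. Requires root. Destructive."""
    with open(path, "w") as f:
        f.write("c")


def main():
    baseline = read_count()
    while True:
        time.sleep(POLL_SECONDS)
        if should_escalate(baseline, read_count()):
            trigger_crash()


if __name__ == "__main__":
    main()
```

Keeping the decision in should_escalate() separate from the I/O means the
policy can be changed (for example, to a rate over a time window) without
touching the kernel.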