linux-kernel - RE: [External Mail] Re: [????] Re: [PATCH][RFC] hung_task: Support to panic when the maximum number of hung task warnings is reached


Message-ID: <d334c33bc11243cd9ab31ebe8e4310ca@baidu.com>
Date: Tue, 23 Sep 2025 06:16:03 +0000
From: "Li,Rongqing" <lirongqing@...du.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: Andrew Morton <akpm@...ux-foundation.org>, "corbet@....net"
	<corbet@....net>, "lance.yang@...ux.dev" <lance.yang@...ux.dev>,
	"mhiramat@...nel.org" <mhiramat@...nel.org>,
	"pawan.kumar.gupta@...ux.intel.com" <pawan.kumar.gupta@...ux.intel.com>,
	"mingo@...nel.org" <mingo@...nel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
	"kees@...nel.org" <kees@...nel.org>, "arnd@...db.de" <arnd@...db.de>,
	"feng.tang@...ux.alibaba.com" <feng.tang@...ux.alibaba.com>,
	"pauld@...hat.com" <pauld@...hat.com>, "joel.granados@...nel.org"
	<joel.granados@...nel.org>, "linux-doc@...r.kernel.org"
	<linux-doc@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [External Mail] Re: [????] Re: [PATCH][RFC] hung_task: Support to panic when the maximum number of hung task warnings is reached



> -----Original Message-----
> From: Paul E. McKenney <paulmck@...nel.org>
> Sent: September 23, 2025 14:04
> To: Li,Rongqing <lirongqing@...du.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>; corbet@....net;
> lance.yang@...ux.dev; mhiramat@...nel.org;
> pawan.kumar.gupta@...ux.intel.com; mingo@...nel.org;
> dave.hansen@...ux.intel.com; rostedt@...dmis.org; kees@...nel.org;
> arnd@...db.de; feng.tang@...ux.alibaba.com; pauld@...hat.com;
> joel.granados@...nel.org; linux-doc@...r.kernel.org;
> linux-kernel@...r.kernel.org
> Subject: [External Mail] Re: [????] Re: [PATCH][RFC] hung_task: Support to panic
> when the maximum number of hung task warnings is reached
> 
> On Tue, Sep 23, 2025 at 04:00:03AM +0000, Li,Rongqing wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Morton <akpm@...ux-foundation.org>
> > > > Sent: September 23, 2025 11:46
> > > To: Li,Rongqing <lirongqing@...du.com>
> > > Cc: corbet@....net; lance.yang@...ux.dev; mhiramat@...nel.org;
> > > paulmck@...nel.org; pawan.kumar.gupta@...ux.intel.com;
> > > mingo@...nel.org; dave.hansen@...ux.intel.com; rostedt@...dmis.org;
> > > kees@...nel.org; arnd@...db.de; feng.tang@...ux.alibaba.com;
> > > pauld@...hat.com; joel.granados@...nel.org;
> > > linux-doc@...r.kernel.org; linux-kernel@...r.kernel.org
> > > Subject: [????] Re: [PATCH][RFC] hung_task: Support to panic when
> > > the maximum number of hung task warnings is reached
> > >
> > > > On Tue, 23 Sep 2025 11:37:40 +0800 lirongqing <lirongqing@...du.com> wrote:
> > >
> > > > Currently the hung task detector can either panic immediately or
> > > > continue operation when hung tasks are detected. However, there
> > > > are scenarios where we want a more balanced approach:
> > > >
> > > > - We don't want the system to panic immediately when a few hung tasks
> > > >   are detected, as the system may be able to recover
> > > > - And we also don't want the system to stall indefinitely with multiple
> > > >   hung tasks
> > > >
> > > > > This commit introduces a new mode (value 2) for the hung task panic
> > > > > behavior.
> > > > When set to 2, the system will panic only after the maximum number
> > > > of hung task warnings (hung_task_warnings) has been reached.
> > > >
> > > > This provides a middle ground between immediate panic and
> > > > potentially infinite stall, allowing for automated vmcore
> > > > generation after a reasonable
> > >
> > > I assume the same argument applies to the NMI watchdog, to the
> > > softlockup detector and to the RCU stall detector?
> >
> > True, especially the RCU stall detector.
> 
> There are the panic_on_rcu_stall and max_rcu_stall_to_panic sysctls, which
> together allow you to panic after (say) three RCU CPU stall warnings.
> Do those do what you need?



Yes, this is what I need. RCU already implements it.

Thanks


-Li
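
For reference, a minimal sketch of how the two RCU sysctls mentioned in the quoted text below might be set from userspace so that the system panics after a given number of RCU CPU stall warnings. The sysctl names come from the thread; the helper names (rcu_stall_panic_settings, apply_settings) and the sysctl_root parameter are hypothetical and exist only for illustration. Writing to the real /proc/sys requires root.

```python
from pathlib import Path


def rcu_stall_panic_settings(max_stalls: int) -> dict:
    """Return sysctl values that request a panic after max_stalls RCU stalls.

    Hypothetical helper: only the two sysctl names are taken from the thread.
    """
    if max_stalls < 1:
        raise ValueError("max_stalls must be at least 1")
    return {
        # Set the threshold first, then enable panicking on RCU stalls.
        "kernel.max_rcu_stall_to_panic": max_stalls,
        "kernel.panic_on_rcu_stall": 1,
    }


def apply_settings(settings: dict, sysctl_root: str = "/proc/sys") -> None:
    """Write each value to its file under sysctl_root.

    sysctl_root is an assumption added here so the sketch can be tried
    against a scratch directory instead of the live system.
    """
    for name, value in settings.items():
        # "kernel.panic_on_rcu_stall" maps to <root>/kernel/panic_on_rcu_stall
        path = Path(sysctl_root).joinpath(*name.split("."))
        path.write_text(f"{value}\n")
```

For the "three RCU CPU stall warnings" example, this would be called as apply_settings(rcu_stall_panic_settings(3)).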

> 
> 							Thanx, Paul
> 
> > > A general framework to handle all of these might be better.  But why
> > > do it in kernel at all?  What about a userspace detector which
> > > parses kernel logs (or new procfs counters) and makes such decisions?
> >
> >
> > By leveraging existing kernel mechanisms, implementation in kernel is
> > very simple and reliable, I think
> >
> > Thanks
> >
> > -Li
> >
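
For comparison, a rough sketch of the userspace alternative raised above: a detector that counts hung task warnings in kernel log lines and decides when a threshold has been reached. This is not the proposed patch. The function names and the max_warnings parameter are hypothetical, and the pattern assumes the hung task detector's usual "INFO: task <comm>:<pid> blocked for more than <N> seconds." message. How the log lines are obtained and what action is taken once the threshold is hit are deliberately left out.

```python
import re

# Assumed message format of the hung task detector, for example:
#   INFO: task kworker/0:1:42 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_task_warnings(log_lines) -> int:
    """Count log lines that look like hung task warnings."""
    return sum(1 for line in log_lines if HUNG_TASK_RE.search(line))


def should_panic(log_lines, max_warnings: int) -> bool:
    """Return True once at least max_warnings hung task warnings were seen.

    max_warnings is a hypothetical userspace threshold; it plays the role
    that hung_task_warnings plays in the in-kernel proposal.
    """
    return count_hung_task_warnings(log_lines) >= max_warnings
```

Note that the kernel stops printing these warnings once hung_task_warnings is exhausted, so a log-based counter like this can only see as many warnings as that sysctl allows.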