linux-kernel - Re: [PATCH] watchdog: remove HARDLOCKUP_DETECTOR_PERF

Message-ID: <7366f87a-1924-4dac-8945-389e6674213f@gmail.com>
Date: Tue, 16 Sep 2025 09:46:53 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Doug Anderson <dianders@...omium.org>,
 Peter Zijlstra <peterz@...radead.org>
Cc: Will Deacon <will@...nel.org>, Yunhui Cui <cuiyunhui@...edance.com>,
 akpm@...ux-foundation.org, catalin.marinas@....com, maddy@...ux.ibm.com,
 mpe@...erman.id.au, npiggin@...il.com, christophe.leroy@...roup.eu,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, hpa@...or.com, acme@...nel.org,
 namhyung@...nel.org, mark.rutland@....com,
 alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
 adrian.hunter@...el.com, kan.liang@...ux.intel.com, kees@...nel.org,
 masahiroy@...nel.org, aliceryhl@...gle.com, ojeda@...nel.org,
 thomas.weissschuh@...utronix.de, xur@...gle.com, ruanjinjie@...wei.com,
 gshan@...hat.com, maz@...nel.org, suzuki.poulose@....com,
 zhanjie9@...ilicon.com, yangyicong@...ilicon.com, gautam@...ux.ibm.com,
 arnd@...db.de, zhao.xichao@...o.com, rppt@...nel.org, lihuafei1@...wei.com,
 coxu@...hat.com, jpoimboe@...nel.org, yaozhenguo1@...il.com,
 luogengkun@...weicloud.com, max.kellermann@...os.com, tj@...nel.org,
 yury.norov@...il.com, thorsten.blum@...ux.dev, x86@...nel.org,
 linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 linuxppc-dev@...ts.ozlabs.org, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH] watchdog: remove HARDLOCKUP_DETECTOR_PERF

On 9/15/25 23:42, Doug Anderson wrote:
> Hi,
> 
> On Mon, Sep 15, 2025 at 3:35 AM Peter Zijlstra <peterz@...radead.org> wrote:
>>
>> On Mon, Sep 15, 2025 at 11:26:09AM +0100, Will Deacon wrote:
>>
>>>    | If all CPUs are hard locked up at the same time the buddy system
>>>    | can't detect it.
>>>
>>> Ok, so why is that limitation acceptable? It looks to me like you're
>>> removing useful functionality.
>>
>> Yeah, this. I've run into this case waaay too many times to think it
>> reasonable to remove the perf/NMI based lockup detector.
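The limitation quoted above can be illustrated with a toy Python model. It is loosely based on how the buddy detector is usually described: each CPU's periodic timer interrupt bumps a per-CPU counter and, from that same interrupt, checks whether the next CPU's counter has advanced since its last look. `Cpu` and `run_period` are hypothetical names, and this is a simulation, not kernel code. Because both the heartbeat and the check depend on the regular timer interrupt, a single stuck CPU is caught by its neighbour. Once every CPU has interrupts disabled, however, nothing runs the check at all. A perf/NMI-based detector avoids this because its check fires from an NMI on the stuck CPU itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cpu:
    irqs_enabled: bool = True  # False models a CPU spinning with interrupts off
    timer_ticks: int = 0       # advances only when the periodic timer interrupt runs
    saved_buddy: int = 0       # buddy's tick count seen at our previous check


def run_period(cpus: list[Cpu]) -> list[tuple[int, int]]:
    """Simulate one timer period; return (checker, stuck_cpu) pairs reported."""
    n = len(cpus)
    # Heartbeat: only CPUs that can take interrupts get their timer tick.
    for cpu in cpus:
        if cpu.irqs_enabled:
            cpu.timer_ticks += 1
    reports = []
    # Check: it also runs from the timer interrupt, so a CPU with
    # interrupts disabled cannot watch anyone else either.
    for i, cpu in enumerate(cpus):
        if not cpu.irqs_enabled:
            continue
        buddy = (i + 1) % n            # each CPU watches the next one in a ring
        now = cpus[buddy].timer_ticks
        if now == cpu.saved_buddy:     # buddy's timer made no progress
            reports.append((i, buddy))
        cpu.saved_buddy = now
    return reports


if __name__ == "__main__":
    cpus = [Cpu() for _ in range(4)]
    print(run_period(cpus), run_period(cpus))  # healthy system: [] []
    cpus[1].irqs_enabled = False               # one CPU hard locked up
    print(run_period(cpus))                    # [(0, 1)]
    for cpu in cpus:                           # every CPU hard locked up
        cpu.irqs_enabled = False
    print(run_period(cpus))                    # [] since no CPU is left to run the check
```

The ring order (CPU n watches CPU n+1) is an arbitrary choice made for the sketch.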
> 
> I am a bit curious how this comes to be in cases where you've seen it.
> What causes all CPUs to be stuck looping all with interrupts disabled
> (but still able to execute NMIs)? Certainly one can come up with a
> synthetic way to make that happen, but I would imagine it to be
> exceedingly rare in real life. Maybe all CPUs are deadlocked waiting
> on spinlocks or something? There shouldn't be a lot of other reasons
> that all CPUs should be stuck indefinitely with interrupts disabled...
> If that's what's happening, (just spitballing) I wonder if hooking
> into the slowpath of spinlocks to look for lockups would help? Maybe
> every 10000 failures to acquire the spinlock we check for a lockup?
> Obviously you could still come up with synthetic ways to make a
> non-caught watchdog, but hopefully in those types of cases we can at
> least reset the device with a hardware watchdog?
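Doug's idea of checking for lockups from the spinlock slow path can be sketched as a minimal single-threaded Python simulation under several assumptions. `CheckedSpinLock`, `CHECK_EVERY`, and the `on_check` callback are hypothetical. The lock holder simply never releases the lock, standing in for a deadlock, and `max_spins` exists only so the demo terminates. The kernel's real spinlock slow paths are far more involved, and any check run from there would also have to be safe in that context.

```python
from __future__ import annotations

from typing import Callable, Optional

CHECK_EVERY = 10_000  # hypothetical: run the lockup check every N failed attempts


class CheckedSpinLock:
    """Toy lock; owner is the id of the CPU holding it, or None."""

    def __init__(self, on_check: Callable[[CheckedSpinLock, int, int], None]):
        self.owner: Optional[int] = None
        self._on_check = on_check

    def try_acquire(self, cpu: int) -> bool:
        if self.owner is None:
            self.owner = cpu
            return True
        return False

    def release(self) -> None:
        self.owner = None

    def acquire(self, cpu: int, max_spins: int) -> bool:
        """Spin for the lock; max_spins only exists so the demo terminates."""
        if self.try_acquire(cpu):              # fast path: lock was free
            return True
        for spins in range(1, max_spins + 1):  # slow path: contended
            if self.try_acquire(cpu):
                return True
            if spins % CHECK_EVERY == 0:       # every N failures, look for a lockup
                self._on_check(self, cpu, spins)
        return False


if __name__ == "__main__":
    checks = []
    lock = CheckedSpinLock(lambda lk, cpu, spins: checks.append((cpu, lk.owner, spins)))
    lock.acquire(cpu=1, max_spins=1)           # CPU1 takes the lock and never releases it
    print(lock.acquire(cpu=0, max_spins=25_000), checks)
    # False [(0, 1, 10000), (0, 1, 20000)]
```

The sketch counts failures in a local variable rather than in a field of the lock, so the spinning waiter does not write extra shared state. That is a choice made for the sketch, not something proposed in the thread.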
> 
> Overall the issue is that it's really awkward to have both types of
> lockup detectors, especially since you've got to pick at compile time.
> The perf lockup detector has a pile of things that make it pretty
> awkward and it seems like people have been moving toward the buddy detector
> because of this...
> 
> -Doug

Should we support both modularization and switching the backend at
runtime, after boot, so that users can choose the detector?
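One possible reading of this question is a single watchdog front end whose backend can be swapped at runtime instead of being picked at compile time. The following sketch assumes that reading. `Watchdog`, `ToyBackend`, `set_backend()`, and the backend names are hypothetical and say nothing about how such an interface would actually look in the kernel.

```python
from __future__ import annotations


class ToyBackend:
    """Stand-in for a real detector; only records whether it is running."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False

    def enable(self) -> None:
        self.active = True

    def disable(self) -> None:
        self.active = False


class Watchdog:
    """Front end that owns several backends and runs exactly one at a time."""

    def __init__(self, backends: dict[str, ToyBackend], default: str) -> None:
        self.backends = backends
        self.current = default
        backends[default].enable()

    def set_backend(self, name: str) -> None:
        if name not in self.backends:
            raise ValueError(f"unknown lockup detector backend: {name}")
        if name == self.current:
            return
        # Stop the old detector before starting the new one so both never run at once.
        self.backends[self.current].disable()
        self.backends[name].enable()
        self.current = name


if __name__ == "__main__":
    wd = Watchdog({"buddy": ToyBackend("buddy"), "perf": ToyBackend("perf")}, default="buddy")
    wd.set_backend("perf")  # hypothetical trigger, e.g. a runtime knob written by the user
    print(wd.current, {n: b.active for n, b in wd.backends.items()})
    # perf {'buddy': False, 'perf': True}
```

Disabling the old backend before enabling the new one keeps the two detectors from running at the same time. A real implementation would also have to coordinate the switch across all CPUs.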

-- 
Jinchao