Message-ID: <7366f87a-1924-4dac-8945-389e6674213f@gmail.com>
Date: Tue, 16 Sep 2025 09:46:53 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Doug Anderson <dianders@...omium.org>,
Peter Zijlstra <peterz@...radead.org>
Cc: Will Deacon <will@...nel.org>, Yunhui Cui <cuiyunhui@...edance.com>,
akpm@...ux-foundation.org, catalin.marinas@....com, maddy@...ux.ibm.com,
mpe@...erman.id.au, npiggin@...il.com, christophe.leroy@...roup.eu,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, acme@...nel.org,
namhyung@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
adrian.hunter@...el.com, kan.liang@...ux.intel.com, kees@...nel.org,
masahiroy@...nel.org, aliceryhl@...gle.com, ojeda@...nel.org,
thomas.weissschuh@...utronix.de, xur@...gle.com, ruanjinjie@...wei.com,
gshan@...hat.com, maz@...nel.org, suzuki.poulose@....com,
zhanjie9@...ilicon.com, yangyicong@...ilicon.com, gautam@...ux.ibm.com,
arnd@...db.de, zhao.xichao@...o.com, rppt@...nel.org, lihuafei1@...wei.com,
coxu@...hat.com, jpoimboe@...nel.org, yaozhenguo1@...il.com,
luogengkun@...weicloud.com, max.kellermann@...os.com, tj@...nel.org,
yury.norov@...il.com, thorsten.blum@...ux.dev, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH] watchdog: remove HARDLOCKUP_DETECTOR_PERF
On 9/15/25 23:42, Doug Anderson wrote:
> Hi,
>
> On Mon, Sep 15, 2025 at 3:35 AM Peter Zijlstra <peterz@...radead.org> wrote:
>>
>> On Mon, Sep 15, 2025 at 11:26:09AM +0100, Will Deacon wrote:
>>
>>> | If all CPUs are hard locked up at the same time the buddy system
>>> | can't detect it.
>>>
>>> Ok, so why is that limitation acceptable? It looks to me like you're
>>> removing useful functionality.
>>
>> Yeah, this. I've run into this case waaay too many times to think it
>> reasonable to remove the perf/NMI based lockup detector.
>
> I am a bit curious how this comes to be in cases where you've seen it.
> What causes all CPUs to be stuck looping all with interrupts disabled
> (but still able to execute NMIs)? Certainly one can come up with a
> synthetic way to make that happen, but I would imagine it to be
> exceedingly rare in real life. Maybe all CPUs are deadlocked waiting
> on spinlocks or something? There shouldn't be a lot of other reasons
> that all CPUs should be stuck indefinitely with interrupts disabled...
> If that's what's happening, (just spitballing) I wonder if hooking
> into the slowpath of spinlocks to look for lockups would help? Maybe
> every 10000 failures to acquire the spinlock we check for a lockup?
> Obviously you could still come up with synthetic lockups that the
> watchdog doesn't catch, but hopefully in those types of cases we can at
> least reset the device with a hardware watchdog?
>
> Overall the issue is that it's really awkward to have both types of
> lockup detectors, especially since you've got to pick at compile time.
> The perf lockup detector has a pile of things that make it pretty
> awkward and it seems like people have been moving toward the buddy detector
> because of this...
>
> -Doug
Should we make the lockup detector backends modular and allow
switching between them after boot, so that users can choose?
--
Jinchao