Message-ID: <CAD=FV=Vr67+uRK2bYu34MDXRJN4w_VH_EO7OW4eVLJ3wqUUBog@mail.gmail.com>
Date: Mon, 15 Sep 2025 08:42:00 -0700
From: Doug Anderson <dianders@...omium.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Will Deacon <will@...nel.org>, Yunhui Cui <cuiyunhui@...edance.com>, akpm@...ux-foundation.org, 
	catalin.marinas@....com, maddy@...ux.ibm.com, mpe@...erman.id.au, 
	npiggin@...il.com, christophe.leroy@...roup.eu, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, 
	acme@...nel.org, namhyung@...nel.org, mark.rutland@....com, 
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com, 
	adrian.hunter@...el.com, kan.liang@...ux.intel.com, kees@...nel.org, 
	masahiroy@...nel.org, aliceryhl@...gle.com, ojeda@...nel.org, 
	thomas.weissschuh@...utronix.de, xur@...gle.com, ruanjinjie@...wei.com, 
	gshan@...hat.com, maz@...nel.org, suzuki.poulose@....com, 
	zhanjie9@...ilicon.com, yangyicong@...ilicon.com, gautam@...ux.ibm.com, 
	arnd@...db.de, zhao.xichao@...o.com, rppt@...nel.org, lihuafei1@...wei.com, 
	coxu@...hat.com, jpoimboe@...nel.org, yaozhenguo1@...il.com, 
	luogengkun@...weicloud.com, max.kellermann@...os.com, tj@...nel.org, 
	wangjinchao600@...il.com, yury.norov@...il.com, thorsten.blum@...ux.dev, 
	x86@...nel.org, linux-kernel@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org, 
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH] watchdog: remove HARDLOCKUP_DETECTOR_PERF

Hi,

On Mon, Sep 15, 2025 at 3:35 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Sep 15, 2025 at 11:26:09AM +0100, Will Deacon wrote:
>
> >   | If all CPUs are hard locked up at the same time the buddy system
> >   | can't detect it.
> >
> > Ok, so why is that limitation acceptable? It looks to me like you're
> > removing useful functionality.
>
> Yeah, this. I've run into this case waaay too many times to think it
> reasonable to remove the perf/NMI based lockup detector.
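
A rough model of the limitation quoted above, assuming the buddy scheme works as described: each CPU's periodic timer interrupt bumps that CPU's own counter and then checks whether its neighbour's counter has moved since the last check. This is a simplified single-threaded C simulation. The names (`struct cpu_state`, `buddy_of`, `count_reports`) are hypothetical, and this is not the kernel's implementation. It only shows why a CPU stuck with interrupts off stops both advancing its counter and checking its buddy.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4

/*
 * Simplified, hypothetical model of a buddy-style hard lockup check.
 * Not kernel code: names and structure are illustrative only.
 */
struct cpu_state {
	bool stuck;               /* spinning with interrupts disabled */
	unsigned long ticks;      /* advanced only by this CPU's own timer */
	unsigned long buddy_seen; /* buddy's tick count at our last check */
};

static struct cpu_state cpus[NR_CPUS];

/* Each CPU watches the next one, wrapping around. */
static int buddy_of(int cpu)
{
	return (cpu + 1) % NR_CPUS;
}

/*
 * Simulate `periods` watchdog periods with the CPUs in `stuck_mask`
 * locked up, and return how many lockup reports were made.
 */
int count_reports(unsigned int stuck_mask, int periods)
{
	int reports = 0;

	memset(cpus, 0, sizeof(cpus));
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpus[cpu].stuck = (stuck_mask & (1u << cpu)) != 0;

	for (int p = 0; p < periods; p++) {
		/* Phase 1: only CPUs that can take interrupts run their timer. */
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (!cpus[cpu].stuck)
				cpus[cpu].ticks++;

		/* Phase 2: the same timer handler checks the buddy's progress. */
		for (int cpu = 0; cpu < NR_CPUS; cpu++) {
			int buddy = buddy_of(cpu);

			if (cpus[cpu].stuck)
				continue; /* a stuck CPU never runs its check */
			if (cpus[buddy].ticks == cpus[cpu].buddy_seen)
				reports++; /* buddy made no progress */
			cpus[cpu].buddy_seen = cpus[buddy].ticks;
		}
	}
	return reports;
}
```

With 4 CPUs, `count_reports(1u << 2, 3)` reports CPU 2 once per period (3 in total) through its buddy CPU 1. `count_reports(0xF, 3)` reports nothing: when every CPU is stuck, no timer handler runs, so no CPU is left to notice.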

I'm a bit curious how this comes about in the cases where you've seen
it. What causes all CPUs to be stuck looping with interrupts disabled
(but still able to take NMIs)? Certainly one can come up with a
synthetic way to make that happen, but I would imagine it to be
exceedingly rare in real life. Maybe all CPUs are deadlocked waiting
on spinlocks or something? There shouldn't be many other reasons
for all CPUs to be stuck indefinitely with interrupts disabled...
If that's what's happening (just spitballing), I wonder whether
hooking into the spinlock slowpath to look for lockups would help.
Maybe every 10000 failed attempts to acquire a spinlock we could check
for a lockup? Obviously you could still come up with synthetic lockups
that such a check wouldn't catch, but hopefully in those cases we can
at least reset the device with a hardware watchdog?
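
A minimal user-space sketch of the "check every 10000 failures" idea, assuming a toy test-and-set lock built on C11 `<stdatomic.h>`. It is not how kernel spinlocks are implemented. The names `toy_spinlock`, `check_for_lockup`, `lockup_checks` and the `max_failures` bound are all hypothetical. The bound exists only so the sketch can be exercised without a second thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Interval taken from the suggestion above; everything else is hypothetical. */
#define LOCKUP_CHECK_INTERVAL 10000UL

struct toy_spinlock {
	atomic_flag locked;
};

/* Hypothetical hook; just counts calls so the behavior is observable. */
unsigned long lockup_checks;

static void check_for_lockup(void)
{
	lockup_checks++; /* a real hook might dump stacks or panic */
}

static bool toy_trylock(struct toy_spinlock *lock)
{
	return !atomic_flag_test_and_set_explicit(&lock->locked,
						  memory_order_acquire);
}

/*
 * Acquire `lock`, giving up after `max_failures` failed attempts in the
 * slowpath (0 means spin forever). Returns true if the lock was taken.
 */
bool toy_spin_lock(struct toy_spinlock *lock, unsigned long max_failures)
{
	unsigned long failures = 0;

	if (toy_trylock(lock))
		return true; /* fast path: uncontended, no counting */

	/* Slowpath: count failures and check for a lockup periodically. */
	while (!toy_trylock(lock)) {
		failures++;
		if (failures % LOCKUP_CHECK_INTERVAL == 0)
			check_for_lockup();
		if (max_failures && failures >= max_failures)
			return false;
	}
	return true;
}

void toy_spin_unlock(struct toy_spinlock *lock)
{
	atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

If the lock is already held, `toy_spin_lock(&l, 25000)` gives up after 25000 failed attempts with `lockup_checks` at 2 (checks at 10000 and 20000). The uncontended fast path does no extra work. The trade-off is that a check only fires after a long spin.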

Overall the issue is that it's really awkward to have both types of
lockup detectors, especially since you've got to pick one at compile
time. The perf lockup detector has a pile of quirks of its own, and
it seems like people have been moving toward the buddy detector
because of this...

-Doug
