Message-ID: <20250916074217.GF3245006@noisy.programming.kicks-ass.net>
Date: Tue, 16 Sep 2025 09:42:17 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Will Deacon <will@...nel.org>, Yunhui Cui <cuiyunhui@...edance.com>,
	akpm@...ux-foundation.org, catalin.marinas@....com,
	maddy@...ux.ibm.com, mpe@...erman.id.au, npiggin@...il.com,
	christophe.leroy@...roup.eu, tglx@...utronix.de, mingo@...hat.com,
	bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
	acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	kan.liang@...ux.intel.com, kees@...nel.org, masahiroy@...nel.org,
	aliceryhl@...gle.com, ojeda@...nel.org,
	thomas.weissschuh@...utronix.de, xur@...gle.com,
	ruanjinjie@...wei.com, gshan@...hat.com, maz@...nel.org,
	suzuki.poulose@....com, zhanjie9@...ilicon.com,
	yangyicong@...ilicon.com, gautam@...ux.ibm.com, arnd@...db.de,
	zhao.xichao@...o.com, rppt@...nel.org, lihuafei1@...wei.com,
	coxu@...hat.com, jpoimboe@...nel.org, yaozhenguo1@...il.com,
	luogengkun@...weicloud.com, max.kellermann@...os.com, tj@...nel.org,
	wangjinchao600@...il.com, yury.norov@...il.com,
	thorsten.blum@...ux.dev, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linuxppc-dev@...ts.ozlabs.org, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH] watchdog: remove HARDLOCKUP_DETECTOR_PERF

On Mon, Sep 15, 2025 at 08:42:00AM -0700, Doug Anderson wrote:
> On Mon, Sep 15, 2025 at 3:35 AM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Mon, Sep 15, 2025 at 11:26:09AM +0100, Will Deacon wrote:
> >
> > >   | If all CPUs are hard locked up at the same time the buddy system
> > >   | can't detect it.
> > >
> > > Ok, so why is that limitation acceptable? It looks to me like you're
> > > removing useful functionality.
> >
> > Yeah, this. I've run into this case waaay too many times to think it
> > reasonable to remove the perf/NMI based lockup detector.
> 
> I am a bit curious how this comes to be in cases where you've seen it.
> What causes all CPUs to be stuck looping all with interrupts disabled
> (but still able to execute NMIs)? Certainly one can come up with a
> synthetic way to make that happen, but I would imagine it to be
> exceedingly rare in real life. Maybe all CPUs are deadlocked waiting
> on spinlocks or something? There shouldn't be a lot of other reasons
> that all CPUs should be stuck indefinitely with interrupts disabled...

The simplest one I often run into is rq->lock getting stuck and then all
the other CPUs piling up on that in various ways.
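
[Editor's sketch, not part of the original message.] The pile-up described
above is the case quoted earlier in the thread: "If all CPUs are hard locked
up at the same time the buddy system can't detect it." A toy model, assuming
each CPU checks only its next neighbour and that a locked-up CPU cannot run
its own check, illustrates the difference. The names buddy_detects and
nmi_detects are hypothetical and do not correspond to kernel functions.

```python
def buddy_detects(locked):
    """Toy buddy detector: each live CPU checks the next CPU in a ring.

    `locked[i]` is True when CPU i is stuck with interrupts disabled.
    A stuck CPU cannot run its timer-driven check, so it reports nobody.
    """
    n = len(locked)
    reported = []
    for cpu in range(n):
        if locked[cpu]:
            continue  # this CPU's check never runs
        buddy = (cpu + 1) % n
        if locked[buddy]:
            reported.append(buddy)
    return reported


def nmi_detects(locked):
    """Toy NMI detector: the check still runs on a CPU with interrupts off."""
    return [cpu for cpu, is_locked in enumerate(locked) if is_locked]


if __name__ == "__main__":
    one_stuck = [False, True, False, False]
    all_stuck = [True, True, True, True]  # e.g. everyone piled up on one lock
    print("buddy, one stuck:", buddy_detects(one_stuck))
    print("buddy, all stuck:", buddy_detects(all_stuck))
    print("nmi,   all stuck:", nmi_detects(all_stuck))
```

In this model the buddy scheme reports a single stuck CPU, but reports
nothing once every CPU is stuck, because no checker is left running.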

Getting stop_machine() stuck is also a fun one.

I mean, it really isn't that hard. If, as a full time kernel dev, you
don't get into this situation at least a few times a year, you're just
not doing your job right ;-)

> If that's what's happening, (just spitballing) I wonder if hooking
> into the slowpath of spinlocks to look for lockups would help? Maybe
> every 10000 failures to acquire the spinlock we check for a lockup?
> Obviously you could still come up with synthetic ways to make a
> non-caught watchdog, but hopefully in those types of cases we can at
> least reset the device with a hardware watchdog?
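
[Editor's sketch, not part of the original message.] The proposal quoted
above amounts to counting failed acquisition attempts in the lock slowpath
and running a lockup check every 10000 failures. A minimal userspace
illustration of that counting logic follows; spin_acquire, try_lock,
check_lockup and max_spins are hypothetical names, and nothing here reflects
how the kernel's spinlock slowpath is actually written.

```python
CHECK_INTERVAL = 10_000  # figure taken from the suggestion quoted above


def spin_acquire(try_lock, check_lockup, max_spins):
    """Spin on try_lock(); call check_lockup(failures) every CHECK_INTERVAL misses.

    Returns True once the lock is taken, or False after max_spins attempts
    (the bound exists only so this toy terminates).
    """
    failures = 0
    for _ in range(max_spins):
        if try_lock():
            return True
        failures += 1
        if failures % CHECK_INTERVAL == 0:
            check_lockup(failures)
    return False


if __name__ == "__main__":
    checks = []
    # A lock that is never released: every attempt fails.
    acquired = spin_acquire(lambda: False, checks.append, max_spins=25_000)
    print("acquired:", acquired, "checks at:", checks)
```

The extra counter and branch sit on every failed attempt, which is the cost
the reply below objects to.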

Now, why would I want to make the spinlock code worse if I have a
perfectly functional NMI watchdog?

> Overall the issue is that it's really awkward to have both types of
> lockup detectors, especially since you've got to pick at compile time.

Well, then go fix that. Surely this isn't rocket science.

> The perf lockup detector has a pile of things that make it pretty
> awkward and it seems like people have been moving toward the buddy detector
> because of this...

There's nothing awkward about the perf one, except that it takes one
counter, and some people are just greedy and want all of them. At the
same time, there are people posting patches that use the PMU for
page-promotion like things, so these same greedy people are going to
hate on that too.
