[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fV05++2Qvcxs=+tqhTdpGK8L9e5HzVu=y+xHxy9AqLMmg@mail.gmail.com>
Date: Mon, 6 Oct 2025 14:29:59 -0700
From: Ian Rogers <irogers@...gle.com>
To: Jinchao Wang <wangjinchao600@...il.com>
Cc: Namhyung Kim <namhyung@...nel.org>, Doug Anderson <dianders@...omium.org>,
Peter Zijlstra <peterz@...radead.org>, Will Deacon <will@...nel.org>,
Yunhui Cui <cuiyunhui@...edance.com>, akpm@...ux-foundation.org,
catalin.marinas@....com, maddy@...ux.ibm.com, mpe@...erman.id.au,
npiggin@...il.com, christophe.leroy@...roup.eu, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
acme@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, adrian.hunter@...el.com, kan.liang@...ux.intel.com,
kees@...nel.org, masahiroy@...nel.org, aliceryhl@...gle.com, ojeda@...nel.org,
thomas.weissschuh@...utronix.de, xur@...gle.com, ruanjinjie@...wei.com,
gshan@...hat.com, maz@...nel.org, suzuki.poulose@....com,
zhanjie9@...ilicon.com, yangyicong@...ilicon.com, gautam@...ux.ibm.com,
arnd@...db.de, zhao.xichao@...o.com, rppt@...nel.org, lihuafei1@...wei.com,
coxu@...hat.com, jpoimboe@...nel.org, yaozhenguo1@...il.com,
luogengkun@...weicloud.com, max.kellermann@...os.com, tj@...nel.org,
yury.norov@...il.com, thorsten.blum@...ux.dev, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, linux-perf-users@...r.kernel.org
Subject: Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@...il.com> wrote:
>
> On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@...il.com> wrote:
> > > >
> > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@...il.com> wrote:
> > > > > >
> > > > > > Currently, the hard lockup detector is selected at compile time via
> > > > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > > > This is inflexible, especially on systems where a perf event may not
> > > > > > be available or may be needed for other tasks.
> > > > > >
> > > > > > This commit refactors the hard lockup detector to replace a rigid
> > > > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > > > The patch supports building the kernel with either detector
> > > > > > independently, or with both. When both are built, a new boot parameter
> > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > > > This is a more robust and user-friendly design.
> > > > > >
> > > > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > > > regarding the preference and future of the hard lockup detectors. It
> > > > > > implements a flexible solution that addresses the community's need to
> > > > > > select an appropriate detector at boot time.
> > > > > >
> > > > > > The core changes are:
> > > > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > > > > to serve as a single API for the entire feature.
> > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > > > > user to select the desired detector at boot time.
> > > > > > - The Kconfig options are simplified by removing the complex
> > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > > > > built without mutual exclusion.
> > > > > > - The weak stubs are updated to call the new function pointers,
> > > > > > centralizing the watchdog logic.
> > > > >
> > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that
> > > > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > > > why this has to be a boot time option given the ability to configure
> > > > > via /proc/sys/kernel/nmi_watchdog.
> > > > The new hardlockup_detector boot parameter and the existing
> > > > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> > > >
> > > > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > > > This choice is made once at boot.
> > > >
> > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > > > switch for the currently selected detector. It does not change the detector's
> > > > type.
> > >
> > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> > > obvious naming reasons but also because we can't differentiate when a
> > > perf event has been taken or not - this impacts perf that is choosing
> > > not to group events in metrics because of it, reducing the metric's
> > > accuracy. We need an equivalent "buddy_watchdog" file to the
> > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> > > disabled and the perf event one to be enabled. Similarly, if I did
> > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> > > perf event watchdog to be disabled and the buddy one enabled. If I did
> > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> > > enabled. I don't see why choosing the type of watchdog implementation
> > > at boot time is particularly desirable. It seems sensible to default
> > > normal people to using the buddy watchdog (more perf events, power...)
> > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> > > the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> > > perhaps a compatibility option (where the "nmi_watchdog" file controls
> > > the buddy watchdog) is needed so that user code has time to migrate.
> >
> > Sounds good to me. For perf tools, it'd be great if we can have a run-
> > time check which watchdog is selected.
> Considering backward compatibility, I prefer to keep
> /proc/sys/kernel/nmi_watchdog and introduce a new file called
> /proc/sys/kernel/hardlockup_detector_type, which only shows the default string
> or the boot parameter.
Is there code using the buddy watchdog that cares about the
/proc/sys/kernel/nmi_watchdog file? My assumption is that everything
except Android is using the NMI watchdog, so a new
/proc/sys/kernel/buddy_watchdog file doesn't impact them. On Android
writing to /proc/sys/kernel/nmi_watchdog would switch from updating
the buddy watchdog enable/disable to the NMI watchdog enable/disable,
but it is a straightforward patch to make anything doing this update
the buddy_watchdog file instead.
If we have to keep "nmi_watchdog" then we should deprecate it and
create an equivalent file with a better name (ie without NMI in it).
It'll be moderately annoying in perf to determine whether the NMI
watchdog is enabled by having to read two files.
Thanks,
Ian
> The global str pointer hardlockup_detector_type was already introduced in the
> patch, so exposing it in a file is straightforward.
> >
> > Thanks,
> > Namhyung
> >
>
> --
> Jinchao
Powered by blists - more mailing lists