[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fVkw6TLjVuR3UCNs+X1cwVmYk7UFABio4oDOwfshqoP_g@mail.gmail.com>
Date: Tue, 7 Oct 2025 13:43:02 -0700
From: Ian Rogers <irogers@...gle.com>
To: Doug Anderson <dianders@...omium.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jinchao Wang <wangjinchao600@...il.com>,
Namhyung Kim <namhyung@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>, Yunhui Cui <cuiyunhui@...edance.com>, akpm@...ux-foundation.org,
catalin.marinas@....com, maddy@...ux.ibm.com, mpe@...erman.id.au,
npiggin@...il.com, christophe.leroy@...roup.eu, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
acme@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, adrian.hunter@...el.com, kan.liang@...ux.intel.com,
kees@...nel.org, masahiroy@...nel.org, aliceryhl@...gle.com, ojeda@...nel.org,
thomas.weissschuh@...utronix.de, xur@...gle.com, ruanjinjie@...wei.com,
gshan@...hat.com, maz@...nel.org, suzuki.poulose@....com,
zhanjie9@...ilicon.com, yangyicong@...ilicon.com, gautam@...ux.ibm.com,
arnd@...db.de, zhao.xichao@...o.com, rppt@...nel.org, lihuafei1@...wei.com,
coxu@...hat.com, jpoimboe@...nel.org, yaozhenguo1@...il.com,
luogengkun@...weicloud.com, max.kellermann@...os.com, tj@...nel.org,
yury.norov@...il.com, thorsten.blum@...ux.dev, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, linux-perf-users@...r.kernel.org
Subject: Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
On Tue, Oct 7, 2025 at 1:00 PM Doug Anderson <dianders@...omium.org> wrote:
> On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@...gle.com> wrote:
..
> > I don't mind a file that also says what kind of lockup detector is in
> > use. I'd like the meaning of nmi_watchdog to keep meaning the kernel
> > stole a perf counter as this is the behavior long assumed by perf
> > which I think is the majority or only user of the file. I think the
> > buddy watchdog being controlled by this file is less than intention
> > revealing.
>
> I'm more than happy to be outvoted, but IMO nothing about the name
> "nmi_watchdog" says to me that a perf counter was stolen. It just says
> that there's a watchdog that NMIs are part of its work.
>
> ...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far
> as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks
> `watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did
> commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog"
> constants/function") it was checking `watchdog_enabled &
> NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think...
>
> ...and you can see "PPC_WATCHDOG" depends on
> "HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not
> perf-backed.
>
> I think this is just the critical question: does/should "nmi_watchdog"
> mean perf-backed detector or is "nmi_watchdog" just a synonym for
> "hard lockup detector". In your mind it's the former and in my mind
> it's the latter. The correct way forward depends on which way we want
> to jump, right? In either case some external code is going to need to
> change...
So we could say it is one-idea against another, I'm hoping to be
objective and come from the viewpoint of the perf code as part of its
function is to be a demonstration of APIs and the buddy watchdog has
altered this. The perf tool has referred to the nmi_watchdog's
behavior in man pages and code since 2016:
http://lkml.kernel.org/r/1459810686-15913-1-git-send-email-andi@firstfloor.org
```
+Globally pinned events can limit the number of counters available for
+other groups. On x86 systems, the NMI watchdog pins a counter by default.
+The nmi watchdog can be disabled as root with
+
+ echo 0 > /proc/sys/kernel/nmi_watchdog
```
http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
```
+void arch_topdown_group_warn(void)
+{
+ fprintf(stderr,
+ "nmi_watchdog enabled with topdown. May give wrong results.\n"
+ "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
+}
```
Probably the most common error message dates back to 2017:
http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic
```
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
```
and that is saying "NMI watchdog" not "buddy watchdog". Users are
familiar with the idea that the /proc/sys/kernel/nmi_watchdog is
unremarkably controlling the NMI watchdog.
I've not found another use of /proc/sys/kernel/nmi_watchdog outside of
the perf tool.
> The kernel docs are certainly pretty ambiguous here. "kernel.rst"
> certainly mentions perf, but it also says that "nmi_watchdog" is only
> for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't
> say anything about perf and just seems to indicate that "nmi_watchdog"
> is another name for the hardlockup detector. "kernel-parameters.txt"
> makes some mention of tweaking which perf event would be used, but
> otherwise makes it sound like the nmi_watchdog is just another name
> for the hardlockup detector.
>
> My vote would be two files:
>
> * "nmi_watchdog" - simply a synonym for "hardlockup detector". 1
> enabled, 0 if not (like today)
>
> * "hardlock_detector" - could be "buddy", "perf", or "arch"
>
> ...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog"
> is enabled, you could probably guess that a perf event was in use. If
> the "hardlockup_detector" file exists then a perf event is only in use
> if the value of that file is "perf". It doesn't feel terribly hard to
> do this and it doesn't break anyone who was assuming they could turn
> on the hardlockup detector with "/proc/nmi_watchdog".
It is not hard but:
1) it means whenever perf wants to determine the NMI watchdog is
present it needs to read two files rather than the existing 1, which
has some runtime cost;
2) the name nmi_watchdog for controlling the behavior of the buddy
watchdog isn't intention revealing as the buddy mechanisms whole point
is to avoid the NMI;
3) old perf tools with the buddy watchdog have the wrong behavior
(they've regressed).
It is also not hard to have the watchdog files named
<mechanism>_watchdog, such as buddy_watchdog, nmi_watchdog,
arch_watchdog and have the contents be 0 or 1 as appropriate. Such a
scheme would have less overhead for perf, make the name more intention
revealing and not alter old perf tools. I'm really not sure what
problem we're trying to fix by not adopting this approach. I was
surprised the buddy watchdog merged but using the nmi_watchdog file.
Thanks,
Ian
> -Doug
>
Powered by blists - more mailing lists