[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=XoUw6HWoFLMcgyHhYm7oeTOXCAyvBdpOghggmdvYzp6Q@mail.gmail.com>
Date: Mon, 29 Sep 2025 15:55:33 -0700
From: Doug Anderson <dianders@...omium.org>
To: yunhui cui <cuiyunhui@...edance.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
ziqianlu@...edance.com, akpm@...ux-foundation.org, alex@...ti.fr,
anup@...infault.org, aou@...s.berkeley.edu, atish.patra@...ux.dev,
catalin.marinas@....com, johannes@...solutions.net, lihuafei1@...wei.com,
mark.rutland@....com, masahiroy@...nel.org, maz@...nel.org, mingo@...nel.org,
nicolas.schier@...ux.dev, palmer@...belt.com, paul.walmsley@...ive.com,
suzuki.poulose@....com, thorsten.blum@...ux.dev, wangjinchao600@...il.com,
will@...nel.org, yangyicong@...ilicon.com, zhanjie9@...ilicon.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, linux-riscv@...ts.infradead.org
Subject: Re: [External] Re: [PATCH v2 1/2] watchdog: move arm64 watchdog_hld
into common code
Hi,
On Sat, Sep 27, 2025 at 7:38 PM yunhui cui <cuiyunhui@...edance.com> wrote:
>
> Hi Doug,
>
> On Sat, Sep 27, 2025 at 4:58 AM Doug Anderson <dianders@...omium.org> wrote:
> >
> > Hi,
> >
> > On Fri, Sep 26, 2025 at 2:37 AM yunhui cui <cuiyunhui@...edance.com> wrote:
> > >
> > > Hi Doug,
> > >
> > > On Fri, Sep 26, 2025 at 4:00 AM Doug Anderson <dianders@...omium.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Sep 25, 2025 at 1:48 AM Yunhui Cui <cuiyunhui@...edance.com> wrote:
> > > > >
> > > > > @@ -17,6 +17,7 @@
> > > > > #include <linux/cpu_pm.h>
> > > > > #include <linux/export.h>
> > > > > #include <linux/kernel.h>
> > > > > +#include <linux/nmi.h>
> > > > > #include <linux/perf/arm_pmu.h>
> > > > > #include <linux/slab.h>
> > > > > #include <linux/sched/clock.h>
> > > > > @@ -696,10 +697,12 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu)
> > > > > return per_cpu(hw_events->irq, cpu);
> > > > > }
> > > > >
> > > > > -bool arm_pmu_irq_is_nmi(void)
> > > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> > > > > +bool arch_perf_nmi_is_available(void)
> > > > > {
> > > > > return has_nmi;
> > > > > }
> > > > > +#endif
> > > >
> > > > Should the previous comment move here, AKA:
> > > >
> > > > /*
> > > > * hardlockup_detector_perf_init() will success even if Pseudo-NMI turns off,
> > >
> > > Okay, we also need to change it to “watchdog_hardlockup_probe()”
> > >
> > > > * however, the pmu interrupts will act like a normal interrupt instead of
> > > > * NMI and the hardlockup detector would be broken.
> > > > */
> > > >
> > > >
> > > > > +static int __init init_watchdog_freq_notifier(void)
> > > > > +{
> > > > > + return cpufreq_register_notifier(&watchdog_freq_notifier,
> > > > > + CPUFREQ_POLICY_NOTIFIER);
> > > >
> > > > I think you need to do something to prevent this from happening on any
> > > > platforms that override hw_nmi_get_sample_period(), right? These
> > > > cpufreq notifiers will be useless in that case...
> > >
> > > I understand this is not a problem. watchdog_perf uses
> > > PERF_COUNT_HW_CPU_CYCLES, which means it is inherently limited by the
> > > CPU's main frequency. After we make such a change, a larger value may
> > > be used as the period, so the NMI period will become longer, but this
> > > value will not change after the system starts.
> >
> > I'm not sure I follow. On x86, hw_nmi_get_sample_period() is:
> >
> > u64 hw_nmi_get_sample_period(int watchdog_thresh)
> > {
> > return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> > }
>
> I have added the maintainers for arch/x86.
>
> 1. cpu_khz can be understood as returning the base operating frequency
> of a CPU, such as 2.3GHz. In practice, the CPU's core frequency may
> downclock to 800MHz under low load and overclock to 4.4GHz under high
> load.
>
> 2. Because the event provided to the PMU has the
> PERF_COUNT_HW_CPU_CYCLES attribute, and the counter's value is based
> on 2.3GHz, the execution cycle of watchdog_overflow_callback() is not
> fixed; it varies with the CPU's core frequency. When the CPU runs at a
> frequency higher than 2.3GHz, the NMI cycle will shorten; otherwise,
> it will lengthen.
>
> 3. After our modification, if the architecture is not integrated with
> cpufreq, it returns 0 and will not update the cycle. If integrated
> with cpufreq, it returns the maximum frequency supported by the CPU,
> so the NMI cycle is only slightly lengthened, with no impact on the
> actual hardlockup detection function.
>
> 4. I have also conducted tests:
> stress-ng --cpu 1 --taskset 1 --cpu-load 80
> echo 800000 > scaling_max_freq
> turbostat shows that Bzy_MHz and TSC_MHz are 800 and 2300 respectively.
> And the NMI cycle became approximately 30 seconds:
> [ 2309.442743] NMI watchdog: ------ watchdog overflow callback, cpu = 1
> [ 2341.526032] NMI watchdog: ------ watchdog overflow callback, cpu = 1
Whether or not having x86 and powerpc start looking at cpufreq is an
improvement, certainly it is a change in behavior, right? If we're
really changing the behavior here then the commit subject and commit
message need to mention this. Right now this is billed as a simple
rename...
I don't personally have lots of experience with x86 cpufreq but I do
know it doesn't work quite the same as how it does on arm. It would
definitely be good to get someone on x86 / powerpc to make sure that
they are happy with this. ...or you just keep making it work the way
it did before and then you don't have to worry about getting any
buy-in from x86 / powerpc folks. Up to you, I guess.
Note that folks on x86 have definitely had to deal with turbo mode
before. See commit 7edaeb6841df ("kernel/watchdog: Prevent false
positives with turbo modes").
-Doug
Powered by blists - more mailing lists