[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEEQ3wkUai4-noA6CQDsOs=Hn3bSB5cdV9tZJZR8_g+o9e-Mpg@mail.gmail.com>
Date: Tue, 30 Sep 2025 15:11:21 +0800
From: yunhui cui <cuiyunhui@...edance.com>
To: Doug Anderson <dianders@...omium.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
ziqianlu@...edance.com, akpm@...ux-foundation.org, alex@...ti.fr,
anup@...infault.org, aou@...s.berkeley.edu, atish.patra@...ux.dev,
catalin.marinas@....com, johannes@...solutions.net, lihuafei1@...wei.com,
mark.rutland@....com, masahiroy@...nel.org, maz@...nel.org, mingo@...nel.org,
nicolas.schier@...ux.dev, palmer@...belt.com, paul.walmsley@...ive.com,
suzuki.poulose@....com, thorsten.blum@...ux.dev, wangjinchao600@...il.com,
will@...nel.org, yangyicong@...ilicon.com, zhanjie9@...ilicon.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, linux-riscv@...ts.infradead.org
Subject: Re: [External] Re: [PATCH v2 1/2] watchdog: move arm64 watchdog_hld
into common code
Hi Doug,
On Tue, Sep 30, 2025 at 9:53 AM yunhui cui <cuiyunhui@...edance.com> wrote:
>
> Hi Doug,
>
> On Tue, Sep 30, 2025 at 6:56 AM Doug Anderson <dianders@...omium.org> wrote:
> >
> > Hi,
> >
> > On Sat, Sep 27, 2025 at 7:38 PM yunhui cui <cuiyunhui@...edance.com> wrote:
> > >
> > > Hi Doug,
> > >
> > > On Sat, Sep 27, 2025 at 4:58 AM Doug Anderson <dianders@...omium.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Sep 26, 2025 at 2:37 AM yunhui cui <cuiyunhui@...edance.com> wrote:
> > > > >
> > > > > Hi Doug,
> > > > >
> > > > > On Fri, Sep 26, 2025 at 4:00 AM Doug Anderson <dianders@...omium.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Thu, Sep 25, 2025 at 1:48 AM Yunhui Cui <cuiyunhui@...edance.com> wrote:
> > > > > > >
> > > > > > > @@ -17,6 +17,7 @@
> > > > > > > #include <linux/cpu_pm.h>
> > > > > > > #include <linux/export.h>
> > > > > > > #include <linux/kernel.h>
> > > > > > > +#include <linux/nmi.h>
> > > > > > > #include <linux/perf/arm_pmu.h>
> > > > > > > #include <linux/slab.h>
> > > > > > > #include <linux/sched/clock.h>
> > > > > > > @@ -696,10 +697,12 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu)
> > > > > > > return per_cpu(hw_events->irq, cpu);
> > > > > > > }
> > > > > > >
> > > > > > > -bool arm_pmu_irq_is_nmi(void)
> > > > > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> > > > > > > +bool arch_perf_nmi_is_available(void)
> > > > > > > {
> > > > > > > return has_nmi;
> > > > > > > }
> > > > > > > +#endif
> > > > > >
> > > > > > Should the previous comment move here, AKA:
> > > > > >
> > > > > > /*
> > > > > > * hardlockup_detector_perf_init() will success even if Pseudo-NMI turns off,
> > > > >
> > > > > Okay, we also need to change it to “watchdog_hardlockup_probe()”
> > > > >
> > > > > > * however, the pmu interrupts will act like a normal interrupt instead of
> > > > > > * NMI and the hardlockup detector would be broken.
> > > > > > */
> > > > > >
> > > > > >
> > > > > > > +static int __init init_watchdog_freq_notifier(void)
> > > > > > > +{
> > > > > > > + return cpufreq_register_notifier(&watchdog_freq_notifier,
> > > > > > > + CPUFREQ_POLICY_NOTIFIER);
> > > > > >
> > > > > > I think you need to do something to prevent this from happening on any
> > > > > > platforms that override hw_nmi_get_sample_period(), right? These
> > > > > > cpufreq notifiers will be useless in that case...
> > > > >
> > > > > I understand this is not a problem. watchdog_perf uses
> > > > > PERF_COUNT_HW_CPU_CYCLES, which means it is inherently limited by the
> > > > > CPU's main frequency. After we make such a change, a larger value may
> > > > > be used as the period, so the NMI period will become longer, but this
> > > > > value will not change after the system starts.
> > > >
> > > > I'm not sure I follow. On x86, hw_nmi_get_sample_period() is:
> > > >
> > > > u64 hw_nmi_get_sample_period(int watchdog_thresh)
> > > > {
> > > > return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> > > > }
> > >
> > > I have added the maintainers for arch/x86.
> > >
> > > 1. cpu_khz can be understood as returning the base operating frequency
> > > of a CPU, such as 2.3GHz. In practice, the CPU's core frequency may
> > > downclock to 800MHz under low load and overclock to 4.4GHz under high
> > > load.
> > >
> > > 2. Because the event provided to the PMU has the
> > > PERF_COUNT_HW_CPU_CYCLES attribute, and the counter's value is based
> > > on 2.3GHz, the execution cycle of watchdog_overflow_callback() is not
> > > fixed; it varies with the CPU's core frequency. When the CPU runs at a
> > > frequency higher than 2.3GHz, the NMI cycle will shorten; otherwise,
> > > it will lengthen.
> > >
> > > 3. After our modification, if the architecture is not integrated with
> > > cpufreq, it returns 0 and will not update the cycle. If integrated
> > > with cpufreq, it returns the maximum frequency supported by the CPU,
> > > so the NMI cycle is only slightly lengthened, with no impact on the
> > > actual hardlockup detection function.
> > >
> > > 4. I have also conducted tests:
> > > stress-ng --cpu 1 --taskset 1 --cpu-load 80
> > > echo 800000 > scaling_max_freq
> > > turbostat shows that Bzy_MHz and TSC_MHz are 800 and 2300 respectively.
> > > And the NMI cycle became approximately 30 seconds:
> > > [ 2309.442743] NMI watchdog: ------ watchdog overflow callback, cpu = 1
> > > [ 2341.526032] NMI watchdog: ------ watchdog overflow callback, cpu = 1
> >
> > Whether or not having x86 and powerpc start looking at cpufreq is an
> > improvement, certainly it is a change in behavior, right? If we're
> > really changing the behavior here then the commit subject and commit
> > message need to mention this. Right now this is billed as a simple
> > rename...
> >
> > I don't personally have lots of experience with x86 cpufreq but I do
> > know it doesn't work quite the same as how it does on arm. It would
> > definitely be good to get someone on x86 / powerpc to make sure that
> > they are happy with this. ...or you just keep making it work the way
> > it did before and then you don't have to worry about getting any
> > buy-in from x86 / powerpc folks. Up to you, I guess.
>
> Well, I fully agree with your suggestion. What I can think of for now
> is wrapping the newly added content with #if defined(CONFIG_ARM64) ||
> defined(CONFIG_RISCV).
I’ve also thought of an alternative approach: add a generic
WATCHDOG_PERF_ADJUST_PERIOD to /lib/Kconfig.debug, then have the
Kconfig files for arch/arm64 and riscv select
WATCHDOG_PERF_ADJUST_PERIOD. What’s your take on this?
Do you have a more elegant approach to
> recommend?
>
> >
> > Note that folks on x86 have definitely had to deal with turbo mode
> > before. See commit 7edaeb6841df ("kernel/watchdog: Prevent false
> > positives with turbo modes").
> >
> > -Doug
> >
>
> Thanks,
> Yunhui
Thanks,
Yunhui
Powered by blists - more mailing lists