[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220517220810.GB6711@ranerica-svr.sc.intel.com>
Date: Tue, 17 May 2022 15:08:10 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Nicholas Piggin <npiggin@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
Andi Kleen <ak@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>,
Stephane Eranian <eranian@...gle.com>,
iommu@...ts.linux-foundation.org, Joerg Roedel <joro@...tes.org>,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Ricardo Neri <ricardo.neri@...el.com>,
Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining
tsc_khz
On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET hardlockup detector relies on tsc_khz to estimate the value of
> > that the TSC will have when its HPET channel fires. A refined tsc_khz
> > helps to estimate better the expected TSC value.
> >
> > Using the early value of tsc_khz may lead to a large error in the expected
> > TSC value. Restarting the NMI watchdog detector has the effect of kicking
> > its HPET channel and make use of the refined tsc_khz.
> >
> > When the HPET hardlockup is not in use, restarting the NMI watchdog is
> > a noop.
> >
> > Cc: Andi Kleen <ak@...ux.intel.com>
> > Cc: Stephane Eranian <eranian@...gle.com>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@...el.com>
> > Cc: iommu@...ts.linux-foundation.org
> > Cc: linuxppc-dev@...ts.ozlabs.org
> > Cc: x86@...nel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
> > ---
> > Changes since v5:
> > * Introduced this patch
> >
> > Changes since v4
> > * N/A
> >
> > Changes since v3
> > * N/A
> >
> > Changes since v2:
> > * N/A
> >
> > Changes since v1:
> > * N/A
> > ---
> > arch/x86/kernel/tsc.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..cc1843044d88 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> > /* Inform the TSC deadline clockevent devices about the recalibration */
> > lapic_update_tsc_freq();
> >
> > + /*
> > + * If in use, the HPET hardlockup detector relies on tsc_khz.
> > + * Reconfigure it to make use of the refined tsc_khz.
> > + */
> > + lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using,
> *while* the detector is running asynchronously, and then reconfigure
> it.
Yes, this is what happens.
> What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter
Please see my comment
> but in principle it's a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped
I see.
> [actually that looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].
Perhaps there should be a watchdog_thresh_user. When the user updates it,
the detector is stopped, watchdog_thresh is updated, and then the detector
is resumed.
>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.
I had it like this but it did not look right to me. You are right, however,
I can stop/restart the watchdog when needed.
Thanks and BR,
Ricardo
Powered by blists - more mailing lists