Message-ID: <20220517225308.GC6711@ranerica-svr.sc.intel.com>
Date: Tue, 17 May 2022 15:53:08 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Nicholas Piggin <npiggin@...il.com>, x86@...nel.org,
Andi Kleen <ak@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>,
Stephane Eranian <eranian@...gle.com>,
iommu@...ts.linux-foundation.org, Joerg Roedel <joro@...tes.org>,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Ricardo Neri <ricardo.neri@...el.com>,
Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> >> + /*
> >> + * If in use, the HPET hardlockup detector relies on tsc_khz.
> >> + * Reconfigure it to make use of the refined tsc_khz.
> >> + */
> >> + lockup_detector_reconfigure();
> >
> > I don't know if the API is conceptually good.
> >
> > You change something that the lockup detector is currently using,
> > *while* the detector is running asynchronously, and then reconfigure
> > it. What happens in the window? If this code is only used for small
> > adjustments maybe it does not really matter but in principle it's
> > a bad API to export.
> >
> > lockup_detector_reconfigure as an internal API is okay because it
> > reconfigures things while the watchdog is stopped [actually that
> > looks untrue for soft dog which uses watchdog_thresh in
> > is_softlockup(), but that should be fixed].
> >
> > You're the arch so you're allowed to stop the watchdog and configure
> > it, e.g., hardlockup_detector_perf_stop() is called in arch/.
> >
> > So you want to disable HPET watchdog if it was enabled, then update
> > wherever you're using tsc_khz, then re-enable.
>
> The real question is whether making this refined tsc_khz value
> immediately effective matters at all. IMO, it does not because up to
> that point the watchdog was happily using the coarse calibrated value
> and the whole use TSC to assess whether the HPET fired mechanism is just
> a guestimate anyway. So what's the point of trying to guess 'more
> correct'.

In some of my test systems I observed that the TSC value does not fall
within the expected error window the first time the HPET channel expires.
I inferred that the error computed using the coarser tsc_khz was wrong.
Recalculating the error window with the refined tsc_khz would correct it.
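
To make the reasoning above concrete, here is a minimal user-space sketch
(not the code from the patch). It assumes the detector expects the TSC to
advance by tsc_khz * period between HPET expirations and accepts readings
within some tolerance of that. The names expected_window(), in_window() and
tolerance_ppm are hypothetical, and the tolerance used in the actual series
may be different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: inclusive range of acceptable TSC readings. */
struct tsc_window {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Compute the window in which the TSC is expected to be when the HPET
 * channel expires. tsc_khz * period_ms yields cycles because
 * (1000 cycles/s) * (1/1000 s) cancels out. The tolerance is an assumed
 * parameter, expressed in parts per million of the expected delta.
 */
static struct tsc_window expected_window(uint64_t tsc_start, uint32_t tsc_khz,
					 uint32_t period_ms,
					 uint32_t tolerance_ppm)
{
	uint64_t delta = (uint64_t)tsc_khz * period_ms;
	uint64_t err = delta * tolerance_ppm / 1000000;
	struct tsc_window w = {
		.lo = tsc_start + delta - err,
		.hi = tsc_start + delta + err,
	};

	return w;
}

/* Return true if the TSC reading falls inside the expected window. */
static bool in_window(struct tsc_window w, uint64_t tsc_now)
{
	return tsc_now >= w.lo && tsc_now <= w.hi;
}
```

With made-up numbers (coarse tsc_khz of 2000000, refined tsc_khz of 2004000,
a 10000 ms period and a 1000 ppm tolerance), the TSC reading at expiration
falls outside the window built from the coarse value and inside the window
built from the refined one. Whether that is what happens on real systems
depends on how the tolerance compares to the calibration error.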

However, restarting the timer has the side effect of kicking the timer and,
therefore, pushing the first HPET NMI further into the future.
Perhaps kicking the HPET channel, not recomputing the error window, corrected
(masked?) the problem.

I will investigate further and rework or drop this patch as needed.

Thanks and BR,
Ricardo