linux-kernel - Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220517225308.GC6711@ranerica-svr.sc.intel.com>
Date:   Tue, 17 May 2022 15:53:08 -0700
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Nicholas Piggin <npiggin@...il.com>, x86@...nel.org,
        Andi Kleen <ak@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Stephane Eranian <eranian@...gle.com>,
        iommu@...ts.linux-foundation.org, Joerg Roedel <joro@...tes.org>,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ricardo Neri <ricardo.neri@...el.com>,
        Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
        Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining
 tsc_khz

On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> >> +	/*
> >> +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
> >> +	 * Reconfigure it to make use of the refined tsc_khz.
> >> +	 */
> >> +	lockup_detector_reconfigure();
> >
> > I don't know if the API is conceptually good.
> >
> > You change something that the lockup detector is currently using, 
> > *while* the detector is running asynchronously, and then reconfigure
> > it. What happens in the window? If this code is only used for small
> > adjustments maybe it does not really matter but in principle it's
> > a bad API to export.
> >
> > lockup_detector_reconfigure as an internal API is okay because it
> > reconfigures things while the watchdog is stopped [actually that
> > looks untrue for soft dog which uses watchdog_thresh in
> > is_softlockup(), but that should be fixed].
> >
> > You're the arch so you're allowed to stop the watchdog and configure
> > it, e.g., hardlockup_detector_perf_stop() is called in arch/.
> >
> > So you want to disable HPET watchdog if it was enabled, then update
> > wherever you're using tsc_khz, then re-enable.
> 
> The real question is whether making this refined tsc_khz value
> immediately effective matters at all. IMO, it does not because up to
> that point the watchdog was happily using the coarse calibrated value
> and the whole use TSC to assess whether the HPET fired mechanism is just
> a guestimate anyway. So what's the point of trying to guess 'more
> correct'.

In some of my test systems I observed that, the TSC value does not fall
within the expected error window the first time the HPET channel expires.

I inferred that the error computed using the coarser tsc_khz was wrong.
Recalculating the error window with refined tsc_khz would correct it.

However, restarting the timer has the side-effect of kicking the timer and,
therefore pushing the first HPET NMI further in the future.

Perhaps kicking HPET channel, not recomputing the error window, corrected
(masked?) the problem.

I will investigate further and rework or drop this patch as needed.

Thanks and BR,
Ricardo