linux-kernel - Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220517220810.GB6711@ranerica-svr.sc.intel.com>
Date:   Tue, 17 May 2022 15:08:10 -0700
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
        Andi Kleen <ak@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Stephane Eranian <eranian@...gle.com>,
        iommu@...ts.linux-foundation.org, Joerg Roedel <joro@...tes.org>,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ricardo Neri <ricardo.neri@...el.com>,
        Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
        Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining
 tsc_khz

On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET hardlockup detector relies on tsc_khz to estimate the value of
> > that the TSC will have when its HPET channel fires. A refined tsc_khz
> > helps to estimate better the expected TSC value.
> > 
> > Using the early value of tsc_khz may lead to a large error in the expected
> > TSC value. Restarting the NMI watchdog detector has the effect of kicking
> > its HPET channel and make use of the refined tsc_khz.
> > 
> > When the HPET hardlockup is not in use, restarting the NMI watchdog is
> > a noop.
> > 
> > Cc: Andi Kleen <ak@...ux.intel.com>
> > Cc: Stephane Eranian <eranian@...gle.com>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@...el.com>
> > Cc: iommu@...ts.linux-foundation.org
> > Cc: linuxppc-dev@...ts.ozlabs.org
> > Cc: x86@...nel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
> > ---
> > Changes since v5:
> >  * Introduced this patch
> > 
> > Changes since v4
> >  * N/A
> > 
> > Changes since v3
> >  * N/A
> > 
> > Changes since v2:
> >  * N/A
> > 
> > Changes since v1:
> >  * N/A
> > ---
> >  arch/x86/kernel/tsc.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..cc1843044d88 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> >  	/* Inform the TSC deadline clockevent devices about the recalibration */
> >  	lapic_update_tsc_freq();
> >  
> > +	/*
> > +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
> > +	 * Reconfigure it to make use of the refined tsc_khz.
> > +	 */
> > +	lockup_detector_reconfigure();
> 
> I don't know if the API is conceptually good.
> 
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. 

Yes, this is what happens.

> What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter

Please see my comment

> but in principle it's a bad API to export.
> 
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped

I see.

> [actually that  looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].

Perhaps there should be a watchdog_thresh_user. When the user updates it,
the detector is stopped, watchdog_thresh is updated, and then the detector
is resumed.

> 
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.

I had it like this but it did not look right to me. You are right, however,
I can stop/restart the watchdog when needed.

Thanks and BR,
Ricardo