Message-ID: <20220517184154.GA6711@ranerica-svr.sc.intel.com>
Date: Tue, 17 May 2022 11:41:54 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: x86@...nel.org, Tony Luck <tony.luck@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
Stephane Eranian <eranian@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <joro@...tes.org>,
Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
Nicholas Piggin <npiggin@...il.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Ricardo Neri <ricardo.neri@...el.com>,
iommu@...ts.linux-foundation.org, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category
On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> > Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> > is to be used with the HPET-based hardlockup detector. This detector
> > does not have a direct way of checking if the HPET timer is the source of
> > the NMI. Instead, it indirectly estimates it using the time-stamp counter.
> >
> > Therefore, we may have false-positives in case another NMI occurs within
> > the estimated time window. For this reason, we want the handler of the
> > detector to be called after all the NMI_LOCAL handlers. A simple way
> > of achieving this is with a new NMI handler category.
> >
> > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
> > }
> > raw_spin_unlock(&nmi_reason_lock);
> >
> > + handled = nmi_handle(NMI_WATCHDOG, regs);
> > + if (handled == NMI_HANDLED)
> > + goto out;
> > +
>
> How is this supposed to work reliably?
>
> If perf is active and the HPET NMI and the perf NMI come in around the
> same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
> won't be checked. Because MSI is strictly edge and the message is only
> sent once, this can result in a stale watchdog, no?

This is true. Instead, at the end of each NMI I should _also_ check whether
the TSC is within the window in which the HPET watchdog NMI is expected. In
this way, unrelated NMIs (e.g., the perf NMI) are still handled and we don't
miss the NMI from the HPET channel.
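
A rough user-space sketch of that idea follows. It is only a model of the control flow, not the patch itself: the struct, its fields (tsc_next, tsc_error, tsc_period) and the function names are hypothetical, and the local handlers are reduced to a boolean. The point it illustrates is that the TSC window check runs at the end of every NMI, whether or not a local handler (e.g., perf) already claimed it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; the names are illustrative only. */
struct hpet_wd_state {
	uint64_t tsc_next;	/* TSC value at which the next HPET NMI is expected */
	uint64_t tsc_error;	/* tolerated deviation around tsc_next */
	uint64_t tsc_period;	/* TSC ticks between two HPET NMIs */
	unsigned int checks;	/* number of times the watchdog check ran */
};

/* Return true if @tsc falls inside the window where the HPET NMI is expected. */
static bool tsc_in_hpet_window(const struct hpet_wd_state *wd, uint64_t tsc)
{
	uint64_t delta = tsc > wd->tsc_next ? tsc - wd->tsc_next
					    : wd->tsc_next - tsc;

	return delta <= wd->tsc_error;
}

/*
 * Model of one NMI. @local_handled stands in for the result of the
 * NMI_LOCAL handlers. The watchdog check is done unconditionally at the
 * end, so an NMI swallowed by a local handler cannot hide the HPET NMI.
 * Returns the number of sources that claimed the NMI.
 */
static int sim_do_nmi(struct hpet_wd_state *wd, uint64_t tsc, bool local_handled)
{
	int handled = local_handled ? 1 : 0;

	if (tsc_in_hpet_window(wd, tsc)) {
		wd->checks++;
		wd->tsc_next += wd->tsc_period;	/* expect the next one a period later */
		handled++;
	}

	return handled;
}
```

With tsc_next = 1000 and tsc_error = 50, a perf NMI at TSC 1010 is handled by the local handler and the watchdog check still runs, whereas a perf NMI at TSC 400 leaves the watchdog state untouched. As in the original description, an unrelated NMI that lands inside the window is still treated as the HPET NMI, so false positives remain possible.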
Thanks and BR,
Ricardo