[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240530091025.60de3227@jacob-builder>
Date: Thu, 30 May 2024 09:10:25 -0700
From: Jacob Pan <jacob.jun.pan@...ux.intel.com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: X86 Kernel <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>, Thomas
Gleixner <tglx@...utronix.de>, Dave Hansen <dave.hansen@...el.com>, Ingo
Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
linux-perf-users@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
Andi Kleen <andi.kleen@...el.com>, Xin Li <xin3.li@...el.com>,
jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH 4/6] x86/irq: Process nmi sources in NMI handler
Hi Peter,
On Wed, 29 May 2024 13:47:09 -0700, "H. Peter Anvin" <hpa@...or.com> wrote:
> On 5/29/24 13:33, Jacob Pan wrote:
> > +
> > + /*
> > + * Per NMI source specification, there is no guarantee that a
> > valid
> > + * NMI vector is always delivered, even when the source
> > specified
> > + * one. It is software's responsibility to check all available
> > NMI
> > + * sources when bit 0 is set in the NMI source bitmap. i.e. we
> > have
> > + * to call every handler as if we have no NMI source.
> > + * On the other hand, if we do get non-zero vectors, we know
> > exactly
> > + * what the sources are. So we only call the handlers with the
> > bit set.
> > + */
> > + if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> > + pr_warn_ratelimited("NMI received with unknown
> > source\n");
> > + return 0;
> > + }
> > +
>
> Note: if bit 0 is set, you can process any other bits first (on the
> general assumption that if you bother with NMI source then those events
> are performance sensitive), and you could even exclude them from the
> poll. This is an optimization, and what you have here is correct from a
> functional point of view.
>
Yes, it is a good optimization but also a rare case that bit 0 is set. no?
> > + source_bitmask = fred_event_data(regs);
> > + if (!source_bitmask) {
> > + pr_warn_ratelimited("NMI received without source
> > information!\n");
> > + return 0;
> > + }
>
> If the event data word is 0, it probably should be treated as a
> *permanent* failure, as it is a Should Not Happen[TM] situation, and
> means there is an implementation (or, perhaps more likely,
> virtualization!) bug, and as such it may not be safe to trust the NMI
> source information in the future.
>
Good point, I will add a flag to permanently disable NMI source reporting
for this boot cycle if that happens.
> > + if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) || type !=
> > NMI_LOCAL)
> > + return 0;
>
> I'm not sure I understand why you are requiring type to be NMI_LOCAL here?
>
It is just for this current implementation I am not including external
NMIs. AFAIK, there is no users, i.e. no device MSIs delivered as NMI. I saw
effort trying to make HPET NMI watchdog but not materialized.
Thanks,
Jacob
Powered by blists - more mailing lists