Message-ID: <20250815134259.GA27834@yaz-khff2.amd.com>
Date: Fri, 15 Aug 2025 09:42:59 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
x86@...nel.org, avadhut.naik@....com, john.allen@....com
Subject: Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks
On Thu, Aug 14, 2025 at 03:17:21PM -0700, Luck, Tony wrote:
> On Thu, Aug 14, 2025 at 05:07:30PM -0400, Yazen Ghannam wrote:
> > On Thu, Aug 14, 2025 at 12:52:19PM -0700, Luck, Tony wrote:
> > > But the first match nature of the table means that this rule hits
> > > (because neither RIPV nor EIPV is set):
> > >
> > > /* Neither return nor error IP -- no chance to recover -> PANIC */
> > > MCESEV(
> > > PANIC, "Neither restart nor error IP",
> > > EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
> > > ),
> > >
> >
> > Thanks Tony. I see what you mean.
> >
> > Do we really need this rule? It is essentially the same as the following
> > rule:
> >
> > MCESEV(
> > PANIC, "In kernel and no restart IP",
> > EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
> > ),
> >
> > ...since we assume "KERNEL" context if RIPV|EIPV are clear after
> > checking the CS register.
>
> I'm not sure this could ever happen. But if it did, I think I'd like
> to see that message.
> >
> > The message is not as explicit though.
> >
> > I did have an earlier idea that we introduce an "UNKNOWN" context for
> > the !pt_regs case.
> >
> > We could add the "UNKNOWN" context to the "Neither restart nor error IP"
> > rule. That way it'll be skipped if we have a "USER" context and then it
> > should match the one you want.
>
> I don't want to do that anywhere except that Sandybridge instruction
> fetch case (which wasn't classified as an erratum, because the h/w
> guys chose to set RIPV==0 and EIPV==0 ... but it was a poor choice.)
>
> > Also, I just saw this in the Intel SDM:
> >
> > "For the P6 family processors, if the EIPV flag in the MCG_STATUS MSR is
> > set, the saved contents of CS and EIP registers are directly associated
> > with the error that caused the machine-check exception to be generated;
> > if the flag is clear, the saved instruction pointer may not be associated
> > with the error (see Section 17.3.1.2, “IA32_MCG_STATUS MSR”)."
> >
> > But I can't tell if this is true just for P6 or all, because the CS
> > register isn't referenced again with EIPV.
>
> Should probably have said "P6 and newer". The intent of EIPV is to
> indicate that this machine check is because of something that happened
> on the current CPU (remember this bit was defined when all #MC on Intel
> were broadcast, so knowing which CPU(s) are involved, and which have
> just been pulled in to the #MC handler by the broadcast was very
> important).
>
Okay, fair enough. It seems like these quirks should stay. Thanks for
the discussion. It really helped me better understand these quirks and
their history.
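
For anyone following along, the first-match behavior discussed above can
be sketched as a small standalone C fragment. This is only an
illustration under stated assumptions, not the kernel's severity code:
the struct layout, the enum names, the classify() helper, and the third
and fourth table entries are hypothetical stand-ins. Only the two PANIC
messages and the RIPV/EIPV masks are taken from the rules quoted in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RIPV and EIPV are bits 0 and 1 of IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_EIPV (1ULL << 1)

/* Hypothetical names, chosen for this sketch only. */
enum context  { CTX_ANY, CTX_KERNEL, CTX_USER };
enum severity { SEV_KEEP, SEV_AR, SEV_PANIC };

struct rule {
    enum severity sev;
    const char *msg;
    enum context ctx;    /* CTX_ANY matches every context */
    uint64_t mcg_mask;   /* bits of MCG_STATUS to examine */
    uint64_t mcg_res;    /* required value of those bits */
};

/* Order matters: the first matching entry wins. */
static const struct rule rules[] = {
    /* The two rules quoted in the thread. */
    { SEV_PANIC, "Neither restart nor error IP", CTX_ANY,
      MCG_STATUS_RIPV | MCG_STATUS_EIPV, 0 },
    { SEV_PANIC, "In kernel and no restart IP", CTX_KERNEL,
      MCG_STATUS_RIPV, 0 },
    /* Hypothetical stand-in for a later, recoverable user-context rule. */
    { SEV_AR, "User context (placeholder rule)", CTX_USER, 0, 0 },
    /* Hypothetical catch-all so that every input matches something. */
    { SEV_KEEP, "Default (placeholder rule)", CTX_ANY, 0, 0 },
};

/* Walk the table top to bottom and return the first rule that matches. */
static const struct rule *classify(uint64_t mcgstatus, enum context ctx)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        const struct rule *r = &rules[i];

        if ((mcgstatus & r->mcg_mask) != r->mcg_res)
            continue;
        if (r->ctx != CTX_ANY && r->ctx != ctx)
            continue;
        return r;
    }
    assert(!"rule table must end with a catch-all entry");
    return NULL;
}
```

In this sketch, classify(0, CTX_USER) returns the "Neither restart nor
error IP" entry, because that rule carries no context restriction and
sits above the user-context rule. That is the ordering effect Tony
described: with RIPV and EIPV both clear, the later rule is never
reached.
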
Thanks,
Yazen