Message-ID: <YsxefXQDCiJ1zxLG@yaz-fattaah>
Date: Mon, 11 Jul 2022 17:31:41 +0000
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
tony.luck@...el.com, x86@...nel.org,
Smita.KoralahalliChannabasappa@....com
Subject: Re: [PATCH 1/3] x86/MCE, EDAC/mce_amd: Add support for new
MCA_SYND{1,2} registers
On Thu, Jun 30, 2022 at 01:01:58PM +0200, Borislav Petkov wrote:
> On Mon, Apr 18, 2022 at 05:44:38PM +0000, Yazen Ghannam wrote:
> > Future Scalable MCA systems will include two new registers: MCA_SYND1
> > and MCA_SYND2.
> >
> > These registers will include supplemental error information in addition
> > to the existing MCA_SYND register. The data within the registers is
> > considered valid if MCA_STATUS[SyndV] is set.
> >
> > Add fields for these registers in struct mce. Save and print these
> > registers wherever MCA_STATUS[SyndV]/MCA_SYND is currently used.
>
> That's all fine and good but what kind of supplemental error information
> are we talking about here? Example?
>
> And how is that error info going to be used in luserspace?
>
I think the general case will be more bank-specific information. For example,
if the bank is a cache type then the info will be in one format, and if the
bank is a CPU type then it'll be in a different format, etc. So I think the new
info will be treated the same as the old info, i.e. collect all the raw data
and share it with a hardware debug person.

The one example where this is different is the "FRU Text" case covered in a
following patch in this set.

> I don't want to increase struct mce record size by 16 bytes and those
> end up unused.
>
> Can the information from MCA_SYND{,1,2} be synthesized into a smaller
> quantity and then fed to userspace?
>
I don't think so, at least not at the moment. There aren't any "architectural"
fields that can be interpreted the same across multiple error types and
banks.

Is your concern specifically about growing/changing struct mce, or is it more
about limiting the info sent to userspace?

If it's the former, then I've been thinking it would be good to introduce a
new internal "struct mce_ext" that includes struct mce plus other things. This
way struct mce can still be uapi, and things like mcelog can use it. And at
the same time we can add new data that is used in the kernel or shared through
tracepoints.

/* Extended MCE structure */
struct mce_ext {
	struct mce *m;
	/* new stuff here */
};
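
To make that a bit more concrete, here is a rough, standalone sketch of how
the wrapper could carry the new registers. It is only an illustration, not the
patch: the struct mce below is a cut-down stand-in for the real uapi
structure, and the synd1/synd2 fields and the mce_ext_init() helper are
hypothetical names. The SyndV mask assumes bit 53 of MCA_STATUS, as with
MCI_STATUS_SYNDV in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: SyndV is bit 53 of MCA_STATUS (cf. MCI_STATUS_SYNDV). */
#define MCA_STATUS_SYNDV (1ULL << 53)

/* Cut-down stand-in for the uapi struct mce; the real one has more fields. */
struct mce {
	uint64_t status;	/* MCA_STATUS */
	uint64_t synd;		/* MCA_SYND */
};

/* Extended MCE structure: kernel-internal, so it can grow without ABI impact. */
struct mce_ext {
	struct mce *m;		/* the unchanged uapi record */
	uint64_t synd1;		/* hypothetical: MCA_SYND1 */
	uint64_t synd2;		/* hypothetical: MCA_SYND2 */
};

/* True if the syndrome registers hold valid data for this record. */
static bool mce_synd_valid(const struct mce *m)
{
	return (m->status & MCA_STATUS_SYNDV) != 0;
}

/*
 * Hypothetical helper: attach the new register values to an existing record.
 * The values are kept only when MCA_STATUS[SyndV] is set, otherwise zeroed.
 */
static void mce_ext_init(struct mce_ext *e, struct mce *m,
			 uint64_t synd1, uint64_t synd2)
{
	assert(e != NULL && m != NULL);

	e->m = m;
	e->synd1 = mce_synd_valid(m) ? synd1 : 0;
	e->synd2 = mce_synd_valid(m) ? synd2 : 0;
}
```

With something along these lines, the mcelog path would keep reading e->m
unchanged, and only in-kernel users and tracepoints would look at the extra
fields.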
What do you think?
Thanks,
Yazen