[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y1gQQ8gh1CJf0Tuy@yaz-fattaah>
Date: Tue, 25 Oct 2022 16:35:15 +0000
From: Yazen Ghannam <yazen.ghannam@....com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...en8.de>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"Smita.KoralahalliChannabasappa@....com"
<Smita.KoralahalliChannabasappa@....com>
Subject: Re: [PATCH 1/3] x86/MCE, EDAC/mce_amd: Add support for new
MCA_SYND{1,2} registers
On Mon, Oct 24, 2022 at 09:52:45PM +0000, Luck, Tony wrote:
> > I missed the pre-pended length bit ... with that it all makes sense.
>
> Though the other place where mce records are visible to user space
> is in trace records. See:
>
> include/trace/events/mce.h
>
> (N.B. this needs an update to include the ppin and microcode fields).
>
> If these new fields need to be in the trace log, and we want to make
> it easy to extend, then have to figure out how to handle this in a way
> that doesn't confuse applications (rasdaemon ... are there others)
> that consume the trace records.
>
Hi folks,
I think the "right way" to use tracepoints is to parse the data according to
the format provided by the tracepoint itself. You can see an example of
rasdaemon doing that here:
https://github.com/mchehab/rasdaemon/blob/c2255178a49f62c53009a456bc37dd5e37332f09/ras-mce-handler.c#L386
A tracepoint user should not assume a fixed struct layout, so adding and
rearranging fields shouldn't be a problem. I'm not sure about removing a
field. It seems to me that this should be okay in the interface, and it's up
to the application how it wants to handle a field that isn't found.
Also, rasdaemon does already support raw, variable-length data:
https://github.com/mchehab/rasdaemon/blob/master/ras-non-standard-handler.c
This could be an example used to update the MCE part.
I think the only (or popular?) userspace tool that relies on the layout of
struct mce is mcelog. This is not supported on modern AMD systems, and we
refer users to rasdaemon instead.
Another option could be to define a new tracepoint. Userspace already needs to
be updated to recognize new fields, so I don't think it's much of a stretch to
add a new tracepoint handler. This may be simpler than trying to fit
vendor-specific info into an existing tracepoint and then decoding it later in
userspace.
I do like the suggestion from Boris to have a length and vendor data in struct
mce. This should keep mcelog happy while letting us treat struct mce as a
semi-internal kernel structure. Also, this avoids having to mess around with
all the notifier chain definitions.
I'll start working on an implementation if that's okay with you all. I'll
include kernel and rasdaemon patches together so we can have context for
discussion.
Thanks,
Yazen
Powered by blists - more mailing lists