Message-ID: <20180301115953.GA5040@pd.tnic>
Date: Thu, 1 Mar 2018 12:59:53 +0100
From: Borislav Petkov <bp@...e.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@....com>
Cc: Tony Luck <tony.luck@...el.com>,
"linux-efi@...r.kernel.org" <linux-efi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ard.biesheuvel@...aro.org" <ard.biesheuvel@...aro.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v2 0/8] Decode IA32/X64 CPER
On Wed, Feb 28, 2018 at 08:58:15PM +0000, Ghannam, Yazen wrote:
> 1) We keep this set mostly as-is. This would be our fallback if we don't have
> anything better.
Yes, sounds good. We try to decode it as MCE and if we cannot, we dump
the raw CPER record.
> 2) I add the MCA decoding to this set. I was thinking to do this in a separate
> set but maybe it's better to do it all together.
I'm fine if you do it separately, as long as you do it so that we have
user-friendly decoding in the end.
> Number 2 would mean we do a quick check on the CPER to see if it contains
> MCA info. There's no spec-defined way to do this, but we can make a good
> guess by seeing if we have an "MSR register" context and that context has
> an "MSR address" that is an MCA register.
Yap.
> If we think we have MCA info, then we pull as much out of the CPER as we
> can and put it in a struct mce which we then pass to the notifier chain.
>
> If we don't think we have MCA info, then we fall back to number 1.
Ack.
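A minimal sketch of the check and fallback discussed above, not the code from the patch set. It assumes the UEFI CPER encoding where register context type 1 means "MSR Registers", and the legacy MCA layout where bank registers start at MSR 0x400 with four MSRs per bank (CTL, STATUS, ADDR, MISC). The names `ctx_info`, `mce_lite`, `msr_is_mca` and `ctx_to_mce` are hypothetical, the bank limit of 32 is an assumption, and so is the idea that the register array starts at a bank's CTL register. Vendor-specific ranges (for example AMD's scalable MCA MSRs) are not covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_TYPE_MSR   1      /* UEFI CPER register context type: MSR Registers */
#define MSR_MC0_CTL    0x400  /* first legacy MCA bank register */
#define MCA_MAX_BANKS  32     /* assumption: upper bound on the number of banks */

/* Hypothetical, simplified view of one IA32/X64 CPER context entry. */
struct ctx_info {
	uint16_t reg_ctx_type;   /* register context type */
	uint16_t reg_arr_size;   /* size of the register array in bytes */
	uint32_t msr_addr;       /* starting MSR address of the array */
	const uint64_t *regs;    /* register array payload */
};

/* Hypothetical subset of the fields a struct mce would carry. */
struct mce_lite {
	uint8_t  bank;
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

/* True if the MSR address falls inside the legacy MCA bank range. */
static bool msr_is_mca(uint32_t msr)
{
	return msr >= MSR_MC0_CTL &&
	       msr < MSR_MC0_CTL + 4 * MCA_MAX_BANKS;
}

/*
 * Try to pull MCA info out of a context entry. Returns false when the
 * entry does not look like MCA data, so the caller can fall back to
 * dumping the raw CPER record.
 */
static bool ctx_to_mce(const struct ctx_info *ctx, struct mce_lite *m)
{
	size_t nregs = ctx->reg_arr_size / sizeof(uint64_t);
	uint32_t off;

	if (ctx->reg_ctx_type != CTX_TYPE_MSR || !msr_is_mca(ctx->msr_addr))
		return false;

	off = ctx->msr_addr - MSR_MC0_CTL;

	/* Assumption: the array starts at MCi_CTL and holds all four MSRs. */
	if (off % 4 != 0 || nregs < 4)
		return false;

	m->bank   = (uint8_t)(off / 4);
	m->status = ctx->regs[1];
	m->addr   = ctx->regs[2];
	m->misc   = ctx->regs[3];
	return true;
}
```

A caller would run `ctx_to_mce()` on each context entry, hand a successful result to the MCE decoding path, and print the raw record otherwise.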
> At the moment, it seems we'll be using x86 CPER to represent MCA errors
> in BERT since there's no other option in BERT. So I think having number 2
> would catch most, if not all, errors reported with x86 CPER.
Yeah, if you think about it, CPER is a clumsy and totally useless
indirection layer between MCA and the OS. And if the error is of a
different type (AER, PCI, whatever), then it wraps around that too with
some dumb table. And that doesn't bring anything - just the need for
more support added to the OS and tools around it. Basically what you're
doing now.
I don't mind firmware seeing the errors first and even attempting to fix
them, as the firmware knows the platform intimately. But doing everything
in firmware just because some misguided souls think this adds value, when
in reality it ends up becoming a worse problem, is simply the wrong thing
to do.
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)