Message-ID: <1500407379.2042.21.camel@hpe.com>
Date: Tue, 18 Jul 2017 19:58:54 +0000
From: "Kani, Toshimitsu" <toshi.kani@....com>
To: "tony.luck@...el.com" <tony.luck@...el.com>,
"bp@...en8.de" <bp@...en8.de>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mchehab@...nel.org" <mchehab@...nel.org>,
"rjw@...ysocki.net" <rjw@...ysocki.net>,
"srinivas.pandruvada@...ux.intel.com"
<srinivas.pandruvada@...ux.intel.com>,
"lenb@...nel.org" <lenb@...nel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote:
> On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet. This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms, as the driver always emits the messages below:
> >
> > This EDAC driver relies on BIOS to enumerate memory and get error
> > reports. Unfortunately, not all BIOSes reflect the memory layout
> > correctly. So, the end result of using this driver varies from
> > vendor to vendor. If you find incorrect reports, please contact
> > your hardware vendor to correct its BIOS.
> >
> > To get out of this situation, add a platform type check to
> > selectively enable the driver on the platforms that are known to
> > have a proper firmware implementation. Platform vendors can add
> > their platforms to the list when they support ghes_edac.
>
> So maintaining whitelists for things has always been a PITA and we
> should try to avoid it, if possible. (We can always do it if nothing
> saner comes along.)
Agreed.
> Now, below is a dirty patch converting ghes_edac to a normal module.
> On systems where we have GHES, the firmware generally disables the
> detection of the presence of ECC hardware, thus preventing the
> platform EDAC driver from loading.
I have HPE Haswell and Skylake test systems with GHES, but they do not
hide IMCs from the OS. So, the sb_edac and skx_edac drivers get
attached on these systems when ghes_edac is disabled.
> Let me clarify: I have an AMD HP box which, when GHES is enabled in
> the BIOS, says that ECC is disabled in the memory controller and the
> amd64_edac driver doesn't load for that memory controller.
Hmm... what's the platform name of this box? I can look into this case
if you need me to.
> And I think we should try this first: have the firmware disable
> detection methods so that the platform drivers don't load.
I do not think we can rely on this method.
> Then, ghes_edac can be a simple module and no other driver would
> attempt loading.
I like the use of a notifier chain, which is much cleaner.
> The question is: does the platform do this disabling now?
Unfortunately, that is not the case today. On Skylake at least, the
IMCs cannot be hidden with the Device Hide registers.
Thanks,
-Toshi