[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MW5PR84MB18429A4A41FD70DCD91CFADDAB6C9@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
Date: Fri, 19 Aug 2022 18:57:11 +0000
From: "Elliott, Robert (Servers)" <elliott@....com>
To: "Kani, Toshi" <toshi.kani@....com>, Justin He <Justin.He@....com>
CC: "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"devel@...ica.org" <devel@...ica.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Shuai Xue <xueshuai@...ux.alibaba.com>,
Jarkko Sakkinen <jarkko@...nel.org>,
"linux-efi@...r.kernel.org" <linux-efi@...r.kernel.org>,
nd <nd@....com>, Ard Biesheuvel <ardb@...nel.org>,
Len Brown <lenb@...nel.org>, James Morse <James.Morse@....com>,
Tony Luck <tony.luck@...el.com>,
Borislav Petkov <bp@...en8.de>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Robert Richter <rric@...nel.org>,
Robert Moore <robert.moore@...el.com>,
Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
Yazen Ghannam <yazen.ghannam@....com>
Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from
loading after ghes_edac is registered
> -----Original Message-----
> From: Kani, Toshi <toshi.kani@....com>
> Sent: Friday, August 19, 2022 12:49 PM
> To: Justin He <Justin.He@....com>
> Cc: linux-acpi@...r.kernel.org; linux-kernel@...r.kernel.org; linux-
> edac@...r.kernel.org; devel@...ica.org; Rafael J . Wysocki
> <rafael@...nel.org>; Shuai Xue <xueshuai@...ux.alibaba.com>; Jarkko
> Sakkinen <jarkko@...nel.org>; linux-efi@...r.kernel.org; nd
> <nd@....com>; Ard Biesheuvel <ardb@...nel.org>; Len Brown
> <lenb@...nel.org>; James Morse <James.Morse@....com>; Tony Luck
> <tony.luck@...el.com>; Borislav Petkov <bp@...en8.de>; Mauro Carvalho
> Chehab <mchehab@...nel.org>; Robert Richter <rric@...nel.org>; Robert
> Moore <robert.moore@...el.com>; Qiuxu Zhuo <qiuxu.zhuo@...el.com>;
> Yazen Ghannam <yazen.ghannam@....com>
> Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac
> from loading after ghes_edac is registered
>
> On Friday, August 19, 2022 4:35 AM, Justin He wrote:
> > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct
> platform_device
> > > > *ghes_dev)
> > > > platform_set_drvdata(ghes_dev, ghes);
> > > >
> > > > ghes->dev = &ghes_dev->dev;
> > > > + set_ghes_devs_registered(false);
> > >
> > > This does not look right to me.
> > >
> > > The condition of using ghes_edac is (ghes-present) && (ghes-
> preferred),
> > > where:
> > > - ghes-present is latched on ghes_probe()
> > > - ghes-preferred is true if the platform-check passes.
> > >
> > > ghes_get_device() introduced in the previous patch works as the
> > > ghes-preferred check.
> > >
> > > We cannot use ghes_edac registration as the ghes-present check in
> this
> > > scheme since it is deferred to module_init().
> >
> > What is the logic for ghes-present check? In this patch, I assumed it
> is equal to
> > "ghes_edac devices have been registered". Seems it is not correct.
>
> Using (ghes_edac-registered) is a wrong check in this scheme
> since ghes_edac registration is deferred. This check is fine in
> the current scheme since ghes_edac gets registered before
> any other chipset-specific edac drivers.
>
> > But should we consider the case as follows:
> > What if sbridge_init () is invoked before ghes_edac_init()? i.e.
> Should we get
> > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on
> HPE)?
>
> No. The point is that ghes_edac driver needs to be selected,
> "regardless of the module ordering", on a system with GHES
> present & preferred.
>
> Note that this new scheme leads to the following condition
> change:
> - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered)
> - New: (ghes-present) && (ghes-preferred)
>
> The option I suggested previously keeps the current condition,
> but this new scheme does not for better modularity.
>
> What this means is that if ghes_edac is not enabled (but ghes
> is enabled) on a system with GHES present & preferred, no edac
> driver gets registered. This change is fine from my (HPE) perspective
> and should be fine for other GHES systems. GHES systems have
> chipset-specific edac driver in FW. OS-based chipset-specific edac
> driver is not necessary and may lead to a conflict of chipset register
> ownership.
Currently, running with this on the kernel command line
ghes.disable
causes the ACPI ghes driver to quit early in acpi_ghes_init():
/*
* This driver isn't really modular, however for the time being,
* continuing to use module_param is the easiest way to remain
* compatible with existing boot arg use cases.
*/
bool ghes_disable;
module_param_named(disable, ghes_disable, bool, 0);
which results in the skx_edac module assuming it should run:
[ 33.628140] calling skx_init+0x0/0xe5a [skx_edac] @ 1444
[ 33.628531] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:36:0a.0 (INTERRUPT)
[ 33.641432] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:36:0c.0 (INTERRUPT)
[ 33.653256] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
[ 33.665055] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
[ 33.676801] initcall skx_init+0x0/0xe5a [skx_edac] returned 0 after 36343 usecs
We might need to differentiate between the system ROM really not
offering GHES vs. the ghes module not running.
Powered by blists - more mailing lists