Message-ID: <3908561D78D1C84285E8C5FCA982C28F6130D126@ORSMSX114.amr.corp.intel.com>
Date: Wed, 19 Jul 2017 15:14:32 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Borislav Petkov <bp@...en8.de>,
Mauro Carvalho Chehab <mchehab@...pensource.com>
CC: "Kani, Toshimitsu" <toshi.kani@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mchehab@...nel.org" <mchehab@...nel.org>,
"rjw@...ysocki.net" <rjw@...ysocki.net>,
"srinivas.pandruvada@...ux.intel.com"
<srinivas.pandruvada@...ux.intel.com>,
"lenb@...nel.org" <lenb@...nel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
> "The module number of the memory error location. (NODE, CARD, and MODULE
> should provide the information necessary to identify the failing FRU)."
>
> So this tuple is sufficient to pinpoint the DIMM, IIUC.
>
> Which means, ghes_edac can have a single layer of DIMMs without channels.
The tricky part is that you have to rely on SMBIOS/DMI to know what DIMMs are
on the system when the driver initializes so you can populate /sys/.*/edac.
Later, when GHES gives you a NODE/CARD/MODULE in an error record, you need
to match these up. But SMBIOS only gave you two strings, "Locator" and "Bank
Locator", which have no defined syntax. You are at the mercy of the BIOS writer
to put in something parseable. Some writers use zero-based counts, others are
Fortran fans and use one-based. Still others use letters. About the only guarantee
is that they will make almost no effort to match the silkscreen labels on the
motherboard itself.
E.g. my Broadwell-EX has things like:
Locator: CHANNEL D DIMM 1
Bank Locator: Memriser8
Channel is A,B,C,D. DIMM is 0, 1, 2. Memriser is {1..8} so this manages to use all
three counting options!
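
To make the matching problem concrete, here is a rough Python sketch of what
normalizing those two strings might look like. It is illustrative only and not
code from ghes_edac: the function names and the NUMERIC_BASE table are
hypothetical, and the bases in that table are simply the ones from the
Broadwell-EX example above. A single string cannot tell you whether "DIMM 1"
is zero-based or one-based, so that knowledge has to come from somewhere else.

```python
import re

# Hypothetical per-platform table: the first value each numeric field uses.
# Taken from the Broadwell-EX example (DIMM is 0..2, Memriser is 1..8);
# another BIOS would need a different table.
NUMERIC_BASE = {"DIMM": 0, "MEMRISER": 1}

def parse_locator(text):
    """Split a free-form locator string into {FIELD: raw value} pairs."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    fields = {}
    i = 0
    while i + 1 < len(tokens):
        name, value = tokens[i], tokens[i + 1]
        # Treat a number or a single letter as the value of the word before it.
        if value.isdigit() or len(value) == 1:
            fields[name.upper()] = value
            i += 2
        else:
            i += 1
    return fields

def to_index(name, value):
    """Convert a raw value to a zero-based index."""
    if value.isdigit():
        return int(value) - NUMERIC_BASE.get(name, 0)
    # Letter counting: A -> 0, B -> 1, ...
    return ord(value.upper()) - ord("A")

def normalize(locator, bank_locator):
    """Merge both SMBIOS strings into one {FIELD: zero-based index} dict."""
    fields = parse_locator(locator)
    fields.update(parse_locator(bank_locator))
    return {name: to_index(name, value) for name, value in fields.items()}
```

With the strings above, normalize("CHANNEL D DIMM 1", "Memriser8") gives
CHANNEL 3, DIMM 1, MEMRISER 7. Mapping those indices onto the NODE/CARD/MODULE
values in a GHES record is still platform-specific and is not attempted here.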
-Tony