Open Source and information security mailing list archives
 
Date:   Wed, 19 Jul 2017 15:14:32 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>,
        Mauro Carvalho Chehab <mchehab@...pensource.com>
CC:     "Kani, Toshimitsu" <toshi.kani@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mchehab@...nel.org" <mchehab@...nel.org>,
        "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "srinivas.pandruvada@...ux.intel.com" 
        <srinivas.pandruvada@...ux.intel.com>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac

> "The module number of the memory error location. (NODE, CARD, and MODULE
> should provide the information necessary to identify the failing FRU)."
>
> So this tuple is sufficient to pinpoint the DIMM, IIUC.
>
> Which means, ghes_edac can have a single layer of DIMMs without channels.

The tricky part is that you have to rely on SMBIOS/DMI to know what DIMMs are
on the system when the driver initializes, so you can populate /sys/.*/edac.

Later, when GHES gives you a NODE/CARD/MODULE tuple in an error record, you need
to match these up. But SMBIOS only gave you two strings, "Locator" and "Bank
Locator", which have no defined syntax. You are at the mercy of the BIOS writer
to put in something parseable. Some writers use zero-based counts, others are
Fortran fans and use one-based. Still others use letters. About the one guarantee
is that they will make almost no effort to match the silkscreen labels on the
motherboard itself.

E.g. my Broadwell-EX has things like:

        Locator: CHANNEL D DIMM 1
        Bank Locator: Memriser8

Channel is A, B, C, D. DIMM is 0, 1, 2. Memriser is {1..8}, so this manages to use all
three counting options!
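
To make the pain concrete, here is a rough sketch (in Python for brevity, not
kernel code) of the kind of per-BIOS heuristic this forces on you. The regexes
and the parse_locator() name are hypothetical and fit only the Broadwell-EX
strings above; another vendor's strings would need different patterns, and
nothing here says how the result lines up with the GHES NODE/CARD/MODULE values.

```python
import re

# Hypothetical patterns: they match only the example strings quoted above.
_CHANNEL_RE = re.compile(r"CHANNEL\s+([A-Z])\s+DIMM\s+(\d+)", re.IGNORECASE)
_RISER_RE = re.compile(r"Memriser\s*(\d+)", re.IGNORECASE)


def parse_locator(locator, bank_locator):
    """Return zero-based (riser, channel, dimm), or None if unparseable."""
    m_loc = _CHANNEL_RE.search(locator)
    m_bank = _RISER_RE.search(bank_locator)
    if not m_loc or not m_bank:
        return None  # unknown syntax: the caller has to cope somehow
    channel = ord(m_loc.group(1).upper()) - ord("A")  # letters: A -> 0
    dimm = int(m_loc.group(2))                        # already zero-based
    riser = int(m_bank.group(1)) - 1                  # one-based: 1 -> 0
    return (riser, channel, dimm)


print(parse_locator("CHANNEL D DIMM 1", "Memriser8"))  # (7, 3, 1)
```

Three fields, three different normalizations, and anything that does not match
the patterns simply falls through to None.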

-Tony
