Message-ID: <20170726151755.571e5979@vento.lan>
Date:   Wed, 26 Jul 2017 15:17:55 -0300
From:   Mauro Carvalho Chehab <mchehab@...radead.org>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        linux-edac <linux-edac@...r.kernel.org>,
        Toshimitsu Kani <toshi.kani@....com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
On Wed, 26 Jul 2017 17:27:12 +0000
"Luck, Tony" <tony.luck@...el.com> wrote:
> > > > Hmm... I'm not seeing any implementation that would allow setting
> > > > between firmware first, hardware first or "auto", as we've discussed.
> > > 
> > > This is all coming up. As the 0/3 message said, these 3 patches are the
> > > bare minimum of reorganizing stuff only and should serve as a base.
> >
> > I'll then wait for such patch before acking this series.
> 
> I didn't think that a BIOS that set "firmware first" gave the OS any choice about this.
> 
> What exactly is this option going to do?  Fiddle with ACPI OSC??
The HP server I use to build the kernel is currently in firmware-first (FF) mode:
[    3.783803] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
I haven't tried to disable FF in its BIOS; I'm not sure it is even possible.
Still, EDAC works there with sb_edac. As I pointed out before, one of the
MC channels is not detected, but I don't use it on this machine.
Apart from that, EDAC seems to work fine there:
$ ras-mc-ctl --layout
       +-----------------------------------------------------------------------+
       |                mc0                |                mc1                |
       | channel0  | channel1  | channel2  | channel0  | channel1  | channel2  |
-------+-----------------------------------------------------------------------+
slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot0: |  16384 MB  |     0 MB  |  16384 MB  |  16384 MB  |     0 MB  |  16384 MB  |
-------+---------------------------------------------------------------------------+
# ras-mc-ctl --guess-labels
memory stick 'PROC 1 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 12' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 12' is located at 'Not Specified'
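The data that ras-mc-ctl prints comes from the EDAC sysfs tree. As a minimal sketch, assuming the standard layout under /sys/devices/system/edac/mc/ (one mcN directory per memory controller with mc_name and ce_count attributes), one could check which driver registered each controller and how many corrected errors it has counted. The helper name read_edac_controllers is hypothetical, and root is a parameter only so the code can be pointed at a test directory.

```python
from pathlib import Path

# Default location of the EDAC memory-controller tree (assumed standard layout).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_controllers(root=EDAC_MC_ROOT):
    """Return {controller: (driver_name, ce_count)} for each mcN directory.

    Hypothetical helper: reads the per-controller 'mc_name' and 'ce_count'
    attributes. Returns an empty dict if the tree does not exist, e.g. when
    no EDAC driver is loaded.
    """
    root = Path(root)
    result = {}
    if not root.is_dir():
        return result
    # Only mcN directories describe memory controllers; skip anything else.
    for mc in sorted(root.glob("mc[0-9]*")):
        name = (mc / "mc_name").read_text().strip()
        ce_count = int((mc / "ce_count").read_text().strip())
        result[mc.name] = (name, ce_count)
    return result


if __name__ == "__main__":
    for mc, (name, ce) in read_edac_controllers().items():
        print(f"{mc}: driver={name!r} corrected_errors={ce}")
```

On a firmware-first system where no EDAC driver has registered, the tree is typically absent and the sketch simply prints nothing.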
I haven't tried to inject an error, as I'm not sure whether the EINJ
feature is enabled in this BIOS. Probably not.
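Whether EINJ is exposed can be checked from Linux itself. A minimal sketch, assuming the debugfs interface described in the kernel's APEI EINJ documentation (/sys/kernel/debug/apei/einj/available_error_type, which needs debugfs mounted, the einj module loaded and root privileges). Error type 0x00000008 is the ACPI EINJ code for a memory correctable error. The helper names are hypothetical.

```python
from pathlib import Path

# debugfs file listing the error types the firmware can inject
# (assumes debugfs is mounted and the einj module is loaded).
EINJ_AVAILABLE = Path("/sys/kernel/debug/apei/einj/available_error_type")

# ACPI EINJ error type for a correctable memory error.
MEMORY_CORRECTABLE = 0x00000008


def parse_available_error_types(text):
    """Parse lines like '0x00000008\tMemory Correctable' into a set of ints."""
    types = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].lower().startswith("0x"):
            types.add(int(fields[0], 16))
    return types


def can_inject_memory_ce(path=EINJ_AVAILABLE):
    """Hypothetical helper: True if the firmware advertises memory CE injection."""
    try:
        text = Path(path).read_text()
    except OSError:
        # Missing file or no permission: EINJ unsupported, einj not loaded,
        # debugfs not mounted, or not running as root.
        return False
    return MEMORY_CORRECTABLE in parse_available_error_types(text)


if __name__ == "__main__":
    print("memory CE injection available:", can_inject_memory_ce())
```

A False result does not prove the BIOS lacks EINJ; the firmware may also need an explicit BIOS setup option before it publishes the ACPI EINJ table.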
At least on this machine, I much prefer to use the sb_edac driver.
As I explained in the previous thread, I just don't know whether the
BIOS does the right thing for corrected errors (CE), as I don't know
its internal algorithm.
Also, since I maintain the EDAC userspace tools (rasdaemon),
I would really like to get a few CE reports there from time to
time, as they could be used to check whether rasdaemon handles
them correctly.
So, I would much prefer not to have any CE threshold at all in the BIOS.
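To illustrate why a firmware-side CE threshold can leave rasdaemon with nothing to process, here is a toy model. It is not any vendor's actual algorithm, which is undocumented: the hypothetical ThresholdingFirmware class counts CEs per DIMM within a sliding time window and forwards a report to the OS only once the count reaches a threshold.

```python
from collections import defaultdict, deque


class ThresholdingFirmware:
    """Toy model: forward a corrected error (CE) to the OS only after
    `threshold` CEs on the same DIMM within `window` seconds.

    Hypothetical: real BIOS algorithms are vendor-specific and undocumented.
    threshold=1 means every CE is forwarded (no filtering).
    """

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self._events = defaultdict(deque)  # dimm -> timestamps of recent CEs

    def on_corrected_error(self, dimm, timestamp):
        """Record a CE; return True if it would be reported to the OS."""
        events = self._events[dimm]
        events.append(timestamp)
        # Drop CEs that have fallen out of the sliding window.
        while events and timestamp - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.threshold:
            events.clear()  # reset the per-DIMM counter after reporting
            return True
        return False


def reported_count(fw, errors):
    """Count how many (dimm, timestamp) CEs reach the OS."""
    return sum(fw.on_corrected_error(dimm, ts) for dimm, ts in errors)


if __name__ == "__main__":
    # One sporadic CE every 3 hours on a single DIMM over one day (8 CEs).
    sporadic = [("PROC 1 DIMM 1", h * 3600) for h in range(0, 24, 3)]
    print(reported_count(ThresholdingFirmware(1, 86400), sporadic))   # 8
    print(reported_count(ThresholdingFirmware(10, 86400), sporadic))  # 0
```

With a threshold of 10 CEs per day, the eight sporadic errors in this example never reach the OS, so userspace has nothing to log or analyze.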
Thanks,
Mauro