linux-kernel - Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint

Date:	Thu, 01 Mar 2012 11:23:22 +0900
From:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Mauro Carvalho Chehab <mchehab@...hat.com>,
	Tony Luck <tony.luck@...el.com>, Ingo Molnar <mingo@...e.hu>,
	EDAC devel <linux-edac@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint

(2012/02/29 22:37), Borislav Petkov wrote:
> On Wed, Feb 29, 2012 at 10:05:53AM -0300, Mauro Carvalho Chehab wrote:
>> On 29-02-2012 09:19, Borislav Petkov wrote:
>> - on SB, the MCE status register only has the error message. In order to get
>>   the DIMM location, the driver needs to parse the registers that describe
>>   how the DIMMs are organized (this is spread across dozens of PCI devices
>>   and 200+ registers), and how they're interleaved, in order to convert the
>>   error address reported by the MCA into a DIMM location.
> 
> As I already said, amd64_edac already does a similar thing, so I
> don't see any difference in the solutions there: decode to the DIMM and
> pass the info through 'msg'.

My concern is this: on Sandy Bridge, is it safe to gather information about
the DIMM location from machine check context within a reasonable time span?
I know that for corrected errors, which are handled in normal context, it is
safe to read the vast PCI configuration space...

Or is it really possible to determine the erroneous DIMM location from the OS?
How to get the location seems to depend heavily on the hardware: the
processor's vendor/family/model, the firmware configuration, and so on.
Even if the OS tells me "please replace the memory seated in slot#3 at
node#5" or similar, I'm not sure whether these numbers are consistent across
reboots if there are hot-plugged nodes and/or memory.  The numbering order
can change depending on how the firmware enumerates the ACPI namespace, for
example.
In fact, these days we usually use the firmware's system event log to
determine which module should be replaced, on the assumption that the
firmware knows the hardware better than any OS running on that machine.

Getting back to "msg": I think it is not necessary if it does not
contain any new data beyond what is available in the mce_record today.
If you just want to add a field for the physical memory location, I think
a "msg" string is not the only way to do so.
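
As a rough illustration of such an alternative (a user-space sketch only,
not taken from any posted patch), the location could be carried as numeric
fields and rendered to text only where a human reads it. The struct name,
the field names, and the helper below are all hypothetical; none of them
exist in struct mce or in the mce_record tracepoint.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical structured location record (an assumption for
 * illustration; not a kernel structure). A value of -1 means the
 * decoder could not determine that field.
 */
struct mem_err_location {
	int16_t node;     /* memory controller / node */
	int16_t channel;  /* channel within the controller */
	int16_t slot;     /* DIMM slot within the channel */
};

/*
 * Render the location as text for a log line. Returns what snprintf()
 * returns: the number of characters that would have been written.
 */
static int mem_err_location_format(const struct mem_err_location *loc,
				   char *buf, size_t len)
{
	return snprintf(buf, len, "node=%d channel=%d slot=%d",
			loc->node, loc->channel, loc->slot);
}
```

With fields like these, a consumer can filter or count by node or slot
without parsing text, and a string can still be produced from them when
needed; the reverse direction requires parsing.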

Thanks,
H.Seto
