linux-kernel - Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint


Message-ID: <4F504645.5040708@jp.fujitsu.com>
Date:	Fri, 02 Mar 2012 13:02:13 +0900
From:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To:	"Luck, Tony" <tony.luck@...el.com>
CC:	Borislav Petkov <bp@...64.org>,
	Mauro Carvalho Chehab <mchehab@...hat.com>,
	Ingo Molnar <mingo@...e.hu>,
	EDAC devel <linux-edac@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint

(2012/03/02 3:28), Luck, Tony wrote:
>>> My concern is; on Sandy Bridge, is it safe to gather info about the DIMM
>>> location in/from machine check context in a reasonable time span?
>>
>> Well, what amd64_edac does is "buffer" the required lookup info so
>> whenever you get an error, you simply lookup the channel and chip select
>> - all ops which can be done in atomic context.
> 
> Yes - we could pre-read all the config space registers ahead of time and
> save them in memory (none of the values should change - except if the platform
> supports hot-plug for memory). Total is only a few Kbytes. Then decode in
> machine check context is both safe, and fast.

To sort out my thoughts:

 - First of all, the OS gathers info about the physical location of DIMMs
   from DMI/ACPI/PCI etc., before enabling the MCE mechanism.
 - Build a kind of "physical memory location table" in a memory buffer,
   to ease mapping a physical address to the location of the DIMM module
   and/or chip that holds the memory cell at that address.
    - It would be better to have a well-organized table rather than
      a raw copy of config space etc.
    - Likewise, it would also be nice if we could map logical processor
      numbers to the location of physical sockets on the motherboard.
    - It would be good if users could refer to the table via sysfs.
    - Allow updating the table if the platform supports hot-plug.
 - Once MCE is enabled, the handler can refer to the in-memory table to
   determine the faulty device that should be replaced.
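
To make the idea concrete, here is a rough userspace sketch of such a
table and its lookup. Everything in it is hypothetical: the names
(struct mem_loc, mem_loc_lookup, example_table) and the address ranges
are made up for illustration. It also assumes each DIMM covers one
contiguous physical range, which ignores channel interleaving on real
hardware. The point is only that the lookup needs no allocation and no
locks, and finishes in a bounded number of steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical location record: one entry per contiguous physical range. */
struct mem_loc {
    uint64_t start;    /* first physical address covered */
    uint64_t end;      /* last physical address covered (inclusive) */
    uint8_t  socket;   /* physical socket on the motherboard */
    uint8_t  channel;  /* memory channel behind that socket */
    uint8_t  dimm;     /* DIMM slot on that channel */
};

/* Made-up example layout with a hole between the 2nd and 3rd range. */
static const struct mem_loc example_table[] = {
    { 0x000000000ULL, 0x07fffffffULL, 0, 0, 0 },
    { 0x080000000ULL, 0x0bfffffffULL, 0, 1, 0 },
    { 0x100000000ULL, 0x17fffffffULL, 1, 0, 0 },
};
#define EXAMPLE_TABLE_LEN (sizeof(example_table) / sizeof(example_table[0]))

/*
 * Binary search over a table sorted by start address with
 * non-overlapping ranges. Returns the index of the entry covering
 * addr, or -1 if no entry covers it.
 */
static int mem_loc_lookup(const struct mem_loc *tbl, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    assert(tbl != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < tbl[mid].start)
            hi = mid;
        else if (addr > tbl[mid].end)
            lo = mid + 1;
        else
            return (int)mid;
    }
    return -1;
}
```

With this layout, mem_loc_lookup(example_table, EXAMPLE_TABLE_LEN, addr)
gives the index of the entry describing the socket/channel/DIMM for
addr, and -1 for an address in the hole. A hot-plug update would have
to replace the table in a way that is safe against a concurrent lookup,
which this sketch does not attempt.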

I think the storyline up to here is reasonable and acceptable.

So it is now clear that the one remaining point I feel uneasy about is
putting a string into the ring buffer instead of binary data, such as an
index into the location table.  Please use binary (or "binary + string")
to tell the error location to userland.
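
As an illustration of what I mean by "binary + string", here is a small
userspace sketch. It is not the layout of the actual MCE tracepoint; the
struct and function names (mce_loc_record, mce_loc_record_fill,
mce_loc_record_format) and the 32-byte message size are assumptions made
up for this example. The producer stores the location as an index, with
an optional short text next to it, and the consumer decides how to
present it.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size record as it might sit in a ring buffer. */
struct mce_loc_record {
    uint64_t addr;       /* physical address reported with the error */
    int32_t  loc_index;  /* index into a location table, -1 if unknown */
    char     msg[32];    /* optional human-readable text, may be empty */
};

/* Producer side: fill the record; msg may be NULL for binary only. */
static void mce_loc_record_fill(struct mce_loc_record *rec, uint64_t addr,
                                int32_t loc_index, const char *msg)
{
    memset(rec, 0, sizeof(*rec));
    rec->addr = addr;
    rec->loc_index = loc_index;
    if (msg)
        snprintf(rec->msg, sizeof(rec->msg), "%s", msg); /* truncates safely */
}

/* Consumer side: the binary fields are always usable; text is a bonus. */
static void mce_loc_record_format(const struct mce_loc_record *rec,
                                  char *buf, size_t len)
{
    int n = snprintf(buf, len, "addr=0x%" PRIx64 " loc=%d",
                     rec->addr, (int)rec->loc_index);

    if (rec->msg[0] != '\0' && n > 0 && (size_t)n < len)
        snprintf(buf + n, len - (size_t)n, " msg=%s", rec->msg);
}
```

A tool that only cares about which device to replace can read
loc_index directly and never has to parse the text.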


Thanks,
H.Seto
