Message-ID: <4F504645.5040708@jp.fujitsu.com>
Date: Fri, 02 Mar 2012 13:02:13 +0900
From: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Borislav Petkov <bp@...64.org>,
Mauro Carvalho Chehab <mchehab@...hat.com>,
Ingo Molnar <mingo@...e.hu>,
EDAC devel <linux-edac@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint
(2012/03/02 3:28), Luck, Tony wrote:
>>> My concern is; on Sandy Bridge, is it safe to gather info about the DIMM
>>> location in/from machine check context in a reasonable time span?
>>
>> Well, what amd64_edac does is "buffer" the required lookup info so
>> whenever you get an error, you simply lookup the channel and chip select
>> - all ops which can be done in atomic context.
>
> Yes - we could pre-read all the config space registers ahead of time and
> save them in memory (none of the values should change - except if the platform
> supports hot-plug for memory). Total is only a few Kbytes. Then decode in
> machine check context is both safe, and fast.
To sort out my thoughts:
- First of all, the OS gathers info about the physical location of DIMMs
from DMI/ACPI/PCI etc., before enabling the MCE mechanism.
- Make a kind of "physical memory location table" in a memory buffer,
to ease mapping a physical address to the location of the DIMM module
and/or chip that contains the memory cell the address points to.
- It would be better to have a well-organized table rather than
a raw copy of config space etc.
- Likewise, it would also be nice if we could map logical processor numbers
to the location of physical sockets on the motherboard.
- Happy if the user can refer to the table via sysfs.
- Allow updating the table if the platform supports hot-plug.
- Once MCE is enabled, the handler can refer to the in-memory table to
determine which faulty device should be replaced.
This storyline up to here is reasonable and acceptable, I think.
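As a rough illustration of the table idea, here is a minimal userspace C sketch, not kernel code. The struct dimm_loc layout, the loc_table contents and the loc_lookup() helper are all hypothetical names made up for this example, and it assumes each DIMM covers one contiguous, non-overlapping address range (real platforms with interleaving would need a more elaborate decode step). The point is only that the lookup touches nothing but prebuilt memory, so it needs no locks, allocation or config space access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical location record; field names are illustrative only. */
struct dimm_loc {
	uint64_t start;   /* first physical address covered */
	uint64_t end;     /* last physical address covered (inclusive) */
	uint8_t  socket;  /* physical socket on the motherboard */
	uint8_t  channel; /* memory channel on that socket */
	uint8_t  slot;    /* DIMM slot on that channel */
};

/*
 * Assumed to be built before MCE is enabled, sorted by start address,
 * with non-overlapping ranges. The values are made-up sample data.
 */
static const struct dimm_loc loc_table[] = {
	{ 0x000000000ULL, 0x0ffffffffULL, 0, 0, 0 },
	{ 0x100000000ULL, 0x1ffffffffULL, 0, 1, 0 },
	{ 0x200000000ULL, 0x2ffffffffULL, 1, 0, 0 },
};

/*
 * Binary search over the prebuilt table.
 * Returns the table index covering addr, or -1 if no entry matches.
 */
static int loc_lookup(const struct dimm_loc *tbl, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < tbl[mid].start)
			hi = mid;
		else if (addr > tbl[mid].end)
			lo = mid + 1;
		else
			return (int)mid;
	}
	return -1;
}
```

The returned index is the kind of small binary value that could be passed on as-is, while a hot-plug event would rebuild or patch the table outside of machine check context.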
Then now it is clear that the remaining point I feel uneasy about is
putting a string into the ring buffer instead of binary data such as an
index into the location table. Please use binary (or "binary + string") to
tell userland the error location.
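To make the "binary + string" option concrete, a minimal userspace C sketch follows. The struct mce_loc_event layout and the fill_loc_event() helper are hypothetical and are not the actual MCE tracepoint format; the sketch only assumes that the record carries a table index as the primary datum and an optional fixed-size label for convenience.

```c
#include <stdint.h>
#include <stdio.h>

#define LOC_LABEL_LEN 32

/* Hypothetical event payload; not the real tracepoint layout. */
struct mce_loc_event {
	uint64_t addr;                 /* physical address of the error */
	int32_t  loc_index;            /* index into the location table, -1 if unknown */
	char     label[LOC_LABEL_LEN]; /* optional human-readable label, may be empty */
};

/*
 * Fill one record. The label is copied with truncation and is always
 * NUL-terminated; a NULL label yields an empty string.
 */
static void fill_loc_event(struct mce_loc_event *ev, uint64_t addr,
			   int32_t loc_index, const char *label)
{
	ev->addr = addr;
	ev->loc_index = loc_index;
	snprintf(ev->label, sizeof(ev->label), "%s", label ? label : "");
}
```

With a record like this, userland tools can key on loc_index and resolve it against the exported table, while the string stays purely informational and can be left empty.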
Thanks,
H.Seto