Message-ID: <AANLkTik6ykkg9f+3gpMGa-EGZrYEMVusTC5=MP5usVDM@mail.gmail.com>
Date: Tue, 11 Jan 2011 14:07:55 -0800
From: Duncan Laurie <dlaurie@...gle.com>
To: Mike Waychison <mikew@...gle.com>
Cc: Borislav Petkov <bp@...64.org>, "mingo@...e.hu" <mingo@...e.hu>,
"rdunlap@...otime.net" <rdunlap@...otime.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
Mauro Carvalho Chehab <mchehab@...hat.com>
Subject: Re: [PATCH] x86: Add an option to disable decoding of MCE
On Tue, Jan 11, 2011 at 11:56 AM, Mike Waychison <mikew@...gle.com> wrote:
>
> As for using EDAC, I know we tried using it a few years ago, but have
> since reverted to processing MCE ourselves. I don't know all the
> details as to why, however Duncan Laurie might be able to share more
> (CCed).
>
The short answer is that we needed access to the raw chipset error
registers rather than the parsed and collated counters (or printk
output) that we got with EDAC.
The long answer is that our goals have changed over time. For a long
time our goal with ECC was simply to identify the actual bad DIMM in a
system so it could be replaced. For systems which report ECC errors via
machine checks, this is done by collecting the physical addresses and
converting them with an external utility. For other systems, which did
not report memory errors via machine checks, EDAC was used to poll and
decode the chipset error registers and export row+channel sysfs
counters, because there was no physical memory address supplied with
each event.
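As an illustration of the row+channel counters mentioned above, here is
a minimal sketch that collects the per-csrow, per-channel corrected
error counts. It assumes the legacy EDAC sysfs layout
(mc*/csrow*/ch*_ce_count under /sys/devices/system/edac/mc); the
function name is made up for this example, and the layout may differ on
other kernels.

```python
import glob
import os
import re


def read_edac_ce_counts(root="/sys/devices/system/edac/mc"):
    """Return {(mc, csrow, channel): corrected_error_count}.

    Assumes the legacy csrow-based EDAC sysfs layout; returns an empty
    dict if no matching counter files are found.
    """
    counts = {}
    pattern = os.path.join(root, "mc*", "csrow*", "ch*_ce_count")
    for path in glob.glob(pattern):
        match = re.search(r"mc(\d+)/csrow(\d+)/ch(\d+)_ce_count$", path)
        if not match:
            continue
        with open(path) as counter_file:
            key = tuple(int(group) for group in match.groups())
            counts[key] = int(counter_file.read().strip())
    return counts
```

Note that these are only counters: they say which row and channel saw
errors, not at which address.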
However, as the scale of our operations grows, so too does the number of
DIMMs that have to be replaced. By gathering and analyzing complete
error data (row, column, bank, rank, etc.) we are able to identify
whether a single chip on a module is bad, in which case the module could
potentially be repaired rather than replaced outright.
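A rough sketch of that kind of analysis follows. It is not our actual
tooling: it assumes each error record also carries the failing data bit
position, that the module uses x8 DRAM devices (so bit // 8 selects a
chip), and that a 90% share of errors on one device is a reasonable
threshold. All names here are hypothetical.

```python
from collections import Counter, namedtuple

# Hypothetical record layout; "bit" is the failing data bit (an assumption).
ErrorRecord = namedtuple("ErrorRecord", "dimm rank bank row column bit")

DEVICE_WIDTH = 8  # assumption: x8 DRAM devices


def device_of(record):
    """Map a failing data bit to the index of the DRAM device driving it."""
    return record.bit // DEVICE_WIDTH


def suspect_single_chip(records, threshold=0.9):
    """Return {dimm: device} where one device has >= threshold of the errors."""
    per_dimm = {}
    for record in records:
        per_dimm.setdefault(record.dimm, Counter())[device_of(record)] += 1

    suspects = {}
    for dimm, per_device in per_dimm.items():
        device, hits = per_device.most_common(1)[0]
        if hits / sum(per_device.values()) >= threshold:
            suspects[dimm] = device
    return suspects
```

A DIMM whose errors are spread across several devices is left out of
the result and would still be treated as a whole-module replacement.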
Gathering this data is easy enough with machine checks, since you have
the physical address at which the error occurred, and translating
that address into a particular DIMM provides the other relevant
geometry information along the way. Machine checks also have a
convenient interface via mcelog that makes it easy to consume the event
and do the translation in userspace (as long as you run with a high MCE
tolerance level), with the added advantage that we can develop and
maintain the utility outside of our internal kernel release schedule.
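To show why the geometry falls out of the translation, here is a toy
decoder. The bit layout below is entirely hypothetical; real memory
controllers use platform-specific interleaving and remapping, which is
exactly the hardware knowledge such a utility has to encode.

```python
# Hypothetical field layout, lowest bits first: (name, width in bits).
# Real controllers differ; this only illustrates the idea.
FIELDS = [
    ("offset", 3),
    ("column", 10),
    ("channel", 1),
    ("bank", 3),
    ("rank", 1),
    ("row", 16),
]


def decode_address(addr):
    """Split a physical address into DRAM geometry fields."""
    geometry = {}
    for name, width in FIELDS:
        geometry[name] = addr & ((1 << width) - 1)
        addr >>= width
    return geometry
```

Identifying the DIMM only needs channel and rank, but row, column and
bank come out of the same pass over the address.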
For platforms that were reporting counters via EDAC, there are several
values that need to be read from different chipset registers and
exported. Because parsing printk output for this data is messy, we
decided that rather than invent another mcelog-type interface (or
subvert an existing one) we might as well do the polling for
events in userspace and simply save all the data we care about at that
point.
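A minimal sketch of such a userspace poller is below. It assumes the
error registers live in PCI config space and are read through the sysfs
"config" file of the device; the snapshot format, the "status" key and
the function names are made up for illustration, and real register
offsets are chipset-specific.

```python
import struct
import time


def read_config_dword(config_path, offset):
    """Read one little-endian 32-bit value from a PCI config space file.

    config_path would be something like
    /sys/bus/pci/devices/<device>/config (device address not shown here).
    """
    with open(config_path, "rb") as config:
        config.seek(offset)
        return struct.unpack("<I", config.read(4))[0]


def poll(read_snapshot, record, interval=1.0, rounds=None, sleep=time.sleep):
    """Call read_snapshot() every interval seconds and record any errors.

    read_snapshot returns a dict of raw register values; a non-zero
    "status" entry (hypothetical) means an error was latched.
    """
    done = 0
    while rounds is None or done < rounds:
        snapshot = read_snapshot()
        if snapshot["status"] != 0:
            record(snapshot)  # save everything we care about right away
        done += 1
        sleep(interval)
```

The point of saving the whole snapshot at poll time is that nothing
needs to be reconstructed later from counters or log output.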
As has been noted, this requires us to have intimate knowledge of the
hardware that we're running on. It isn't an approach that I would
recommend to many, and your efforts to decode errors with EDAC
are something that most people will benefit from.
-duncan