Message-ID: <3908561D78D1C84285E8C5FCA982C28F170F3DA7@ORSMSX104.amr.corp.intel.com>
Date: Wed, 25 Apr 2012 18:32:21 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Mauro Carvalho Chehab <mchehab@...hat.com>,
Borislav Petkov <bp@...64.org>
CC: Linux Edac Mailing List <linux-edac@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Doug Thompson <norsk5@...oo.com>
Subject: RE: [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic
layers
> See the driver: the only useful information provided by the MCA log is
> that an error happened, its physical address, and the type of the
> error. Unlike the Nehalem MCA, the MCE_MISC registers won't point to the
> DIMM in the error.
There's a bit more information in the MCA log than just the physical address:
The number of the CPU that finds the data in its bank gives the socket.
[/proc/cpuinfo maps logical CPU numbers to "physical id"]
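As a rough userspace illustration of that mapping (a sketch only, not code from any EDAC driver), the "processor" and "physical id" fields of an x86 /proc/cpuinfo can be paired up as below. The function name socket_by_cpu is made up for this example.

```python
def socket_by_cpu(cpuinfo_text):
    """Map logical CPU number -> "physical id" (socket).

    cpuinfo_text is the full text of an x86 /proc/cpuinfo.
    The function name is hypothetical, chosen for this sketch.
    """
    mapping = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU stanzas
        key = key.strip()
        value = value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "physical id" and cpu is not None:
            mapping[cpu] = int(value)
    return mapping


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(socket_by_cpu(f.read()))
```

The CPU number reported with the machine check record can then be looked up in the returned dictionary to get the socket.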
The low-order bits of the MCi_STATUS register give the channel; see the SDM.
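A minimal sketch of that decode follows. It assumes the SDM's compound error code layout for memory controller errors, 000F 0000 1MMM CCCC in the low 16 bits of MCi_STATUS, where CCCC is the channel (1111 meaning "channel not specified") and MMM is the transaction type. Check the bit positions against the SDM for your processor; the helper name decode_channel and the label strings are invented for this example.

```python
VAL_BIT = 1 << 63        # MCi_STATUS.VAL: the register holds a valid error
FILTER_BIT = 1 << 12     # "F" bit inside the compound error code
CHANNEL_UNSPECIFIED = 0xF

# Labels for the MMM field; the wording is ours, not the SDM's mnemonics.
TRANSACTIONS = {
    0: "generic",
    1: "read",
    2: "write",
    3: "address/command",
    4: "scrub",
}


def decode_channel(status):
    """Return (transaction, channel) for a memory controller error.

    Returns None if the status is not valid or the error code does not
    match the assumed pattern 000F 0000 1MMM CCCC. The channel is None
    when the hardware reports "channel not specified".
    """
    if not status & VAL_BIT:
        return None
    code = status & 0xFFFF & ~FILTER_BIT
    if (code & 0xFF80) != 0x0080:
        return None  # some other error class (cache, bus, TLB, ...)
    transaction = TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
    channel = code & 0xF
    if channel == CHANNEL_UNSPECIFIED:
        channel = None
    return transaction, channel
```

For example, a valid status whose low 16 bits are 0x0091 would decode under these assumptions as a read error on channel 1.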
So the only information missing from the MCA log is which DIMM within
the channel. I.e. we can pin the fault to a group of either two or
three DIMMs, depending on how many DIMMs per channel the motherboard supports.
If you only have one DIMM per channel populated, then socket/channel is
sufficient to identify the DIMM.
[We also don't have any intra-DIMM information for those customers who
would like to diagnose the device on the DIMM, or which bits within
the cache line had the error]
-Tony