Message-ID: <3908561D78D1C84285E8C5FCA982C28F19D5C13C@ORSMSX108.amr.corp.intel.com>
Date: Thu, 1 Nov 2012 21:09:07 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Borislav Petkov <bp@...en8.de>,
Mauro Carvalho Chehab <mchehab@...hat.com>
CC: Linux Edac Mailing List <linux-edac@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: RE: [RFC EDAC/GHES] edac: lock module owner to avoid error report
conflicts
> That is correct, unfortunately. That information is not available to
> software in all cases. Maybe APEI could be used for that DIMM location
> mapping through simple tables instead of letting it fumble the error
> handling path.
Not much hope for "simple"[1] tables. There is also a timing issue on
systems with rank sparing, memory mirroring etc. ... you need to decode
to the DIMM at the time the error happened. If you wait until later, then
the system may have switched over to the spare rank or mirror ... and then
your decode will point at the new target, rather than the old.
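
A minimal sketch of that timing problem, assuming a toy model where a
dictionary stands in for the hardware's current logical-to-physical rank
routing. The names rank_map and decode_now and the DIMM labels are
hypothetical, not any real EDAC or APEI interface. It only shows that a
decode done after the sparing event reports the spare, not the rank that
actually failed.

```python
# Toy model: logical rank -> physical rank currently backing it.
# All names and labels here are made up for illustration.
rank_map = {0: "CPU0/chan1/DIMM_A/rank0"}


def decode_now(logical_rank):
    """Resolve a logical rank using the routing in effect right now."""
    return rank_map[logical_rank]


# An error hits logical rank 0; decoding immediately finds the failing DIMM.
at_error_time = decode_now(0)

# Rank sparing kicks in and the logical rank is redirected to the spare.
rank_map[0] = "CPU0/chan1/DIMM_SPARE/rank0"

# Decoding the same logged error later now points at the new target.
decoded_later = decode_now(0)

print("decoded at error time:", at_error_time)
print("decoded later:        ", decoded_later)
```

In this model the only way to get the right answer later is to have
captured at_error_time when the error was logged.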
-Tony
[1] Consider a 4 cpu-socket machine with 4 channels per socket and three
DIMMs per channel - so there are 48 DIMM sockets on the motherboard. Then
some lab monkey takes a box of random 1, 2, 4, 8 GB DIMMs and fills
most of the sockets. BIOS will somehow make sense out of this and interleave
where it finds matching speeds across pairs/quads of channels (though size
need not match ... if you have a 2G and 4G DIMM you may get interleaving for
the matching part, then non-interleaved for the "extra" 2G).
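
To illustrate why a flat table does not capture this, here is a rough
sketch of the 2G + 4G case only. It assumes two channels, a 64-byte
interleave granularity, and that the interleaved region sits at address 0
with the leftover 2G directly above it. All of those are assumptions made
for the example; real BIOS layouts vary and are not described in this mail.

```python
GIB = 1 << 30
LINE = 64  # assumed interleave granularity in bytes

# Assumed layout: 2G from each DIMM interleaved, then the 4G DIMM's extra 2G.
INTERLEAVED_SPAN = 2 * (2 * GIB)
TOTAL_SPAN = INTERLEAVED_SPAN + 2 * GIB


def decode_interleaved(addr):
    """Map a system address to (dimm, offset within that dimm)."""
    if addr < 0 or addr >= TOTAL_SPAN:
        raise ValueError("address outside populated memory")
    if addr < INTERLEAVED_SPAN:
        line = addr // LINE
        channel = line % 2  # alternate lines between the two channels
        offset = (line // 2) * LINE + addr % LINE
        return ("DIMM_2G" if channel == 0 else "DIMM_4G", offset)
    # Non-interleaved tail: upper half of the 4G DIMM.
    return ("DIMM_4G", 2 * GIB + (addr - INTERLEAVED_SPAN))


for a in (0, 64, 128, 4 * GIB):
    print(hex(a), decode_interleaved(a))
```

Even in this two-DIMM toy the mapping needs two rules; with 48 sockets
and mixed sizes the number of regions grows quickly.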