Message-ID: <87r5laxiap.fsf@basil.nowhere.org>
Date: Tue, 18 May 2010 00:41:34 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Mauro Carvalho Chehab <mchehab@...hat.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
bluesmoke-devel@...ts.sourceforge.net,
Linux Edac Mailing List <linux-edac@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Ben Woodard <woodard@...hat.com>,
Matt Domsch <Matt_Domsch@...l.com>,
Doug Thompson <dougthompson@...ssion.com>,
Borislav Petkov <bp@...64.org>,
Tony Luck <tony.luck@...el.com>,
Brent Young <brent.young@...el.com>
Subject: Re: Hardware Error Kernel Mini-Summit
Mauro Carvalho Chehab <mchehab@...hat.com> writes:
>
> There is an immediate need for error reporting on NHM-EP class
> systems.
Just for the innocent readers who might be misled by this:
Nehalem-EP DIMM error accounting already works fine today using
mcelog for most cases, including on RHEL5.5 (with some limits)
and on RHEL6 beta, with no additional changes needed.
In RHEL6 the daemon does the accounting, and the client reports the errors
per DIMM, split into uncorrected (UC) and corrected (CE) errors. In RHEL5
the information is in a log file and can be retrieved from there.
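The following minimal C sketch shows the kind of per-DIMM table such a daemon might keep. Each DIMM slot has a pair of counters, one for uncorrected (UC) errors and one for corrected (CE) errors. This is not mcelog's code. The `struct mem_error` record, the 2 x 3 x 3 socket/channel/slot layout, and the functions `account_error()` and `report()` are all hypothetical. The sketch also assumes that each machine-check event has already been decoded into a socket/channel/slot triple.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical layout: 2 sockets x 3 channels x 3 DIMM slots. */
#define MAX_SOCKETS  2
#define MAX_CHANNELS 3
#define MAX_SLOTS    3

/* Hypothetical decoded error record (decoding itself is not shown). */
struct mem_error {
    int socket;
    int channel;
    int slot;
    int uncorrected;   /* nonzero for UC, zero for CE */
};

/* Separate UC and CE counters for one DIMM. */
struct dimm_counts {
    unsigned long ce;
    unsigned long uc;
};

/* Static storage, so all counters start at zero. */
static struct dimm_counts dimm_table[MAX_SOCKETS][MAX_CHANNELS][MAX_SLOTS];

/* Charge one decoded error to the DIMM it was attributed to. */
static void account_error(const struct mem_error *e)
{
    assert(e->socket >= 0 && e->socket < MAX_SOCKETS);
    assert(e->channel >= 0 && e->channel < MAX_CHANNELS);
    assert(e->slot >= 0 && e->slot < MAX_SLOTS);

    struct dimm_counts *d = &dimm_table[e->socket][e->channel][e->slot];
    if (e->uncorrected)
        d->uc++;
    else
        d->ce++;
}

/* Print every DIMM that has seen at least one error. */
static void report(void)
{
    for (int s = 0; s < MAX_SOCKETS; s++)
        for (int c = 0; c < MAX_CHANNELS; c++)
            for (int i = 0; i < MAX_SLOTS; i++) {
                const struct dimm_counts *d = &dimm_table[s][c][i];
                if (d->ce || d->uc)
                    printf("socket %d channel %d dimm %d: %lu ce, %lu uc\n",
                           s, c, i, d->ce, d->uc);
            }
}
```

Keying the table by physical location is the point of the design: a report can name the one DIMM to replace, instead of giving a single system-wide error count.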
In addition, the daemon supports advanced RAS features, including
predictive bad page offlining and various threshold triggers.
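Threshold triggers are commonly built on a leaky bucket. Each error adds to the bucket, elapsed time drains it, and the trigger fires only when errors arrive faster than the drain rate. The sketch below assumes that approach. It is an illustration, not mcelog's implementation, and `struct leaky_bucket`, `bucket_init()` and `bucket_account()` are hypothetical names. The action taken when `bucket_account()` returns true, such as offlining the affected page or running a trigger script, is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical leaky bucket: trips once more than `capacity` errors
 * have accumulated; one unit drains every `leak_interval` seconds. */
struct leaky_bucket {
    unsigned count;
    unsigned capacity;
    time_t leak_interval;
    time_t last_leak;
};

static void bucket_init(struct leaky_bucket *b, unsigned capacity,
                        time_t leak_interval, time_t now)
{
    assert(leak_interval > 0);
    b->count = 0;
    b->capacity = capacity;
    b->leak_interval = leak_interval;
    b->last_leak = now;
}

/* Record one error seen at time `now`; return true if the threshold trips. */
static bool bucket_account(struct leaky_bucket *b, time_t now)
{
    /* Drain one unit per whole interval elapsed since the last drain. */
    time_t elapsed = now - b->last_leak;
    if (elapsed >= b->leak_interval) {
        unsigned long drained = (unsigned long)(elapsed / b->leak_interval);
        b->count = drained >= b->count ? 0 : b->count - (unsigned)drained;
        b->last_leak += (time_t)drained * b->leak_interval;
    }

    b->count++;
    if (b->count > b->capacity) {
        b->count = 0;   /* reset so the trigger does not fire on every error */
        return true;
    }
    return false;
}
```

For example, a bucket with a capacity of 10 and a leak interval of 8640 seconds tolerates roughly ten corrected errors per day on one DIMM (or one page) before it trips. A short burst of errors followed by a quiet period simply drains away.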
> In the specific case of Nehalem-EX, it seems that the low level driver
> won't be able to use direct access to the memory controller registers,
> since the uncore now uses a register index/value pair to read or write
> from the memory controller. The same pair is also used by BIOS to control
> the hardware. With this design, race conditions between BIOS and the OS
> may happen. So even reading data from the Memory Controller registers
> is not possible. So, it will need to use some logic to communicate via
> BIOS, probably via ACPI 4.0 APEI.
Already done, too; see
http://permalink.gmane.org/gmane.linux.acpi.devel/45743
However, the interface won't give you the topology you're asking
for, just the errors.
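The race in the quoted Nehalem-EX paragraph comes from the shared index/value pair. Every access takes two steps, and nothing prevents another agent from rewriting the index between them. The toy C model below replays one such interleaving deterministically. It is a simplification, not Nehalem-EX register code. The register file and the functions `write_index()`, `read_data()` and `racy_os_read()` are hypothetical. It also assumes the firmware access comes from something like an SMI handler that can run between two OS instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy index/data register pair: software writes a register number to
 * INDEX, then reads DATA, which returns whichever register INDEX
 * currently selects. Register numbers and contents are made up. */
#define NREGS 4

struct indexed_regs {
    uint32_t index;          /* selector register */
    uint32_t regs[NREGS];    /* registers reachable only through DATA */
};

static void write_index(struct indexed_regs *h, uint32_t idx)
{
    assert(idx < NREGS);
    h->index = idx;
}

static uint32_t read_data(const struct indexed_regs *h)
{
    return h->regs[h->index];
}

/* One unlucky interleaving: the OS selects `os_reg`, then firmware
 * (assumed here to be something like an SMI handler) selects `bios_reg`
 * for its own access, so the OS's data read returns the firmware's register. */
static uint32_t racy_os_read(struct indexed_regs *h,
                             uint32_t os_reg, uint32_t bios_reg)
{
    write_index(h, os_reg);     /* OS, step 1: select register */
    write_index(h, bios_reg);   /* firmware runs and reselects */
    (void)read_data(h);         /* firmware reads its own value */
    return read_data(h);        /* OS, step 2: reads the wrong register */
}
```

Firmware can run at any time without coordinating with the OS, so a lock taken only by the OS cannot protect the pair. This is why the quoted text suggests going through BIOS (APEI) instead.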
-Andi
--
ak@...ux.intel.com -- Speaking for myself only.