Message-Id: <02B220A1-36C9-449C-ABB0-89A3DC7AEDF2@ludd.ltu.se>
Date: Tue, 15 Jun 2010 20:38:58 +0200
From: Nils Carlson <nils.carlson@...d.ltu.se>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Andi Kleen <andi@...stfloor.org>, Doug Thompson <norsk5@...oo.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Ingo Molnar <mingo@...e.hu>, Borislav Petkov <bp@...64.org>,
Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
Mauro Carvalho Chehab <mchehab@...hat.com>,
"Young, Brent" <brent.young@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"bluesmoke-devel@...ts.sourceforge.net"
<bluesmoke-devel@...ts.sourceforge.net>,
Doug Thompson <dougthompson@...ssion.com>,
Joe Perches <joe@...ches.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linux Edac Mailing List <linux-edac@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Matt Domsch <Matt_Domsch@...l.com>,
Nils Carlson <nils.carlson@...csson.com>
Subject: Re: Hardware Error Kernel Mini-Summit
On Jun 15, 2010, at 8:15 PM, Luck, Tony wrote:
>> Could we come up with some plan that doesn't involve
>> trusting to the goodwill (and competence) of BIOS writers?
>
> That would be nice - but there already exists a platform
> (Xeon-7500 series a.k.a. Nehalem-EX) where the hardware
> chipset registers that you would need to do your own
> memory topology reverse engineering in Linux are only
> accessible to SMM level code. I've finally come to the
> conclusion that an EDAC-style driver just isn't possible
> for this set of systems.
Yes, I'm dreading the day they come to me telling me that
they've got one of those. On one hand you have hardware
people who love to put functionality in SMM, and on the other
you have developers of applications with real-time requirements,
to whom you have to explain that the latest and greatest
processor is broken for their purposes.
One day I'll use this as an excuse to migrate everyone to
PPC where people know that a bootloader is a bootloader.
But grudges against BIOSes aside, I don't know what to do
about Nehalem-EX systems. I guess at that point we really
are at the mercy of BIOS writers.
>
>> I personally really like the device tree compiler for PowerPC.
>> It allows you to be explicit about what you have. Not for everyone,
>> but maybe there could be some way to apply the same principle? Maybe
>> some way of loading modules with parameters or configuring your setup
>> from sysfs?
>
> Even when the chip set registers are accessible, it can be very
> complex to do this for the general case (think of boards that
> support arbitrary mixing of different size/speed DIMMs - the
> BIOS may have done some interesting somersaults while computing
> which interleaving modes to use).
>
> Even more complex on high end systems when BIOS may handle row
> sparing transparently to the OS. Memory mirroring is also
> becoming fashionable - how can EDAC represent this (when
> the h/w view of the memory doesn't match the OS view)?
>
Difficult questions. But at some point I wonder who will be buying
systems where finding out which DIMM is broken is so complex
that it requires a master's degree.
/Nils
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/