Message-ID: <alpine.LFD.2.00.1010260629560.3955@localhost6.localdomain6>
Date: Tue, 26 Oct 2010 06:53:11 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Len Brown <lenb@...nel.org>
cc: Ingo Molnar <mingo@...e.hu>, Huang Ying <ying.huang@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Andi Kleen <andi@...stfloor.org>, linux-acpi@...r.kernel.org,
Borislav Petkov <petkovbb@...glemail.com>,
"H. Peter Anvin" <hpa@...or.com>, Don Zickus <dzickus@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mauro Carvalho Chehab <mchehab@...hat.com>,
Tony Luck <tony.luck@...el.com>
Subject: Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error
Source POLL/IRQ/NMI notification type support
Len,
On Mon, 25 Oct 2010, Len Brown wrote:
> > NAKed-by: Ingo Molnar <mingo@...e.hu>
>
> Everybody knows that Linux has a lot to learn about RAS.
>
> I think to catch up, we need to play to Linux's strengths
> of continuous improvement. If we halt patches in this area
> then we could wait forever for the "perfect design".
It's not about perfect design. It's about creating new user space
ABIs. The patches introduce yet another user space ABI for error
reporting, with an ad hoc "fits our needs" design.
This is my major point of objection.
I agree that Linux needs improvement on the RAS side, but does this
lack of features justify a new user space ABI which is totally
disconnected from the existing RAS facilities?
No, it does not. It's not our problem that Intel wasted time
creating another character device driver to report errors to user
space. The time spent on that would have been sufficient to do a
proper integration into the existing infrastructure.
I would not care at all if these patches just introduced some weird
in-kernel interfaces, as we can clean those up at will. But
introducing a new user space ABI sets the disconnect between the
RAS-related facilities in stone.
From Kconfig:
EDAC is designed to report errors in the core system.
These are low-level errors that are reported in the CPU or
supporting chipset or other subsystems:
memory errors, cache errors, PCI errors, thermal throttling, etc..
If unsure, select 'Y'.
So please explain why your error reporting is so different from the
above that it justifies a separate facility. And you had better come
up with a really good explanation, other than "we looked at EDAC and
it did not fit our needs".
Thanks,
tglx