Message-ID: <20110524081434.GA18863@liondog.tnic>
Date: Tue, 24 May 2011 10:14:34 +0200
From: Borislav Petkov <bp@...en8.de>
To: Ingo Molnar <mingo@...e.hu>
Cc: "Luck, Tony" <tony.luck@...el.com>, linux-kernel@...r.kernel.org,
"Huang, Ying" <ying.huang@...el.com>,
Andi Kleen <andi@...stfloor.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mauro Carvalho Chehab <mchehab@...hat.com>
Subject: Re: [RFC 0/9] mce recovery for Sandy Bridge server
On Tue, May 24, 2011 at 05:40:23AM +0200, Ingo Molnar wrote:
> So we *really* want to promote this code to a higher level of abstraction.
> Everyone would benefit from doing that: Intel hardware error handling features
> would be enabled much more richly and i suspect they would also be *used* in a
> much more meaningful way - driving the hw cycle as well.
Absolutely agreed. The RAS architecture should look like this, IMHO:
I. Event collection: #MC handler and pollers, no queueing or buffering crap.
II. Pluggable and extensible filters which are
* per vendor
* configurable from userspace
* easily extensible
* decide whether action should be taken in the kernel or whether the error is
non-critical and should go to the RAS daemon
III. Error handling callback(s)
* also extensible
* also per vendor
* also configurable from userspace
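A rough sketch of how stages I-III above could hang together, in plain userspace C just to make the shape concrete. Every name here (ras_event, ras_register_filter, ras_register_handler, ras_report, the demo_* hooks) is hypothetical and not an existing kernel interface. The only hardware detail assumed is that bit 61 of MCi_STATUS is the UC (uncorrected) flag. Userspace configuration is not modelled, and treating an event that no filter claims as non-critical is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* All identifiers below are hypothetical; this is not a kernel API. */

enum ras_verdict {
	RAS_PASS,		/* filter has no opinion, ask the next one */
	RAS_TO_DAEMON,		/* non-critical: hand off to the RAS daemon */
	RAS_HANDLE_IN_KERNEL,	/* critical: run the in-kernel handlers */
};

struct ras_event {
	unsigned int vendor;	/* hypothetical vendor id */
	unsigned int bank;	/* MCA bank the error came from */
	uint64_t status;	/* raw MCi_STATUS value */
	uint64_t addr;		/* raw MCi_ADDR value */
};

#define RAS_STATUS_UC	(1ULL << 61)	/* MCi_STATUS.UC: uncorrected error */
#define RAS_MAX_HOOKS	8

typedef enum ras_verdict (*ras_filter_fn)(const struct ras_event *ev);
typedef void (*ras_handler_fn)(const struct ras_event *ev);

static ras_filter_fn filters[RAS_MAX_HOOKS];	/* stage II */
static ras_handler_fn handlers[RAS_MAX_HOOKS];	/* stage III */
static size_t nr_filters, nr_handlers;
static unsigned long nr_forwarded, nr_handled;	/* counters for the demo */

int ras_register_filter(ras_filter_fn fn)
{
	if (!fn || nr_filters == RAS_MAX_HOOKS)
		return -1;
	filters[nr_filters++] = fn;
	return 0;
}

int ras_register_handler(ras_handler_fn fn)
{
	if (!fn || nr_handlers == RAS_MAX_HOOKS)
		return -1;
	handlers[nr_handlers++] = fn;
	return 0;
}

/*
 * Stage I entry point: the #MC handler or a poller calls this directly,
 * with no intermediate queue or buffer of its own.
 */
enum ras_verdict ras_report(const struct ras_event *ev)
{
	enum ras_verdict v = RAS_PASS;
	size_t i;

	/* Stage II: the first filter with an opinion decides. */
	for (i = 0; i < nr_filters && v == RAS_PASS; i++)
		v = filters[i](ev);

	if (v == RAS_HANDLE_IN_KERNEL) {
		/* Stage III: run every registered handler. */
		for (i = 0; i < nr_handlers; i++)
			handlers[i](ev);
	} else {
		/* Unclaimed or non-critical: would go to the RAS daemon. */
		v = RAS_TO_DAEMON;
		nr_forwarded++;
	}
	return v;
}

/* Example vendor filter: only uncorrected errors need kernel action. */
enum ras_verdict demo_uc_filter(const struct ras_event *ev)
{
	return (ev->status & RAS_STATUS_UC) ? RAS_HANDLE_IN_KERNEL : RAS_PASS;
}

/* Example handler: a real one might offline the affected page. */
void demo_handler(const struct ras_event *ev)
{
	(void)ev;
	nr_handled++;
}
```

With demo_uc_filter and demo_handler registered, calling ras_report() on an event whose status is 0 yields RAS_TO_DAEMON, while one with RAS_STATUS_UC set yields RAS_HANDLE_IN_KERNEL and runs the handler once. A synthetic event fed into ras_report() this way is also roughly what an injection path would do.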
Advantages:
* reuse perf code - no need for ad-hoc new buffers and lockless thingies when we
have it all already
* easy code and even hw testing with perf inject or ras inject
** this also gives us the different injection methods per vendor in a unified
way instead of interfaces in /sys or debugfs or mcelog or ...
* keep code design sane instead of letting it needlessly fiddle with
other parts of the kernel
* ...
Now I'd better go and put my patches where my mouth is :).
--
Regards/Gruss,
Boris.