Message-ID: <20110517085033.GF22093@elte.hu>
Date: Tue, 17 May 2011 10:50:33 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Don Zickus <dzickus@...hat.com>
Cc: huang ying <huang.ying.caritas@...il.com>,
Huang Ying <ying.huang@...el.com>,
linux-kernel@...r.kernel.org, Andi Kleen <andi@...stfloor.org>,
Robert Richter <robert.richter@....com>,
Andi Kleen <ak@...ux.intel.com>, Borislav Petkov <bp@...en8.de>
Subject: Re: [RFC] x86, NMI, Treat unknown NMI as hardware error

* Don Zickus <dzickus@...hat.com> wrote:

> On Mon, May 16, 2011 at 01:29:34PM +0200, Ingo Molnar wrote:
> > > Interesting. Question though: what do you mean by 'event filtering'? Is
> > > that different than setting 'unknown_nmi_panic' on the command line or
> > > via procfs?
> > >
> > > Or are you suggesting something like registering another callback on the
> > > die_chain that looks for DIE_NMIUNKNOWN as the event, swallows them and
> > > implements the policy? That way only on HEST related platforms would
> > > register them while others would keep the default of 'Dazed and confused'
> > > messages?
> >
> > The idea is that "event filters", which are an existing upstream feature and
> > which can be used in rather flexible ways:
> >
> > http://lkml.org/lkml/2011/4/27/660
> >
> > could be used to trigger non-standard policy action as well - such as to panic
> > the box.
> >
> > This would replace various very limited /debugfs and /sys event filtering hacks
> > (and hardcoded policies) such as arch/x86/kernel/cpu/mcheck/mce-severity.c, and
> > it would allow nonstandard behavior like 'panic the box on unknown NMIs' as
> > well.
> >
> > This could be set by the RAS daemon, and it could be propagated to the kernel
> > boot line as well, where event filter syntax would look like this:
> >
> > events=nmi::unknown"if (reason == 0) panic();"
>
> Wow. ok. I believe that is the most complicated kernel boot param I have
> ever seen. :-) Powerful, no doubt.

It would not have to be typed normally - the defaults would still be sane.

> So this would sorta be a meta-notifier? I guess you are saying platforms
> that implement something like HEST could set up an event like that to trigger
> the behaviour they want on a per-platform basis?

Yeah - or if they dislike the default they could tweak the policy action in a
rather flexible way.

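To make the "filter plus policy action" idea concrete, here is a rough
user-space sketch in Python. It is not kernel code and not an existing
parser: it only models the single boot-line example quoted above. The names
EventFilter, parse_event_filter and policy_action are made up for
illustration, and the "log" default merely stands in for whatever the sane
default behaviour would be.

```python
import re
from dataclasses import dataclass

# Toy grammar covering only the one example from this thread:
#   events=<event>"if (<field> == <int>) <action>();"
# Anything richer (other operators, several clauses) is out of scope here.
_FILTER_RE = re.compile(
    r'^events=(?P<event>[\w:]+)'
    r'"if \((?P<field>\w+) == (?P<value>\d+)\) (?P<action>\w+)\(\);"$'
)


@dataclass(frozen=True)
class EventFilter:
    """Hypothetical in-memory form of one filter (illustration only)."""
    event: str
    field: str
    value: int
    action: str

    def matches(self, event, fields):
        # The filter fires only for its own event and an equal field value.
        return event == self.event and fields.get(self.field) == self.value


def parse_event_filter(param):
    """Parse one 'events=' parameter written in the toy grammar above."""
    m = _FILTER_RE.match(param)
    if m is None:
        raise ValueError(f"unsupported filter syntax: {param!r}")
    return EventFilter(m["event"], m["field"], int(m["value"]), m["action"])


def policy_action(filters, event, fields, default="log"):
    """Return the action of the first matching filter, else the default."""
    for f in filters:
        if f.matches(event, fields):
            return f.action
    return default
```

With the example parameter, an "nmi::unknown" event carrying reason == 0
would map to "panic", while any other reason would fall back to the default.
The point of the sketch is only that the default stays in place unless a
filter explicitly overrides it.
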
> My only argument against it would be sort of what Ying complains about: you
> start to lose track of who is hooked into the NMI. It is one thing to search
> for all the users in the die_notifier to track down who is swallowing NMIs.
> But looking for event users is going to be harder. Unless the events
> processing has a switch to turn on logging? :-)

Yeah, all such types of filters should be printed during bootup, to make it
really clear what is happening.

We also want all the current state readily visible under /sys/events or
/events.

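As a rough illustration of why visibility matters, the following Python
sketch models a notifier-style chain in user space. It is an assumption-laden
toy, not the kernel's die_chain: the NotifierChain class, both handler
functions and the numeric NOTIFY_* values are invented here, and only the
event name DIE_NMIUNKNOWN and the "swallow the event" idea are taken from
this thread. The describe() method plays the role of the bootup printout.

```python
NOTIFY_DONE = 0  # model value: handler did not handle the event
NOTIFY_STOP = 1  # model value: handler swallowed the event, stop the walk


class NotifierChain:
    """Toy priority-ordered callback chain (illustration only)."""

    def __init__(self):
        self._entries = []  # list of (priority, name, handler)

    def register(self, name, handler, priority=0):
        self._entries.append((priority, name, handler))
        # Stable sort: higher priority first, ties keep registration order.
        self._entries.sort(key=lambda entry: -entry[0])

    def call(self, event, data):
        """Walk the chain; return the name of the swallowing handler or None."""
        for _priority, name, handler in self._entries:
            if handler(event, data) == NOTIFY_STOP:
                return name
        return None

    def describe(self):
        """List every registered handler, in the order it would be called."""
        return [f"{name} (priority {prio})" for prio, name, _h in self._entries]


def hest_unknown_nmi(event, data):
    # Hypothetical platform policy: treat unknown NMIs as fatal.
    if event == "DIE_NMIUNKNOWN":
        data["action"] = "panic"
        return NOTIFY_STOP
    return NOTIFY_DONE


def default_unknown_nmi(event, data):
    # Hypothetical stand-in for the default 'Dazed and confused' message.
    if event == "DIE_NMIUNKNOWN":
        data["action"] = "dazed-and-confused"
        return NOTIFY_STOP
    return NOTIFY_DONE
```

Registering hest_unknown_nmi at a higher priority than default_unknown_nmi
means it swallows the event first, and describe() shows both entries in call
order, so it stays obvious who is hooked in and who wins.
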
Thanks,
Ingo