Message-ID: <1286845821.7768.150.camel@yhuang-dev>
Date: Tue, 12 Oct 2010 09:10:21 +0800
From: Huang Ying <ying.huang@...el.com>
To: Don Zickus <dzickus@...hat.com>
Cc: Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andi Kleen <andi@...stfloor.org>,
Robert Richter <robert.richter@....com>
Subject: Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error
On Tue, 2010-10-12 at 05:20 +0800, Don Zickus wrote:
> On Sat, Oct 09, 2010 at 02:49:46PM +0800, Huang Ying wrote:
> > In general, unknown NMIs are used by hardware and firmware to notify
> > the OS of fatal hardware errors. So Linux should treat an unknown NMI
> > as a hardware error and panic upon it, for better error containment.
> >
> > But there is some broken hardware that generates unknown NMIs for
> > reasons other than hardware errors. To support these machines, a
> > whitelist mechanism is provided so that unknown NMIs are treated as
> > hardware errors only on known-working systems.
> >
> > These systems are identified by the presence of an APEI HEST table or
> > by the PCI ID of the host bridge. The host bridge PCI ID is used
> > instead of the DMI ID so that the check is based on the platform type
> > rather than the motherboard. This should be simpler and sufficient.
> >
> > The method for identifying the platforms was designed by Andi Kleen.
>
> I don't have any major problems with the other patches in the patch
> series. In fact I would like to get them committed somewhere, so we can
> continue building on them.
Thanks.
> > @@ -366,6 +368,15 @@ unknown_nmi_error(unsigned char reason,
> > if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
> > NOTIFY_STOP)
> > return;
> > + /*
> > + * On some platforms, hardware errors may be notified via
> > + * unknown NMI
> > + */
> > + if (unknown_nmi_as_hwerr)
> > + panic(
> > + "NMI for hardware error without error record: Not continuing\n"
> > + "Please check BIOS/BMC log for further information.");
> > +
> > #ifdef CONFIG_MCA
> > /*
> > * Might actually be able to figure out what the guilty party
>
> The only quirk I have left is the above piece, which is basically a
> difference in philosophy: Robert and I believe it should be on the
> die_chain, while Andi and you would like to see it called out
> explicitly.
>
> If we move to a new notifier chain, like we discussed in another thread,
> would you guys be willing to move this into that new notifier chain or is
> your argument still going to stand?
I will probably not move this into that new notifier chain myself. If
you want to do that, feel free to pick it up and change it as you see fit.
Best Regards,
Huang Ying