Message-ID: <20100927133816.GP13563@erda.amd.com>
Date: Mon, 27 Sep 2010 15:38:16 +0200
From: Robert Richter <robert.richter@....com>
To: huang ying <huang.ying.caritas@...il.com>
CC: Huang Ying <ying.huang@...el.com>, Don Zickus <dzickus@...hat.com>,
Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH -v2 6/7] x86, NMI, Add support to notify hardware error
with unknown NMI
On 27.09.10 08:47:53, huang ying wrote:
> >> arch/x86/kernel/hwerr.c | 55 +++++++++++++++++++++++++++++++++++++++++++++
> >
> > Instead of creating this file the code should be implemented in
> >
> > arch/x86/kernel/cpu/intel.c
> >
> > Similar AMD NB code is implemented in amd.c and k8.c.
>
> Why? This file is not vendor specific.
No, it only implements support for an Intel-specific PCI device, nothing else.
> >> +late_initcall(check_unknown_nmi_for_hwerr);
> >
> > Maybe you can use early pci functions like read_pci_config() to avoid
> > late init.
>
> I don't think late init is a big issue. Hardware errors are rare after all.
I just wanted to mention this as an option.
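
For context on why read_pci_config() can run before the PCI subsystem is up: as far as I know it goes through the legacy type 1 configuration ports (0xCF8/0xCFC) directly. Below is a minimal user-space sketch of how such an address word is composed. The helper name pci_conf1_address() is hypothetical and the port I/O itself is left out, so treat it as an illustration rather than the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose the 32-bit value written to the CONFIG_ADDRESS port (0xCF8)
 * for PCI configuration mechanism #1.
 * Hypothetical helper for illustration only; not a kernel function.
 */
static uint32_t pci_conf1_address(uint8_t bus, uint8_t slot, uint8_t func,
				  uint8_t offset)
{
	assert(slot < 32);	/* device number is a 5-bit field */
	assert(func < 8);	/* function number is a 3-bit field */

	return 0x80000000u |			/* enable bit */
	       ((uint32_t)bus << 16) |		/* bus number */
	       ((uint32_t)slot << 11) |		/* device number */
	       ((uint32_t)func << 8) |		/* function number */
	       (offset & 0xfcu);		/* dword-aligned register */
}
```

An early probe would write this value to port 0xCF8 and read the register contents back from port 0xCFC, which needs nothing beyond port I/O and therefore no late initcall.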
> >> --- a/arch/x86/kernel/traps.c
> >> +++ b/arch/x86/kernel/traps.c
> >> @@ -83,6 +83,8 @@ EXPORT_SYMBOL_GPL(used_vectors);
> >>
> >> static int ignore_nmis;
> >>
> >> +int unknown_nmi_for_hwerr;
> >
> > If it is an nmi for hwerr, it is no longer an unknown nmi. So we
> > should drop 'unknown' in the naming.
>
> I think an unknown NMI is one whose source we cannot identify.
> Something like anonymous.
>
> >> +
> >> /*
> >> * Prevent NMI reason port (0x61) being accessed simultaneously, can
> >> * only be used in NMI handler.
> >> @@ -360,6 +362,14 @@ io_check_error(unsigned char reason, str
> >> static notrace __kprobes void
> >> unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> >> {
> >> + /*
> >> + * On some platforms, hardware errors may be notified via
> >> + * unknown NMI
> >> + */
> >> + if (unknown_nmi_for_hwerr)
> >> + panic("NMI for hardware error without error record: "
> >> + "Not continuing");
> >> +
> >
> > Instead of checking this flag you should implement and register an nmi
> > handler for this case.
>
> I think explicit function calls have better readability than notifier chains.
How is that different from unknown_nmi() then?
So no. In your case you want to catch unknown NMIs for certain
hardware and then panic. This should be clearly implemented in
a separate handler for that piece of hardware.
We want to clean up this code and throw out all hardware-specific
snippets, not introduce new special cases here.
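
To make the separate-handler idea concrete, here is a small user-space model of the pattern: the generic unknown-NMI path only walks a chain, and the hardware-specific code registers itself on that chain. Every name in it (sketch_notifier, sketch_register(), sketch_unknown_nmi(), hwerr_nmi_handler()) is hypothetical. It only mimics the shape of the kernel's notifier chains and counts calls where real code would panic(), so it is a sketch of the structure, not the actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Return values modelled loosely on NOTIFY_DONE / NOTIFY_STOP. */
enum { SKETCH_DONE = 0, SKETCH_STOP = 1 };

struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned char reason);
	int priority;			/* higher priority runs first */
	struct sketch_notifier *next;
};

static struct sketch_notifier *unknown_nmi_chain;

/* Insert a handler, keeping the chain sorted by descending priority. */
static void sketch_register(struct sketch_notifier *nb)
{
	struct sketch_notifier **p = &unknown_nmi_chain;

	assert(nb && nb->call);
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Generic path: returns 1 if a registered handler claimed the NMI. */
static int sketch_unknown_nmi(unsigned char reason)
{
	struct sketch_notifier *nb;

	for (nb = unknown_nmi_chain; nb; nb = nb->next)
		if (nb->call(nb, reason) == SKETCH_STOP)
			return 1;
	return 0;	/* unclaimed: caller falls back to the default report */
}

/* Hardware-specific part, kept next to the code for that device. */
static int hwerr_panics;	/* stands in for panic() so the sketch returns */

static int hwerr_nmi_handler(struct sketch_notifier *nb, unsigned char reason)
{
	(void)nb;
	(void)reason;
	hwerr_panics++;		/* real code would panic() here */
	return SKETCH_STOP;
}

static struct sketch_notifier hwerr_nb = {
	.call = hwerr_nmi_handler,
	.priority = 0,
};
```

The platform code would call sketch_register(&hwerr_nb) only after detecting the device, so the generic handler needs neither a global flag nor any knowledge of that hardware.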
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center