linux-kernel - Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI

Message-ID: <20100913141140.GB27371@redhat.com>
Date:	Mon, 13 Sep 2010 10:11:40 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Huang Ying <ying.huang@...el.com>
Cc:	Andi Kleen <andi@...stfloor.org>, Ingo Molnar <mingo@...e.hu>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with
 unknown NMI

On Mon, Sep 13, 2010 at 10:19:49AM +0800, Huang Ying wrote:
> On Sat, 2010-09-11 at 02:40 +0800, Don Zickus wrote:
> > On Fri, Sep 10, 2010 at 06:19:29PM +0200, Andi Kleen wrote:
> > > 
> > > > I am grasping for straws here, but is there a register that APEI/HEST
> > > > can poke to see if it generated the NMI?
> > > 
> > > HEST knows this yes.
> > > 
> > > But this is not about HEST errors, but about those without HEST
> > > handling.
> > 
> > Don't most unknown NMIs fall into the same boat, that they were not being
> > handled properly?
> 
> As far as I know, at least on some platforms, unknown NMIs are used for
> hardware error reporting. They will cause "Blue Screen" in Windows.

Unfortunately, in most of the bugzillas I deal with, unknown NMIs are the
result of SERRs.  While you can consider that hardware error reporting,
the easiest way for me to debug those problems currently is to have
reporters run 'lspci -vvv' after the NMI is displayed to figure out which
device caused the NMI.

My fear is that panic'ing the box on unknown NMIs on those platforms will
hinder my ability to easily debug those NMIs.

> 
> > On the other hand could you use the die_notifier_chain(DIE_UNKNOWNNMI) for
> > the same purpose and keep the unknown_nmi_error() handler a little
> > cleaner?
> 
> I think explicit function call has better readability than notifier
> chain.

Ok.  What criteria should we establish to determine which functions go on
the notifier chain and which ones can be called explicitly?
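
To make the trade-off concrete, here is a rough userspace sketch in C of
the two styles being compared.  It is not kernel code: every sketch_*
name, the hw_error_seen flag and both unknown_nmi_* variants are
hypothetical stand-ins, only loosely modeled on the kernel's
notifier_block and NOTIFY_DONE/NOTIFY_STOP conventions.

```c
#include <stddef.h>

/* Return codes loosely modeled on the kernel's NOTIFY_DONE / NOTIFY_STOP. */
enum { SKETCH_DONE = 0, SKETCH_STOP = 1 };

/* Hypothetical event id standing in for DIE_UNKNOWNNMI. */
#define SKETCH_UNKNOWN_NMI 1UL

/* Hypothetical userspace stand-in for struct notifier_block. */
struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long event, void *data);
	int priority;			/* higher priority runs first */
	struct sketch_notifier *next;
};

static struct sketch_notifier *sketch_chain;

/* Insert a handler, keeping the chain in descending priority order. */
static void sketch_register(struct sketch_notifier *nb)
{
	struct sketch_notifier **p = &sketch_chain;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Walk the chain and stop as soon as one handler claims the event. */
static int sketch_notify(unsigned long event, void *data)
{
	struct sketch_notifier *nb;

	for (nb = sketch_chain; nb; nb = nb->next)
		if (nb->call(nb, event, data) == SKETCH_STOP)
			return SKETCH_STOP;
	return SKETCH_DONE;
}

/* Pretend platform state: set when a hardware error has been latched. */
static int hw_error_seen;

/* Handler that claims the unknown NMI only if a hardware error is pending. */
static int hw_error_handler(struct sketch_notifier *nb, unsigned long event,
			    void *data)
{
	(void)nb;
	(void)data;
	if (event != SKETCH_UNKNOWN_NMI || !hw_error_seen)
		return SKETCH_DONE;
	return SKETCH_STOP;
}

/* Style A: the handler stays generic and asks the chain.
 * Returns 1 if someone claimed the NMI, 0 if it is still unknown. */
static int unknown_nmi_via_chain(void)
{
	return sketch_notify(SKETCH_UNKNOWN_NMI, NULL) == SKETCH_STOP;
}

/* Style B: the handler checks the hardware-error state explicitly. */
static int unknown_nmi_explicit(void)
{
	return hw_error_seen ? 1 : 0;
}
```

With hw_error_handler registered through sketch_register(), both
variants return the same answer for a given hw_error_seen value.  The
difference is only where the policy lives: style A keeps the handler
free of platform knowledge at the cost of an indirect call path, while
style B shows the check right where it happens.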

Cheers,
Don