linux-kernel - Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI


Message-ID: <20100913200707.3b31429e@basil.nowhere.org>
Date:	Mon, 13 Sep 2010 20:07:07 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Huang Ying <ying.huang@...el.com>, Ingo Molnar <mingo@...e.hu>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with
 unknown NMI


> 
> Honestly, I don't think you need much screen real estate.  It would be
> nice when an unknown NMI comes in, if the kernel just pokes around
> the hardware registers and displays a summary of what it found.  For
> example,
> 
> The following devices had error bits set in the status registers:
> PCI device x:y.z - STATUS_BIT1 | STATUS_BIT2
> HW device xyz - STATUS_BIT3
> ...

You mean data from the generic PCI config space?

I don't think I would feel comfortable with arbitrary driver callbacks
(the risk of the driver breaking the panic would be high).

But if it's generic data, then even if it's not shown on the screen it
should at least be in the error serialization data and logged after boot.

At least on PCI-E it may be enough to simply dump all recent AER
data.
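
To make the "generic PCI config space" idea concrete, here is a minimal
user-space sketch of decoding the error bits of the 16-bit PCI Status
register (config offset 0x06) into a summary string like the one quoted
above. The bit values follow the standard PCI Status register layout and
the macro names mirror those in the kernel's pci_regs.h; the helper
format_pci_status_errors() and the table are hypothetical names made up for
illustration, not existing kernel code. It does not read any hardware and
does not cover AER.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Error-related bits of the PCI Status register (config offset 0x06). */
#define PCI_STATUS_PARITY            0x0100 /* master data parity error */
#define PCI_STATUS_SIG_TARGET_ABORT  0x0800 /* signaled target abort */
#define PCI_STATUS_REC_TARGET_ABORT  0x1000 /* received target abort */
#define PCI_STATUS_REC_MASTER_ABORT  0x2000 /* received master abort */
#define PCI_STATUS_SIG_SYSTEM_ERROR  0x4000 /* signaled system error (SERR#) */
#define PCI_STATUS_DETECTED_PARITY   0x8000 /* detected parity error */

struct status_bit {
	uint16_t mask;
	const char *name;
};

/* Illustrative table, in ascending bit order. */
static const struct status_bit pci_error_bits[] = {
	{ PCI_STATUS_PARITY,           "PCI_STATUS_PARITY" },
	{ PCI_STATUS_SIG_TARGET_ABORT, "PCI_STATUS_SIG_TARGET_ABORT" },
	{ PCI_STATUS_REC_TARGET_ABORT, "PCI_STATUS_REC_TARGET_ABORT" },
	{ PCI_STATUS_REC_MASTER_ABORT, "PCI_STATUS_REC_MASTER_ABORT" },
	{ PCI_STATUS_SIG_SYSTEM_ERROR, "PCI_STATUS_SIG_SYSTEM_ERROR" },
	{ PCI_STATUS_DETECTED_PARITY,  "PCI_STATUS_DETECTED_PARITY" },
};

/*
 * Hypothetical helper: write the set error bits as "A | B" into buf.
 * Returns the number of error bits written; stops early if buf is too
 * small. Non-error status bits are ignored.
 */
static int format_pci_status_errors(uint16_t status, char *buf, size_t len)
{
	size_t used = 0;
	int found = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(pci_error_bits) / sizeof(pci_error_bits[0]); i++) {
		if (!(status & pci_error_bits[i].mask))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 found ? " | " : "", pci_error_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated; stop here */
		used += (size_t)n;
		found++;
	}
	return found;
}
```

A caller would pass in the status word it read for each device and print
one line per device whose return value is non-zero.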

> 
> But I guess if we accept the fact that an unknown NMI will panic the
> box, then we can probably be a little more liberal in breaking
> spinlocks and poking around the hardware to display some useful info.

You have to be a bit careful with that; you may cause nested errors
(e.g. machine checks or more NMIs). I suppose this could be checked for
though.
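
One way such a check could look, as a rough user-space sketch only: guard
the dump path with an atomic flag so that a nested entry is detected and
skipped instead of recursing. All names here (try_dump_error_state,
faulting_dump, and so on) are hypothetical, it uses C11 atomics rather than
kernel primitives, and a real NMI/MCE path would need per-CPU handling that
this does not attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Set while a dump is running; a nested entry sees it already set. */
static atomic_flag dump_in_progress = ATOMIC_FLAG_INIT;

/*
 * Hypothetical guard: run dump() unless a dump is already in progress.
 * Returns true if dump() ran, false if the call was rejected as nested.
 */
static bool try_dump_error_state(void (*dump)(void))
{
	assert(dump != NULL);

	if (atomic_flag_test_and_set(&dump_in_progress))
		return false; /* nested error while dumping: do not recurse */

	dump();
	atomic_flag_clear(&dump_in_progress);
	return true;
}

/* Counts how many nested attempts the guard turned away. */
static int nested_attempts_rejected;

static void noop_dump(void)
{
}

/*
 * Simulates a register poke that raises a second error, which then tries
 * to enter the dump path again.
 */
static void faulting_dump(void)
{
	if (!try_dump_error_state(noop_dump))
		nested_attempts_rejected++;
}
```

Calling try_dump_error_state(faulting_dump) runs the outer dump once, and
the simulated nested attempt is rejected and counted rather than re-entering
the dump.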

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.