linux-kernel - Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error

Message-ID: <20101022092402.GB10456@basil.fritz.box>
Date:	Fri, 22 Oct 2010 11:24:02 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Huang Ying <ying.huang@...el.com>, Ingo Molnar <mingo@...e.hu>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Robert Richter <robert.richter@....com>,
	"peterz@...radead.org" <peterz@...radead.org>
Subject: Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error

On Thu, Oct 21, 2010 at 09:49:55PM -0400, Don Zickus wrote:
> After re-reading Huang's patch, I am starting to understand what you mean
> by broken hardware.  Basically you are trying to distinguish between
> legacy systems that were 'broken' in the sense they would randomly send
> unknown NMIs for no good reason, hence the 'Dazed and confused' messages
> and hardware errors on more modern systems that say, 'Hardware error,
> panicking, check your BIOS for more info' (or whatever).

Yes that's it.

Unfortunately there are some cases where the BIOS lost it as well,
so the fallback has to be a panic (at least for the modern boxes).

> 
> So Huang's patch was sort of acting like a switch.  On legacy systems use
> 'Dazed and confused' for unknown NMIs.  Whereas on whitelisted modern
> systems use a more relevant 'Check BIOS for error' message.  Is that
> right?

Yes.
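
A minimal sketch of the switch described above, for illustration only.
It is not code from Huang's patch: the enum, the function names and the
`whitelisted` flag are hypothetical, and the message strings are
placeholders taken from the wording in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome of handling an NMI that no handler claimed. */
enum nmi_action {
	NMI_WARN_AND_CONTINUE,	/* legacy box: log and keep running */
	NMI_PANIC		/* whitelisted box: treat as hardware error */
};

/*
 * Assumption: `whitelisted` is true only on systems known to raise
 * unknown NMIs solely for hardware errors. How that is detected is
 * outside the scope of this sketch.
 */
static enum nmi_action unknown_nmi_action(bool whitelisted)
{
	return whitelisted ? NMI_PANIC : NMI_WARN_AND_CONTINUE;
}

/* Placeholder messages, not the actual kernel strings. */
static const char *unknown_nmi_message(enum nmi_action action)
{
	if (action == NMI_PANIC)
		return "Hardware error, check your BIOS for more info";
	return "Dazed and confused";
}

/* Report the event and return the action the caller should take. */
static enum nmi_action report_unknown_nmi(bool whitelisted)
{
	enum nmi_action action = unknown_nmi_action(whitelisted);

	fprintf(stderr, "unknown NMI: %s\n", unknown_nmi_message(action));
	return action;
}
```

Older machines take the first branch and behave as they do today; only
whitelisted systems escalate to a panic.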

> > I don't think you need to worry about a lot more hardware NMI sources.
> 
> Well until those machines dominate the marketplace, I'm stuck supporting
> those pre-Nehalem boxes with customers that committed to 10 years with
> last year's technology.  ;-)

I should clarify that the NMI model I described long predates Nehalem.
If you assume 3-5 year deprecation cycles on servers, it should be pretty
much universal in this space.

The HEST detection was a proposed way to detect that, because most
of these systems should have HEST.

The older machines still need to be supported, but it's OK to
just behave the same as today on them; there is no need for great
improvements here.

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.