Date:	Fri, 22 Oct 2010 11:24:02 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Huang Ying <ying.huang@...el.com>, Ingo Molnar <mingo@...e.hu>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Robert Richter <robert.richter@....com>,
	"peterz@...radead.org" <peterz@...radead.org>
Subject: Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error

On Thu, Oct 21, 2010 at 09:49:55PM -0400, Don Zickus wrote:
> After re-reading Huang's patch, I am starting to understand what you mean
> by broken hardware.  Basically you are trying to distinguish between
> legacy systems that were 'broken' in the sense they would randomly send
> unknown NMIs for no good reason, hence the 'Dazed and confused' messages,
> and hardware errors on more modern systems that say, 'Hardware error,
> panicking, check your BIOS for more info' (or whatever).

Yes that's it.

Unfortunately there are some cases where the BIOS lost it as well,
so the fallback has to be panic (at least for the modern boxes).

> 
> So Huang's patch was sort of acting like a switch.  On legacy systems use
> 'Dazed and confused' for unknown NMIs.  Whereas on whitelisted modern
> systems use a more relevant 'Check BIOS for error' message.  Is that
> right?

Yes.
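
As a rough illustration of the switch described above (a sketch in plain C,
not the code from the patch; every identifier here is hypothetical and the
message strings are paraphrased from this thread):

```c
#include <stdbool.h>

/* All names below are hypothetical, chosen for illustration only. */

enum unknown_nmi_action {
    UNKNOWN_NMI_WARN,   /* legacy box: log the old message and carry on */
    UNKNOWN_NMI_PANIC   /* whitelisted box: treat as a hardware error */
};

/* Assumed to be filled in once at boot by whatever detection is used. */
struct platform_info {
    bool whitelisted;
};

/* Decide how an NMI that no handler claimed should be treated. */
enum unknown_nmi_action classify_unknown_nmi(const struct platform_info *p)
{
    return p->whitelisted ? UNKNOWN_NMI_PANIC : UNKNOWN_NMI_WARN;
}

/* Pick the message that goes with the chosen action. */
const char *unknown_nmi_message(enum unknown_nmi_action action)
{
    switch (action) {
    case UNKNOWN_NMI_PANIC:
        return "Hardware error, check your BIOS for more info";
    case UNKNOWN_NMI_WARN:
    default:
        return "Dazed and confused";
    }
}
```

The point of keeping the decision in one place is that non-whitelisted
machines keep exactly today's behaviour, and only machines that pass the
whitelist check take the panic path.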

> > I don't think you need to worry about a lot more hardware NMI sources.
> 
> Well until those machines dominate the marketplace, I'm stuck supporting
> those pre-Nehalem boxes with customers that committed to 10 years with
> last year's technology.  ;-)

I should clarify that the NMI model I described long predates Nehalem.
If you assume 3-5 year deprecation cycles on servers it should be pretty
much universal in this space.

The HEST detection was a proposed way to detect that, because most
of these systems should have HEST.

The older machines still need to be supported, but it's ok to
just behave the same as today on them, no need for great improvements
here.

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.