Date:	Fri, 13 May 2011 21:17:13 +0800
From:	huang ying <huang.ying.caritas@...il.com>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Huang Ying <ying.huang@...el.com>, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org, Andi Kleen <andi@...stfloor.org>,
	Robert Richter <robert.richter@....com>,
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC] x86, NMI, Treat unknown NMI as hardware error

Hi, Don,

On Fri, May 13, 2011 at 8:45 PM, Don Zickus <dzickus@...hat.com> wrote:
> On Fri, May 13, 2011 at 04:23:38PM +0800, Huang Ying wrote:
>> In general, unknown NMIs are used by hardware and firmware to notify the
>> OS of fatal hardware errors. So Linux should treat an unknown NMI as a
>> hardware error and panic on it for better error containment.
>
> I have a couple of concerns about this patch.  One I don't think BIOSes
> are ready for this.  I have Intel Westmere boxes that say they have a
> valid HEST, GHES, and EINJ table, but when I inject an error there is no
> GHES record.  This leaves me with an unknown NMI and panic.  Yeah, it is a
> BIOS bug I guess, but I think vendors are going to be slow fixing all this
> stuff (my Nehalem box is in even worse shape with this stuff).

Although there is no GHES record, I think the Westmere box's behavior is
acceptable: the BIOS uses an unknown NMI to notify the OS of a hardware
error, and that is exactly the case this patch is meant to handle.

> Also, are there any known issues with x86_64 platforms with bad NMIs?  RHEL
> has had unknown NMIs panic on x86_64 since x86_64 first came out, and I
> don't recall any exceptions we had to add to handle 'quirky' hardware.
>
> Then for the i686 case, because the 'quirky' hardware is so old, can't we
> just leave it a kernel config option to switch between using a 'printk'
> vs. a 'panic'?  Or even a kernel command line option.
>
> I figure these 'quirky' hardware machines are more the exception nowadays;
> do we really need to add code to whitelist machines?
>
> Granted I am not familiar enough with the quirky hardware (in fact I don't
> think I have seen any mainly because I haven't been around long enough).
> Most cases I see when trolling through the fedora bugzilla list for
> unknown NMIs, is just bad firmware or acpi power configurations.
>
> Just wondering if we could just simplify the patch somehow with better
> assumptions.

So unknown NMIs still occur on real hardware today. I am afraid that
turning on panic on unknown NMI by default may not be acceptable to
some users.
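
To make the printk-vs-panic choice discussed above concrete, here is a
minimal user-space sketch of a policy switch. It is not code from the
patch: the option values, the enum, and both function names are
hypothetical, and the default of logging only is an assumption chosen to
match the concern about panicking by default.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy values; not taken from the patch under discussion. */
enum nmi_policy {
	NMI_POLICY_LOG,		/* report the unknown NMI and continue */
	NMI_POLICY_PANIC	/* treat the unknown NMI as a fatal hardware error */
};

/*
 * Parse the value of a hypothetical boot option. Anything other than
 * "panic" (including a missing value) falls back to logging only.
 */
static enum nmi_policy parse_nmi_policy(const char *arg)
{
	if (arg && strcmp(arg, "panic") == 0)
		return NMI_POLICY_PANIC;
	return NMI_POLICY_LOG;
}

/*
 * Report an unknown NMI. Returns 1 if the caller should stop the
 * system (panic policy), 0 if it should continue running.
 */
static int handle_unknown_nmi(enum nmi_policy policy, unsigned char reason)
{
	fprintf(stderr, "Unknown NMI received, reason %02x\n", reason);
	return policy == NMI_POLICY_PANIC;
}
```

A caller would parse the option once at startup and pass the resulting
policy to handle_unknown_nmi() for each unknown NMI, so machines with
known spurious NMIs keep running unless panic is explicitly requested.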

Best Regards,
Huang Ying