linux-kernel - Re: [PATCH RFC] NMI Re-introduce un[set]_nmi

Date:	Thu, 04 Sep 2008 17:52:17 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Prarit Bhargava <prarit@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, arozansk@...hat.com,
	dzickus@...hat.com, Thomas.Mingarelli@...com, ak@...ux.intel.com,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Maciej W. Rozycki" <macro@...ux-mips.org>
Subject: Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback

Ingo Molnar <mingo@...e.hu> writes:
>
> i'd much rather attack this general problem from this angle:
>
>   static inline unsigned char get_nmi_reason(void)
>   {
>           return inb(0x61);
>   }
>
> that port 61H read is both arcane (on modern chipsets) and broken on 
> multiple levels.

Yes, it is. I did some datasheet reading recently and unfortunately
there is no better way that is actually standardized. So the only
replacement would be to have chipset-specific NMI drivers that know
the particular registers of the chipset.

> It's racy and SMP unsafe to begin with, if there's any 
> mixture of intentional cross-CPU or CPU self-generated NMIs mixed with 
> chipset generated NMIs.
>
> One possible approach would be to get rid of it, and to perhaps register 

Removing the IO port accesses by default would be a good idea,
I agree. They are hardly useful for anything on modern systems.

But you still need some way to catch the chipset NMIs
and give some indication of the problem.

The way so far was to ask all the other sources first (for software
NMIs check the in-memory flags, for perfmon NMIs check the performance
counters, etc.) and, if it's none of them, assume it's a chipset NMI
(or an NMI button NMI if the sysctl is set).

Then, if there's a chipset-specific NMI driver, it could
also check whether the chipset raised it. That would be a possible
solution for HP -- they would need to implement such a driver
for their systems with the special watchdog.
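
To make the ordering above concrete, here is a minimal user-space
sketch. It is not kernel code: every name in it (nmi_state,
classify_nmi, example_watchdog_raised, the enum values) is
hypothetical and only models the idea of asking the known sources
first, then an optional chipset-specific driver, then the NMI button
sysctl, and only then assuming an unidentified chipset NMI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not kernel identifiers. */
enum nmi_source {
	NMI_SRC_SOFTWARE,        /* software NMI, found via in-memory flag */
	NMI_SRC_PERFCTR,         /* performance counter overflow */
	NMI_SRC_CHIPSET_DRIVER,  /* claimed by a chipset-specific driver */
	NMI_SRC_BUTTON,          /* unknown NMI treated as NMI button */
	NMI_SRC_ASSUMED_CHIPSET, /* nobody claimed it: assume chipset */
};

struct nmi_state {
	bool sw_flag;          /* set by whoever sent a software NMI */
	bool perfctr_overflow; /* stands in for reading the perf counters */
	bool button_sysctl;    /* stands in for the NMI button sysctl */
	/* Optional chipset-specific check; NULL when no driver is loaded. */
	bool (*chipset_raised)(void *ctx);
	void *chipset_ctx;
};

/* Example driver callback: reports whatever its context flag says. */
static bool example_watchdog_raised(void *ctx)
{
	return *(bool *)ctx;
}

/* Ask the cheap, well-known sources first; fall back to guessing last. */
static enum nmi_source classify_nmi(struct nmi_state *s)
{
	if (s->sw_flag) {
		s->sw_flag = false;
		return NMI_SRC_SOFTWARE;
	}
	if (s->perfctr_overflow) {
		s->perfctr_overflow = false;
		return NMI_SRC_PERFCTR;
	}
	if (s->chipset_raised && s->chipset_raised(s->chipset_ctx))
		return NMI_SRC_CHIPSET_DRIVER;
	if (s->button_sysctl)
		return NMI_SRC_BUTTON;
	return NMI_SRC_ASSUMED_CHIPSET;
}
```

Note that the sketch has the same race as the real thing: if two
sources fire at about the same time, only the first one in the chain
is reported for that NMI.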

Yes, that's racy, but the poor hardware support unfortunately doesn't
leave much wiggle room to do better.

> a low-priority die notifier on systems where we know port 61 
> reads+writes to be safe and desired. Modern systems will emit MCEs in 
> most cases anyway, not NMIs.

The chipsets will still trigger NMIs (depending on their
configuration) -- e.g. on some PCI or internal errors -- they cannot
trigger MCEs directly.  Fortunately that is being replaced by PCI-AER
on PCI-Express, but PCI-X, which doesn't do that, is still very common
and shipping.

BTW the NMI handlers are also racy: it's not safe
to call printk in an NMI handler. They really should be taught
to start using mce_log().
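
The general idea behind that is "record now, print later": the
handler only claims a slot in a preallocated buffer without taking
any lock, and ordinary context does the printing afterwards. Below is
a minimal user-space sketch of that pattern using C11 atomics. It is
not how mce_log() is implemented; nmi_log, nmi_record,
nmi_log_record, nmi_log_drain and NMI_LOG_SLOTS are all hypothetical
names made up for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NMI_LOG_SLOTS 8 /* hypothetical capacity */

struct nmi_record {
	uint8_t reason; /* e.g. the value read from port 0x61 */
	int cpu;        /* CPU that took the NMI */
};

struct nmi_log {
	atomic_uint next;                 /* next free slot index */
	atomic_bool ready[NMI_LOG_SLOTS]; /* slot completely written */
	struct nmi_record slot[NMI_LOG_SLOTS];
};

/*
 * Meant for the (simulated) NMI handler: no locks, no printing.
 * Returns false and drops the event when the buffer is full.
 */
static bool nmi_log_record(struct nmi_log *log, uint8_t reason, int cpu)
{
	unsigned int idx = atomic_load(&log->next);

	/* Claim a slot; on failure idx is reloaded and we retry. */
	do {
		if (idx >= NMI_LOG_SLOTS)
			return false;
	} while (!atomic_compare_exchange_weak(&log->next, &idx, idx + 1));

	log->slot[idx].reason = reason;
	log->slot[idx].cpu = cpu;
	/* Publish only after the record is fully written. */
	atomic_store(&log->ready[idx], true);
	return true;
}

/*
 * Meant for ordinary context, where printing is safe: copies out all
 * completed records (up to max) and returns how many were copied.
 */
static unsigned int nmi_log_drain(struct nmi_log *log,
				  struct nmi_record *out, unsigned int max)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < NMI_LOG_SLOTS && n < max; i++) {
		if (atomic_load(&log->ready[i]))
			out[n++] = log->slot[i];
	}
	return n;
}
```

The sketch never frees slots, so it just drops events once full;
a real log would need a way to recycle entries after they have been
read.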

-Andi
-- 
ak@...ux.intel.com