Date:	Mon, 12 Mar 2012 14:43:42 +0900
From:	Fernando Luis Vázquez Cao 
	<fernando@....ntt.co.jp>
To:	"H. Peter Anvin" <hpa@...or.com>
CC:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Don Zickus <dzickus@...hat.com>,
	linux-tip-commits@...r.kernel.org, torvalds@...ux-foundation.org,
	kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
	mingo@...hat.com, tglx@...utronix.de, mingo@...e.hu,
	Yinghai Lu <yinghai@...nel.org>, akpm@...ux-foundation.org,
	vgoyal@...hat.com
Subject: Re: [PATCH 1/2] boot: ignore early NMIs

On 03/10/2012 05:52 AM, H. Peter Anvin wrote:

> Is there a reason to not just simply block these NMIs during the kexec
> sequence?
Ok, some background:

In the reboot path to the kdump kernel we disable local interrupts
and the APICs in native_machine_crash_shutdown() and reset the IDT
in machine_kexec(), which leaves an invalid IDT installed.

However, disabling the I/O APIC involves taking a lock, which in
the event of a crash is racy and can lead to a deadlock. To
solve this issue Don wrote a patch that left the I/O APICs and
the LAPIC of the crashing CPU untouched in the kdump reboot path,
but this seemed to cause mysterious reboots on some systems.
It turned out that an NMI coming from the perf-based hardlockup
detector was causing the system to triple fault. If an NMI happens
to arrive in the window between the invalidation of the IDT in
machine_kexec() and the configuration of the final IDT we will be
in big trouble. In particular, the system will either triple fault
or halt, depending on whether the NMI arrived before or after
installing the early IDT.
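
The window described above can be summarized as a small model. This is
only a sketch: the phase and outcome names are made up for illustration
and do not exist in the kernel, and it assumes that "triple fault or
halt" maps, in that order, to "before or after installing the early IDT".

```c
#include <stdio.h>

/* Hypothetical phases of the kdump reboot path, named for this sketch only. */
enum idt_phase {
    IDT_CRASHED_KERNEL, /* IDT of the crashing kernel still loaded */
    IDT_INVALID,        /* after machine_kexec() has reset the IDT */
    IDT_EARLY,          /* early IDT of the kdump kernel installed */
    IDT_FINAL           /* final IDT configured */
};

/* Hypothetical outcomes of an NMI arriving in a given phase. */
enum nmi_outcome { NMI_HANDLED, NMI_TRIPLE_FAULT, NMI_HALT };

/* Map the phase in which an NMI arrives to what happens to the system. */
static enum nmi_outcome nmi_outcome_for(enum idt_phase phase)
{
    switch (phase) {
    case IDT_INVALID:
        return NMI_TRIPLE_FAULT; /* no usable gate for the NMI */
    case IDT_EARLY:
        return NMI_HALT;         /* early handler stops the machine */
    default:
        return NMI_HANDLED;      /* a real NMI handler is in place */
    }
}

/* Human-readable name for an outcome, for printing. */
static const char *nmi_outcome_name(enum nmi_outcome outcome)
{
    switch (outcome) {
    case NMI_TRIPLE_FAULT: return "triple fault";
    case NMI_HALT:         return "halt";
    default:               return "handled";
    }
}
```

Calling nmi_outcome_name(nmi_outcome_for(IDT_INVALID)) gives the
"mysterious reboot" case; only the two middle phases are dangerous.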

To tackle this issue we can either stop the hardlockup detector
or disable the LAPIC (the NMIs needed by x86's hardlockup detector
are generated using performance counters in the LAPIC), leaving
the I/O APICs untouched. The second is simpler and I think it
is the approach Don took to fix this issue in RHEL kernels.

Unfortunately, this is not enough: we are still exposed to external
NMIs not routed through the LAPIC. In other words, we have to make
sure that we always have an IDT that is able to handle NMIs without
seemingly random reboots and lockups. To achieve this goal we need
to fix machine_kexec() and the early IDT handlers. The current patch
set takes care of the latter.
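
As a rough illustration of the early IDT handler side, the decision
could look like the sketch below. This is not the code from the patch:
the function name and the ignore_nmis switch are hypothetical, and the
only hard fact used is that the NMI is vector 2 on x86.

```c
#include <stdbool.h>

#define X86_VECTOR_NMI 2u /* architectural NMI vector on x86 */

/*
 * Hypothetical decision made by an early IDT handler.
 * Returns true if the handler should simply return to the interrupted
 * code, false if it should stop the machine (the old behaviour for
 * every vector in this model).
 */
static bool early_handler_should_resume(unsigned int vector, bool ignore_nmis)
{
    /* With the fix modelled here, an early NMI is dropped, not fatal. */
    if (ignore_nmis && vector == X86_VECTOR_NMI)
        return true;

    /* Any other early exception is still treated as fatal. */
    return false;
}
```

With ignore_nmis set to false the model reproduces the halt described
earlier; with it set to true only vector 2 is let through.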

- Fernando
