Message-ID: <m1ipi9prrl.fsf@fess.ebiederm.org>
Date:	Mon, 12 Mar 2012 12:02:06 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Fernando Luis Vázquez Cao 
	<fernando@....ntt.co.jp>, "H. Peter Anvin" <hpa@...or.com>,
	Don Zickus <dzickus@...hat.com>,
	linux-tip-commits@...r.kernel.org, torvalds@...ux-foundation.org,
	kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
	mingo@...hat.com, tglx@...utronix.de, mingo@...e.hu,
	Yinghai Lu <yinghai@...nel.org>, akpm@...ux-foundation.org
Subject: Re: [PATCH 1/2] boot: ignore early NMIs

Vivek Goyal <vgoyal@...hat.com> writes:

> On Mon, Mar 12, 2012 at 03:14:20PM +0900, Fernando Luis Vázquez Cao wrote:
>
> [..]
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot patch when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed).
>
> I think there needs to be a limit to being paranoid. On one hand people
> want to run panic notifiers, all the kmsg_dump() hooks in panic path, and
> on the other hand we are afraid of even disabling LAPIC.

And the kmsg_dump code and the panic notifiers aren't being run.  Having
seen some of their failure modes being patched up recently (Adding and
removing sysfs files!!!!) I'm very comfortable with the level of
paranoia.

It has been proven time and time again that the more you do in the
failing kernel, the greater your likelihood of not getting your
failure information out.

> I personally think that disabling LAPIC is reasonably practical solution
> to the problem until and unless somebody shows that it deadlocks
> easily.

Disabling NMI generation in the LAPIC is fine, and for the short term
I don't even have a problem with disabling the entire LAPIC as all of
our platforms seem to have code for completely reprogramming it.

At the same time there have been cases, like the i8259 routed through
the ExtInt pin of the LAPIC, for which we haven't been given programming
information and which, if we want things to work, we should avoid touching.

Furthermore, we have two reported cases of people experiencing real NMIs
on the kdump path.  So we have to assume the presence of the CMOS NMI
disable as well if we are going to unequivocally disable NMIs.
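
For readers who haven't met the CMOS NMI disable: on legacy PC
hardware, bit 7 of the byte written to the CMOS/RTC index port (0x70)
gates NMI delivery, and the low seven bits select the register.  A
minimal sketch of how that byte is composed follows.  The helper name
cmos_index_byte() is made up for illustration and is not kernel code,
and the sketch assumes the platform still implements the legacy port
at all, which is exactly what can't be relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy PC convention: port 0x70 selects a CMOS/RTC register in
 * bits 0-6, and bit 7 gates NMI delivery (1 = NMIs blocked). */
#define CMOS_INDEX_PORT   0x70u
#define CMOS_NMI_DISABLE  0x80u
#define CMOS_REG_MASK     0x7fu

/* Hypothetical helper: compute the byte to write to CMOS_INDEX_PORT.
 * It only builds the value; the actual port write (outb) is left out
 * so the sketch stays runnable in user space. */
static inline uint8_t cmos_index_byte(uint8_t reg, bool block_nmi)
{
    uint8_t val = reg & CMOS_REG_MASK;   /* keep the register index only */

    if (block_nmi)
        val |= CMOS_NMI_DISABLE;         /* set bit 7 to block NMIs */
    return val;
}
```

Every access to the index port rewrites bit 7, so any code that reads
a CMOS register without preserving the bit silently re-enables NMIs.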

Given the variety of x86 hardware today and the growing variety of x86
hardware tomorrow, we are going to be fixing this until we can actually
handle the NMIs.  Hardware designers are unfortunately creative enough
that we aren't going to think of everything.  Given that it has taken
us almost a decade to realize that there actually is a real world
problem, I'm not too keen on a solution that is just good enough to
fix a small problem.

I would love it if x86 had an architectural NMI off switch, but with
Intel pushing EFI and the removal of the CMOS clock, x86 no longer
has an always available NMI off switch.

Furthermore, handling NMIs is not hard; it is just a little tricky
to test.

Eric