linux-kernel - Re: Dealing with the NMI mess


Message-ID: <20150723172050.1e1821e1@gandalf.local.home>
Date:	Thu, 23 Jul 2015 17:20:50 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Willy Tarreau <w@....eu>, Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Brian Gerst <brgerst@...il.com>
Subject: Re: Dealing with the NMI mess

On Thu, 23 Jul 2015 13:21:16 -0700
Andy Lutomirski <luto@...capital.net> wrote:

> 3. Forbid faults (other than MCE) inside NMI.
> 
> Option 3 is almost easy.  There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB.  #DB is easy to
> fix (e.g. with my patches or Peter's patches).

What about int3, which is needed to make ftrace work? This was a
requirement to get rid of stop_machine() when updating ftrace functions,
as well as the rationale for doing the whole NMI nesting work in the
first place.
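
For readers unfamiliar with why int3 matters here, the following is a
minimal sketch of breakpoint-based code patching, the general technique
that lets a call site be rewritten on a live system without stopping
every CPU. It is a toy model over a byte buffer, not the kernel's code:
the function name `patch_with_int3`, the buffer, and the call offset are
all hypothetical, and the cross-CPU synchronization that a real
implementation performs between steps is only marked by comments.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode


def patch_with_int3(text, offset, new_insn):
    """Rewrite one instruction in three steps, yielding a snapshot of
    the patched region after each step (toy model, hypothetical name).
    """
    n = len(new_insn)

    # Step 1: put a breakpoint on the first byte. A CPU that reaches
    # this site now traps instead of decoding a half-written instruction.
    text[offset] = INT3
    yield bytes(text[offset:offset + n])
    # (a real implementation synchronizes all CPUs here)

    # Step 2: write the tail of the new instruction behind the breakpoint.
    text[offset + 1:offset + n] = new_insn[1:]
    yield bytes(text[offset:offset + n])
    # (and synchronizes again here)

    # Step 3: replace the breakpoint with the new first byte.
    text[offset] = new_insn[0]
    yield bytes(text[offset:offset + n])


# A 5-byte NOP being turned into a 5-byte relative call (offset made up).
old_insn = bytes.fromhex("0f1f440000")
new_insn = bytes([0xE8]) + (0x1234).to_bytes(4, "little")

text = bytearray(old_insn)
snapshots = list(patch_with_int3(text, 0, new_insn))

if __name__ == "__main__":
    for step, snap in enumerate(snapshots, 1):
        print(step, snap.hex())
```

At every intermediate step the site either starts with int3 or holds
the complete new instruction, so no CPU ever executes a torn mix of old
and new bytes. The cost is that a breakpoint trap can be taken wherever
the patched code runs, which includes NMI context.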

> 
> What if we went all out and forbade page faults in NMI as well.  There
> are two reasons that I can think of that we might page fault inside an
> NMI:
> 
> a) vmalloc fault.  I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
> 
> b) User memory access faults.

c) stack tracing faults

I used to have NMIs debug deadlocks by printing stack traces. The stack
tracer can page fault, and before the NMI nesting code, these stack
dumps would randomly reboot the box while I was debugging machines.
While writing the NMI nesting code I realized why those reboots
happened: the stack trace faulted, and the printk from NMI was slow
enough for another NMI to go off and stomp over the outer NMI's stack,
which led to triple faults and such.
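
The failure described above can be sketched with a toy model. It
assumes the usual x86-64 behavior: an NMI always enters at the same
fixed top of its dedicated stack, further NMIs are blocked until the
next IRET, and a fault handler that returns with IRET therefore
unblocks them while the outer NMI handler is still running. The class
`Cpu`, its fields, and the stack size are hypothetical names made up
for this illustration; this is not kernel code.

```python
class Cpu:
    """Toy model of one CPU's NMI delivery (hypothetical, not kernel code)."""

    STACK_SLOTS = 8  # made-up size of the dedicated NMI stack

    def __init__(self):
        self.nmi_stack = [None] * self.STACK_SLOTS
        self.nmi_blocked = False
        self.nmi_pending = False

    def deliver_nmi(self, tag):
        """NMI entry: blocked NMIs are latched, others start at stack top."""
        if self.nmi_blocked:
            self.nmi_pending = True
            return False
        self.nmi_blocked = True
        # The return frame always lands in the same top slot.
        self.nmi_stack[-1] = "return frame of " + tag
        return True

    def iret(self):
        """Any IRET unblocks NMIs, even one that only ends a fault handler."""
        self.nmi_blocked = False
        if self.nmi_pending:
            self.nmi_pending = False
            self.deliver_nmi("nested NMI")


def simulate():
    cpu = Cpu()
    cpu.deliver_nmi("outer NMI")
    before = cpu.nmi_stack[-1]

    # A second NMI fires during the slow printk; hardware holds it back.
    latched = not cpu.deliver_nmi("nested NMI")

    # The stack tracer page faults; the fault handler returns with IRET,
    # which unblocks NMIs while the outer handler is still in progress.
    cpu.iret()

    after = cpu.nmi_stack[-1]
    return before, latched, after


if __name__ == "__main__":
    print(simulate())
```

In this model the outer NMI's return frame is overwritten as soon as
the fault handler returns, so when the outer handler finally tries to
return it uses a frame that is no longer its own.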

-- Steve