linux-kernel - Re: Dealing with the NMI mess

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1437755591.5522.0.camel@gmail.com>
Date:	Fri, 24 Jul 2015 09:33:11 -0700
From:	Raymond Jennings <shentino@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Willy Tarreau <w@....eu>, Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Brian Gerst <brgerst@...il.com>
Subject: Re: Dealing with the NMI mess

On Thu, 2015-07-23 at 13:21 -0700, Andy Lutomirski wrote:
> [moved to a new thread, cc list trimmed]
> 
> Hi all-
> 
> We've considered two approaches to dealing with NMIs:
> 
> 1. Allow nesting.  We know quite well how messy that is.

This might be a stupid question, but

1.  What exactly does the NMI handler handle
2.  Is it possible for the NMI handler to just increment a counter and
return if it nests, and let the outer handler notice and rerun itself.

> 2. Forbid IRET inside NMIs.  Doable but maybe not that pretty.
> 
> We haven't considered:
> 
> 3. Forbid faults (other than MCE) inside NMI.
> 
> Option 3 is almost easy.  There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB.  #DB is easy to
> fix (e.g. with my patches or Peter's patches).
> 
> What if we went all out and forbade page faults in NMI as well.  There
> are two reasons that I can think of that we might page fault inside an
> NMI:
> 
> a) vmalloc fault.  I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
> 
> b) User memory access faults.
> 
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space.  What if we did it differently?  Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode.  That would let us capture user state
> *correctly*, which we currently can't really do.  There's a
> never-ending series of minor bugs in which we try to guess the user
> register state from NMI context, and it sort of works.  In
> prepare_exit_to_usermode, we really truly know the user state.
> There's a race where an NMI hits during or after
> prepare_exit_to_usermode, but maybe that's okay -- just admit defeat
> in that case and don't show the user state.  (Realistically, without
> CFI data, we're not going to be guaranteed to get the right state
> anyway.)
> 
> To make this work, we'd have to teach NMI-from-userspace to call the
> callback itself.  It would look like:
> 
> prepare_exit_to_usermode() {
>   ...
>   while (blah blah blah) {
>     if (cached_flags & TIF_PERF_CAPTURE_USER_STATE)
>       perf_capture_user_state();
>     ...
>   }
>   ...
> }
> 
> and then, on NMI exit, we'd call perf_capture_user_state directly,
> since we don't want to enable IRQs or do opportunsitic sysret on exit
> from NMI.  (Why not?  Because NMIs are still masked, and we don't want
> to pay for double-IRET to unmask them, so we really want to leave IRQs
> off and IRET straight back to user mode.)
> 
> There's an unavoidable race in which we enter user mode with
> TIF_PERF_CAPTURE_USER_STATE still set.  In principle, we could
> IPI-to-self from the NMI handler to cover that case (mostly -- we
> capture the wrong state if we're on our way to an IRET fault), or we
> could just check on entry if the flag is still set and, if so, admit
> defeat.
> 
> Peter, can this be done without breaking the perf ABI?  If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.
> 
> 
> If we decide to stick with option 2, then I've now convinced myself
> that banning all kernel breakpoints and watchpoints during NMI
> processing is probably for the best.  Maybe we should go one step
> farther and ban all DR7 breakpoints period.  Sure, it will slow down
> perf if there are user breakpoints or watchpoints set, but, having
> looked at the asm, returning from #DB using RET is, while doable,
> distinctly ugly.
> 
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/