[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1437755591.5522.0.camel@gmail.com>
Date: Fri, 24 Jul 2015 09:33:11 -0700
From: Raymond Jennings <shentino@...il.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Willy Tarreau <w@....eu>, Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Brian Gerst <brgerst@...il.com>
Subject: Re: Dealing with the NMI mess
On Thu, 2015-07-23 at 13:21 -0700, Andy Lutomirski wrote:
> [moved to a new thread, cc list trimmed]
>
> Hi all-
>
> We've considered two approaches to dealing with NMIs:
>
> 1. Allow nesting. We know quite well how messy that is.
This might be a stupid question, but
1. What exactly does the NMI handler handle
2. Is it possible for the NMI handler to just increment a counter and
return if it nests, and let the outer handler notice and rerun itself.
> 2. Forbid IRET inside NMIs. Doable but maybe not that pretty.
>
> We haven't considered:
>
> 3. Forbid faults (other than MCE) inside NMI.
>
> Option 3 is almost easy. There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB. #DB is easy to
> fix (e.g. with my patches or Peter's patches).
>
> What if we went all out and forbade page faults in NMI as well. There
> are two reasons that I can think of that we might page fault inside an
> NMI:
>
> a) vmalloc fault. I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
>
> b) User memory access faults.
>
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space. What if we did it differently? Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode. That would let us capture user state
> *correctly*, which we currently can't really do. There's a
> never-ending series of minor bugs in which we try to guess the user
> register state from NMI context, and it sort of works. In
> prepare_exit_to_usermode, we really truly know the user state.
> There's a race where an NMI hits during or after
> prepare_exit_to_usermode, but maybe that's okay -- just admit defeat
> in that case and don't show the user state. (Realistically, without
> CFI data, we're not going to be guaranteed to get the right state
> anyway.)
>
> To make this work, we'd have to teach NMI-from-userspace to call the
> callback itself. It would look like:
>
> prepare_exit_to_usermode() {
> ...
> while (blah blah blah) {
> if (cached_flags & TIF_PERF_CAPTURE_USER_STATE)
> perf_capture_user_state();
> ...
> }
> ...
> }
>
> and then, on NMI exit, we'd call perf_capture_user_state directly,
> since we don't want to enable IRQs or do opportunsitic sysret on exit
> from NMI. (Why not? Because NMIs are still masked, and we don't want
> to pay for double-IRET to unmask them, so we really want to leave IRQs
> off and IRET straight back to user mode.)
>
> There's an unavoidable race in which we enter user mode with
> TIF_PERF_CAPTURE_USER_STATE still set. In principle, we could
> IPI-to-self from the NMI handler to cover that case (mostly -- we
> capture the wrong state if we're on our way to an IRET fault), or we
> could just check on entry if the flag is still set and, if so, admit
> defeat.
>
> Peter, can this be done without breaking the perf ABI? If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.
>
>
> If we decide to stick with option 2, then I've now convinced myself
> that banning all kernel breakpoints and watchpoints during NMI
> processing is probably for the best. Maybe we should go one step
> farther and ban all DR7 breakpoints period. Sure, it will slow down
> perf if there are user breakpoints or watchpoints set, but, having
> looked at the asm, returning from #DB using RET is, while doable,
> distinctly ugly.
>
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists