Message-ID: <20150724110639.GG19282@twins.programming.kicks-ass.net>
Date: Fri, 24 Jul 2015 13:06:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...capital.net>, X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Willy Tarreau <w@....eu>, Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Brian Gerst <brgerst@...il.com>
Subject: Re: Dealing with the NMI mess
On Thu, Jul 23, 2015 at 02:54:54PM -0700, Linus Torvalds wrote:
> On Thu, Jul 23, 2015 at 2:45 PM, Andy Lutomirski <luto@...capital.net> wrote:
> >
> > Or we just re-enable them on the way out of NMI (i.e. the very last
> > thing we do in the NMI handler). I don't want to break regular
> > userspace gdb when perf is running.
>
> I'd really prefer it if we don't touch NMI code in those kinds of
> ways. The NMI code is fragile as hell. All the problems we have with
> it is exactly due to "where is the boundary" issues.
>
> That's why I *don't* want NMI code to do magic crap. Anything that
> says "disable this during this magic window" is broken. The problems
> we've had are exactly about atomicity of the entry/exit conditions,
> and there is no really good way to get them right.
>
> I'd be much happier with a _TIF_USER_WORK_MASK approach exactly
> because it's so *obvious* that it's not a boundary condition.
>
> I dislike the "disable and re-enable dr7 in the NMI handler" exactly
> because it smells like "we can only handle faults in _this_ region".
> It may be true, but it's also what I want us to get away from. I'd
> much rather have the "big picture" be that we can take faults anywhere
> at all (*), and that none of the core code really cares. Then we "fix
> up" user space.

Something a wee bit like so?

We need the intermediate self-IPI (the irq_work) because NMI, MCE, etc.
do not deal with TIF flags.

I further cleared all of DR7 in an attempt to reduce the amount of
state tracked. And it doesn't distinguish between kernel and user
watchpoints, because the kernel can touch both from !IF context.
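
To make the ordering explicit, here is a user-space toy model of the deferral chain described above: a #DB taken with interrupts off saves and clears DR7, the queued irq_work later runs with interrupts on and arms a task_work, and the task_work restores DR7 on the way back to user space. This is only a sketch under stated assumptions: every name in it (sim_cpu, sim_do_debug, sim_run_irq_work, sim_return_to_user, sim_demo) is hypothetical and none of it is kernel API. The only hard fact it relies on is that IF is bit 9 of EFLAGS.

```c
#include <assert.h>
#include <stdbool.h>

#define EFLAGS_IF 0x200UL	/* x86 interrupt-enable flag, bit 9 of EFLAGS */

/* Hypothetical stand-in for the per-CPU state; not a kernel structure. */
struct sim_cpu {
	unsigned long dr7;		/* simulated debug control register */
	unsigned long saved_dr7;	/* DR7 value stashed while it is cleared */
	bool irq_work_pending;		/* stage 1 queued (models the self-IPI) */
	bool task_work_pending;		/* stage 2 queued (models the task_work) */
};

/* True when the interrupted context had interrupts disabled (!IF). */
static bool sim_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & EFLAGS_IF);
}

/*
 * Stage 0: a #DB arrives. If the interrupted context was !IF, stash DR7,
 * clear it, and queue stage 1. Returns true when the deferred path is taken.
 */
static bool sim_do_debug(struct sim_cpu *cpu, unsigned long flags)
{
	if (!sim_irqs_disabled_flags(flags))
		return false;	/* normal #DB handling would continue here */

	cpu->saved_dr7 = cpu->dr7;
	cpu->dr7 = 0;
	cpu->irq_work_pending = true;
	return true;
}

/* Stage 1: runs once interrupts are enabled again; only arms stage 2. */
static void sim_run_irq_work(struct sim_cpu *cpu)
{
	if (!cpu->irq_work_pending)
		return;
	cpu->irq_work_pending = false;
	cpu->task_work_pending = true;
}

/* Stage 2: runs on the return-to-user path and restores DR7. */
static void sim_return_to_user(struct sim_cpu *cpu)
{
	if (!cpu->task_work_pending)
		return;
	cpu->task_work_pending = false;
	cpu->dr7 = cpu->saved_dr7;
}

/* Walk the whole chain once and return the final DR7 value. */
static unsigned long sim_demo(unsigned long initial_dr7)
{
	struct sim_cpu cpu = { .dr7 = initial_dr7 };

	assert(sim_do_debug(&cpu, 0));	/* flags without IF: deferred path */
	assert(cpu.dr7 == 0);		/* breakpoints are off for now */
	sim_run_irq_work(&cpu);
	assert(cpu.dr7 == 0);		/* still off until return to user */
	sim_return_to_user(&cpu);
	return cpu.dr7;
}
```

In this model DR7 stays zero from the !IF hit until the return-to-user step, so nothing on the NMI/MCE side has to know about a "safe window"; the model deliberately ignores nesting, task migration, and multiple CPUs.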
---
arch/x86/kernel/traps.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8e65d8a9b8db..e8308e9c2b1e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -570,6 +570,33 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 NOKPROBE_SYMBOL(fixup_bad_iret);
 #endif
+struct do_debug_state {
+	unsigned long dr7;
+	struct irq_work irq_work;
+	struct callback_head task_work;
+};
+
+static void __debug_irq_trampoline(struct irq_work *work)
+{
+	struct do_debug_state *dds =
+		container_of(work, struct do_debug_state, irq_work);
+
+	task_work_add(current, &dds->task_work, true);
+}
+
+static void __debug_restore_dr7(struct callback_head *work)
+{
+	struct do_debug_state *dds =
+		container_of(work, struct do_debug_state, task_work);
+
+	set_debugreg(dds->dr7, 7);
+}
+
+static DEFINE_PER_CPU(struct do_debug_state, do_debug_state) = {
+	.irq_work = { .func = __debug_irq_trampoline, },
+	.task_work = { .func = __debug_restore_dr7, },
+};
+
 /*
  * Our handling of the processor debug registers is non-trivial.
  * We do not clear them on entry and exit from the kernel. Therefore
@@ -603,6 +630,16 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	ist_enter(regs);
+	if (arch_irqs_disabled_flags(regs->flags)) {
+		struct do_debug_state *dds = this_cpu_ptr(&do_debug_state);
+
+		get_debugreg(dds->dr7, 7);
+		set_debugreg(0, 7);
+		irq_work_queue(&dds->irq_work);
+
+		goto exit;
+	}
+
 	get_debugreg(dr6, 6);
 	/* Filter out all the reserved bits which are preset to 1 */