linux-kernel - Re: Dealing with the NMI mess

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 23 Jul 2015 14:50:11 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Willy Tarreau <w@....eu>, Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Brian Gerst <brgerst@...il.com>
Subject: Re: Dealing with the NMI mess

On Thu, Jul 23, 2015 at 2:48 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Thu, Jul 23, 2015 at 2:31 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
>>
>> Let me get this straight. The idea is in the #DB handler to detect that
>> it was triggered in NMI context, and if so, simply disarm that
>> breakpoint permanently, right?
>
> No, for simplicity, I'd make it cover not just NMI code, but any
> "kernel code with interrupts disabled".
>
> Because that's the test we'd use for "use ret instead of iret".
>
> And that wider test is exactly because it's so damn hard to get the
> exact instruction boundaries right. Let's *not* go down the path
> (again) of having to get the whole %rip range and "magic stack pointer
> values" etc.
>
> Make it simple and completely unambiguous. The rule really would be:
>
>  - if we return to kernel space and interrupts are disabled, we will
> use "ret" rather than "iret"
>
>    Hard rule. Simple. Straightforward. No random %rip values. No
> random %rsp values. NO CRAP.
>
>  - but because we use "ret" rather than "iret" we can't get RF
> semantics, it means that #DB is special. RF is supposed to make us
> make forward progress
>
>    So for that reason, #DB just says "if the breakpoint happened
> during that interrupts-ff reghion, I will clear %dr7 to guarantee
> forward progress"

What if we relax it slightly: "if the breakpoint happened during that
interrupts-off region, I will clear all *kernel breakpoints* in %dr7
to guarantee forward progress"?

Watchpoints don't need RF to make forward progress, and, by leaving
watchpoints alone, we avoid breaking gdb.

>
> So those would be the two main rules. Very simple, and avoiding all nasty cases.
>
> Now, I'd be willing to then hide the "oops, we clear dr7 very
> agrressively" issue by having a few additional _heuristics_. But I
> call them "heuristics" because unlike the current NMI nesting games,
> they aren't about core stability. They are about "ok, maybe somebody
> wants to trigger those faults, and we'll be _nice_ and try to make it
> easy for them", but nothing more.
>
> So for example, if that "#DB clears %dr7" happened, it sounds easy to
> set _TIF_USER_WORK_MASK, and just force %dr7 to be re-loaded from a
> cached value, so that if we disabled things because of some user stack
> trace access, it will be re-enabled by the time we return to user
> space. I think that sounds reasonable, but it's not something the core
> low-level entry x86 assembly code needs to even care about. It's not
> that level of "core", it's just being polite.

Once we limit it to instruction breakpoints, I don't think re-enabling
before returning to userspace matters.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/