Date:	Wed, 19 Nov 2014 16:46:29 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>, Borislav Petkov <bp@...en8.de>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in
 schedule and __might_sleep

On Wed, Nov 19, 2014 at 4:37 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Wed, Nov 19, 2014 at 4:13 PM, Andy Lutomirski <luto@...capital.net> wrote:
>>
>> No drugs, just imprecision.  This series doesn't change NMI handling
>> at all.  It only changes machine_check, int3, debug, and stack_segment.
>> (Why is #SS using IST stacks anyway?)
>
> .. ok, we were talking about adding an explicit preemption count to
> nmi, and then you wanted to make that conditional, and that kind of
> freaked me out.

I guess I jumped around in the conversation a bit...

>
>> So my point stands: if machine_check is going to be conditionally
>> atomic, then that condition needs to be expressed somewhere.
>
> I'd still prefer to keep that knowledge in one place, rather than
> adding *another* completely ad-hoc thing in addition to what we
> already have.
>
> Also, I really don't think it should be about the particular stack
> you're using. Sure, if a debug fault happens in user space, the fault
> handler could sleep if it runs on the regular stack, but our
> "might_sleep()" are about catching things that *could* be problematic,
> even if the sleep never happens. And so, might_sleep() _should_
> actually trigger, even if it's not using the IST stack, because *if*
> the debug exception happened in kernel space, then we should warn.
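
(Agreed on that semantics -- might_sleep() is about the *possibility* of
sleeping, not whether a sleep actually happens.  Purely for illustration,
with made-up helper names:

static DEFINE_MUTEX(my_lock);

void my_helper(void)
{
        might_sleep();          /* warns in atomic context even if we
                                   only take the fast path below */

        if (data_already_cached())
                return;         /* fast path: never actually sleeps */

        mutex_lock(&my_lock);   /* slow path: can sleep */
        refill_cache();
        mutex_unlock(&my_lock);
}

The annotation at the top catches an atomic-context caller even when
only the fast path ever runs.)
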
>
> So I'd actually *prefer* to have special hacks that perhaps then
> "undo" the preemption count if the code expressly tests for "did this
> happen in user space, then I know I'm safe". But then it's an
> *explicit* thing, not something that just magically works because
> nobody even thought about it, and the trap happened in user space.
>
> See the argument? I'd *rather* see code like
>
>    /* Magic */
>    if (user_mode(regs)) {
>        .. verify that we're using the normal kernel stack
>        .. enable interrupts, enable preemption
>        .. this is the explicit special case and it is aware
>        .. of being special
>    }
>
> even if, on the face of it, it looks hacky. But an *explicit* hack is
> preferable to something that just "happens" to work only for the
> user-mode case.

So we'd do, in do_machine_check:

irq_enter();

do atomic stuff;

ist_stop_being_atomic(regs);
local_irq_enable();
...
local_irq_disable();
ist_start_being_atomic_again();

irq_exit();

and we'd have something like:

void ist_stop_being_atomic(struct pt_regs *regs)
{
  BUG_ON(!user_mode_vm(regs));
  --irq_count;
}

I'm very hesitant to use irq_enter for this, though.  I think we want
just the irq_count part.  Maybe ist_enter() and ist_exit()?  I think
that we really don't want to go anywhere near the accounting stuff in
irq_enter from an IST handler if !user_mode_vm(regs).  Doing it from
asm is somewhat less error prone, although I guess we already rely on
the IDT entries themselves being in sync with the paranoid idtentry
setting.
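
Roughly (just a sketch -- the names and the HARDIRQ_OFFSET arithmetic
here are only one way to spell "bump irq_count", not necessarily what
the real patch should use):

static void ist_enter(struct pt_regs *regs)
{
        /*
         * Count as hardirq-ish so in_atomic()/might_sleep() do the
         * right thing, without irq_enter()'s RCU and time accounting.
         */
        preempt_count_add(HARDIRQ_OFFSET);
}

static void ist_exit(struct pt_regs *regs)
{
        preempt_count_sub(HARDIRQ_OFFSET);
}

ist_stop_being_atomic() / ist_start_being_atomic_again() would then be
the matching sub/add pair, with the BUG_ON(!user_mode_vm(regs)) check
from above.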

--Andy
