linux-kernel - Re: vmalloced stacks on x86

Date:	Sat, 25 Oct 2014 16:16:23 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Richard Weinberger <richard.weinberger@...il.com>
Cc:	"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: vmalloced stacks on x86_64?

On Sat, Oct 25, 2014 at 3:26 PM, Richard Weinberger
<richard.weinberger@...il.com> wrote:
> On Sat, Oct 25, 2014 at 2:22 AM, Andy Lutomirski <luto@...capital.net> wrote:
>> Is there any good reason not to use vmalloc for x86_64 stacks?
>>
>> The tricky bits I've thought of are:
>>
>>  - On any context switch, we probably need to probe the new stack
>> before switching to it.  That way, if it's going to fault due to an
>> out-of-sync pgd, we still have a stack available to handle the fault.
>>
>>  - Any time we change cr3, we may need to check that the pgd
>> corresponding to rsp is there.  If not, we need to sync it over.
>>
>>  - For simplicity, we probably want all stack ptes to be present all
>> the time.  This is fine; vmalloc already works that way.
>>
>>  - If we overrun the stack, we double-fault.  This should be easy to
>> detect: any double-fault where rsp is less than 20 bytes from the
>> bottom of the stack is a failure to deliver a non-IST exception due to
>> a stack overflow.  The question is: what do we do if this happens?
>> We could just panic (guaranteed to work).  We could also try to
>> recover by killing the offending task, but that might be a bit
>> challenging, since we're in IST context.  We could do something truly
>> awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
>> return from the double fault.
>>
>> Thoughts?  This shouldn't be all that much code.
>
> FWIW, grsecurity has this already.
> Maybe we can reuse their GRKERNSEC_KSTACKOVERFLOW feature.
> It allocates the kernel stack using vmalloc() and installs guard pages.
>

On brief inspection, grsecurity isn't actually vmallocing the stack.
It seems to be allocating it the normal way and then vmapping it.
That allows it to modify sg_set_buf to work on stack addresses (sigh).

After each switch_mm, it probes the whole kernel stack.  (This seems
dangerous to me -- if the live stack isn't mapped in the new mm, won't
that double-fault?)  I also see no evidence that it probes the new
stack when switching stacks.  I suspect that it only works because it
gets lucky.
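
As a rough illustration of the pgd check discussed above, here is a toy
user-space model, not kernel code.  Every name in it (toy_pgd,
toy_pgd_index, toy_sync_stack_pgd) is made up for the sketch, and the
"reference" table merely stands in for whichever page table is known to
map the stack.  The idea is to make sure the top-level entry covering the
stack is present in the target table before anything depends on it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a "pgd" is an array of top-level entries; 0 means not present. */
#define TOY_PGD_ENTRIES 512
#define TOY_PGDIR_SHIFT 39	/* one entry covers 512 GiB, as in x86_64 4-level paging */

struct toy_pgd {
	uint64_t entry[TOY_PGD_ENTRIES];
};

/* Index of the top-level entry that covers a virtual address. */
static size_t toy_pgd_index(uint64_t addr)
{
	return (size_t)((addr >> TOY_PGDIR_SHIFT) & (TOY_PGD_ENTRIES - 1));
}

/*
 * Make sure the entry covering stack_addr is present in target, copying
 * it from the reference table if it is missing.  Returns true if a copy
 * was made.  Hypothetical helper for illustration only.
 */
static bool toy_sync_stack_pgd(struct toy_pgd *target,
			       const struct toy_pgd *reference,
			       uint64_t stack_addr)
{
	size_t idx = toy_pgd_index(stack_addr);

	/* The reference table is assumed to map the stack already. */
	assert(reference->entry[idx] != 0);

	if (target->entry[idx] != 0)
		return false;	/* already in sync, nothing to do */

	target->entry[idx] = reference->entry[idx];
	return true;
}
```

In this model the check would run once for the new stack before a stack
switch and once for the current rsp after a cr3 change.  It only looks at
one top-level entry, so it is much cheaper than touching every page of
the stack.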

If we're worried about on-stack DMA, we could (by config option or
otherwise) allow DMA on a vmalloced stack, at least through the sg
interfaces.  And we could WARN and fix it :)
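
A minimal sketch of that "allow it, but WARN" policy, again as a
user-space toy rather than real sg code.  The names toy_stack,
toy_sg_verdict and toy_sg_check_buf are hypothetical, and the
allow_stack_dma flag stands in for the config option mentioned above:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Address range of the current task's stack: [start, end). */
struct toy_stack {
	uintptr_t start;
	uintptr_t end;
};

enum toy_sg_verdict {
	TOY_SG_OK,		/* buffer is not on the stack */
	TOY_SG_WARN_ON_STACK,	/* on the stack: allowed, but complain */
	TOY_SG_REJECT,		/* on the stack and stack DMA is disabled */
};

/*
 * Decide what to do with a buffer handed to an sg-style interface.
 * Only the start address is checked, which is enough for a sketch.
 */
static enum toy_sg_verdict toy_sg_check_buf(const struct toy_stack *stack,
					    uintptr_t buf,
					    bool allow_stack_dma)
{
	assert(stack->start < stack->end);

	if (buf < stack->start || buf >= stack->end)
		return TOY_SG_OK;

	if (!allow_stack_dma)
		return TOY_SG_REJECT;

	/* Allowed for now, but leave a trace so the caller gets fixed. */
	fprintf(stderr, "WARN: sg buffer at 0x%" PRIxPTR " is on the stack\n",
		buf);
	return TOY_SG_WARN_ON_STACK;
}
```

A caller would translate a TOY_SG_WARN_ON_STACK buffer page by page
instead of assuming physically contiguous memory, which is the part that
actually differs for a vmalloced stack.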

--Andy

P.S.  I see what appears to be some of my code in grsec.  I feel
entirely justified in taking good bits of grsec and sticking them in
the upstream kernel.