linux-kernel - Re: [RFC 2/2] x86_64: expand kernel stack to 16K


Message-ID: <5445E82F.2080805@amacapital.net>
Date:	Mon, 20 Oct 2014 21:59:27 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Dave Jones <davej@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC 2/2] x86_64: expand kernel stack to 16K

On 10/20/2014 07:00 PM, Dave Jones wrote:
> On Fri, May 30, 2014 at 08:41:00AM -0700, Linus Torvalds wrote:
>  > On Fri, May 30, 2014 at 8:25 AM, H. Peter Anvin <hpa@...or.com> wrote:
>  > >
>  > > If we removed struct thread_info from the stack allocation then one
>  > > could do a guard page below the stack.  Of course, we'd have to use IST
>  > > for #PF in that case, which makes it a non-production option.

Why is thread_info in the stack allocation anyway?  Every time I look at
the entry asm, one (minor) thing that contributes to general
brain-hurtingness / sense of horrified awe is the incomprehensible (to
me) split between task_struct and thread_info.

struct thread_info is at the bottom of the stack, right?  If we don't
want to merge it into task_struct, couldn't we stick it at the top of
the stack instead?  Anything that can overwrite the *top* of the stack
gives trivial user-controlled CPL0 execution regardless.
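
For readers who don't have the layout in their head, here is a user-space
sketch of the arrangement under discussion: a THREAD_SIZE-aligned block with
a small metadata struct at its lowest address and the stack growing down
toward it. This is not kernel code. The struct, the helper names and the
mask-the-stack-pointer lookup are illustrative assumptions, and the real
x86_64 code may locate thread_info differently.

```c
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 16384UL   /* 16K, matching the patch subject */

/* Hypothetical stand-in for struct thread_info; the real one has more fields. */
struct thread_info_sketch {
    unsigned long flags;
};

/*
 * Because the block is THREAD_SIZE-aligned, clearing the low bits of any
 * address inside it yields the lowest address, where the struct lives.
 */
static struct thread_info_sketch *info_from_sp(uintptr_t sp)
{
    return (struct thread_info_sketch *)(sp & ~(THREAD_SIZE - 1));
}

/* Allocate one aligned stack block; returns NULL on failure. */
static void *alloc_stack(void)
{
    return aligned_alloc(THREAD_SIZE, THREAD_SIZE);
}
```

With this layout, a stack that grows down past its limit writes over the
struct at the bottom before it leaves the allocation.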

>  > 
>  > We could just have the guard page in between the stack and the
>  > thread_info, take a double fault, and then just map it back in on
>  > double fault.
>  > 
>  > That would give us 8kB of "normal" stack, with a very loud fault - and
>  > then an extra 7kB or so of stack (whatever the size of thread-info is)
>  > - after the first time it traps.
>  > 
>  > That said, it's still likely a non-production option due to the page
>  > table games we'd have to play at fork/clone time.

What's wrong with vmalloc?  Doesn't it already have guard pages?

(Also, we have a shiny hardware dirty bit, so we could relatively
cheaply check whether we're near the limit without any weird
#PF-in-weird-context issues.)

Also, muahaha, I've infected more people with the crazy idea that
intentional double-faults are okay.  Suckers!  Soon I'll have Linux
returning from interrupts with lret!  (IIRC Windows used to do
intentional *triple* faults on context switches, so this should be
considered entirely sensible.)

> 
> [thread necrophilia]
> 
> So digging this back up, it occurs to me that after we bumped to 16K,
> we never did anything like the debug stuff you suggested here.
> 
> The reason I'm bringing this up is that, over the last few weeks, I've
> been seeing things like:
> 
> [27871.793753] trinity-c386 (28793) used greatest stack depth: 7728 bytes left
> 
> So we're now eating past that first 8KB in some situations.
> 
> Do we care? Or shall we only start worrying if it gets even deeper?
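
To make the arithmetic behind that log line explicit: with a 16K stack and
7728 bytes left, roughly 8656 bytes were in use at the deepest point
(ignoring the small thread_info at the bottom), which is past 8K. The sketch
below also shows one way such a high-water mark can be measured, by scanning
a zero-filled stack from its low end for the first word ever written. The
helper names are hypothetical and this is a user-space illustration, not the
kernel's code.

```c
#include <stddef.h>

#define THREAD_SIZE 16384UL   /* 16K stack */

/* Bytes consumed at the deepest point, given the reported bytes left. */
static unsigned long deepest_usage(unsigned long bytes_left)
{
    return THREAD_SIZE - bytes_left;
}

/*
 * Count the bytes at the low end that were never written. Assumes the
 * stack started zero-filled and grows down, so untouched words at the
 * low end are still zero.
 */
static size_t bytes_never_used(const unsigned long *lowest, size_t nwords)
{
    size_t i = 0;

    while (i < nwords && lowest[i] == 0)
        i++;
    return i * sizeof(unsigned long);
}
```

A scan like this reports the damage only after the fact, which is different
from a guard page that faults at the moment of the overrun.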

I would *love* to have an immediate, loud failure when we overrun the
stack.  This will unavoidably increase the number of TLB misses, but
that probably isn't so bad.

--Andy
