linux-kernel - Re: [PATCH 1/2] kernel/fork: Add support for stack-end guard page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+Zai+4VwNXS_a417M2m0DbtNhjTVdYga178ZDkvNnP4CQ@mail.gmail.com>
Date:   Thu, 25 Jul 2019 09:53:08 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Marco Elver <elver@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Alexander Potapenko <glider@...gle.com>,
        Andrey Konovalov <andreyknvl@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        kasan-dev <kasan-dev@...glegroups.com>,
        Daniel Axtens <dja@...ens.net>
Subject: Re: [PATCH 1/2] kernel/fork: Add support for stack-end guard page

On Wed, Jul 24, 2019 at 1:21 PM Mark Rutland <mark.rutland@....com> wrote:
>
> On Wed, Jul 24, 2019 at 11:11:49AM +0200, Dmitry Vyukov wrote:
> > On Tue, Jul 23, 2019 at 6:41 PM Mark Rutland <mark.rutland@....com> wrote:
> > >
> > > On Fri, Jul 19, 2019 at 03:28:17PM +0200, Marco Elver wrote:
> > > > Enabling STACK_GUARD_PAGE helps catching kernel stack overflows immediately
> > > > rather than causing difficult-to-diagnose corruption. Note that, unlike
> > > > virtually-mapped kernel stacks, this will effectively waste an entire page of
> > > > memory; however, this feature may provide extra protection in cases that cannot
> > > > use virtually-mapped kernel stacks, at the cost of a page.
> > > >
> > > > The motivation for this patch is that KASAN cannot use virtually-mapped kernel
> > > > stacks to detect stack overflows. An alternative would be implementing support
> > > > for vmapped stacks in KASAN, but would add significant extra complexity.
> > >
> > > Do we have an idea as to how much additional complexity?
> >
> > We would need to map/unmap shadow for vmalloc region on stack
> > allocation/deallocation. We may need to track shadow pages that cover
> > both stack and an unused memory, or 2 different stacks, which are
> > mapped/unmapped at different times. This may have some concurrency
> > concerns.  Not sure what about page tables for other CPU, I've seen
> > some code that updates pages tables for vmalloc region lazily on page
> > faults. Not sure what about TLBs. Probably also some problems that I
> > can't thought about now.
>
> Ok. So this looks big, we this hasn't been prototyped, so we don't have
> a concrete idea. I agree that concurrency is likely to be painful. :)
>
> [...]
>
> > > > diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
> > > > index 288b065955b7..b218b5713c02 100644
> > > > --- a/arch/x86/include/asm/page_64_types.h
> > > > +++ b/arch/x86/include/asm/page_64_types.h
> > > > @@ -12,8 +12,14 @@
> > > >  #define KASAN_STACK_ORDER 0
> > > >  #endif
> > > >
> > > > +#ifdef CONFIG_STACK_GUARD_PAGE
> > > > +#define STACK_GUARD_SIZE PAGE_SIZE
> > > > +#else
> > > > +#define STACK_GUARD_SIZE 0
> > > > +#endif
> > > > +
> > > >  #define THREAD_SIZE_ORDER    (2 + KASAN_STACK_ORDER)
> > > > -#define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
> > > > +#define THREAD_SIZE  ((PAGE_SIZE << THREAD_SIZE_ORDER) - STACK_GUARD_SIZE)
> > >
> > > I'm pretty sure that common code relies on THREAD_SIZE being a
> > > power-of-two. I also know that if we wanted to enable this on arm64 that
> > > would very likely be a requirement.
> > >
> > > For example, in kernel/trace/trace_stack.c we have:
> > >
> > > | this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
> > >
> > > ... and INIT_TASK_DATA() allocates the initial task stack using
> > > THREAD_SIZE, so that may require special care, as it might not be sized
> > > or aligned as you expect.
> >
> > We've built it, booted it, stressed it, everything looked fine... that
> > should have been a build failure.
>
> I think it's been an implicit assumption for so long that no-one saw the need
> for built-time assertions where they depend on it.
>
> I also suspect that in practice there are paths that you won't have
> stressed in your environment, e.g. in the ACPI wakeup path where we end
> up calling:
>
> /* Unpoison the stack for the current task beyond a watermark sp value. */
> asmlinkage void kasan_unpoison_task_stack_below(const void *watermark)
> {
>         /*
>          * Calculate the task stack base address.  Avoid using 'current'
>          * because this function is called by early resume code which hasn't
>          * yet set up the percpu register (%gs).
>          */
>         void *base = (void *)((unsigned long)watermark & ~(THREAD_SIZE - 1));
>
>         kasan_unpoison_shadow(base, watermark - base);
> }
>
> > Is it a property that we need to preserve? Or we could fix the uses
> > that assume power-of-2?
>
> Generally, I think that those can be fixed up. Someone just needs to dig
> through how THREAD_SIZE and THREAD_SIZE_ORDER are used to generate or
> manipulate addresses.
>
> For local-task stuff, I think it's easy to rewrite in terms of
> task_stack_page(), but I'm not entirely sure what we'd do for cases
> where we look at another task, e.g.
>
> static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
>                                 struct pid *pid, struct task_struct *task)
> {
>         unsigned long prev_depth = THREAD_SIZE -
>                                 (task->prev_lowest_stack & (THREAD_SIZE - 1));
>         unsigned long depth = THREAD_SIZE -
>                                 (task->lowest_stack & (THREAD_SIZE - 1));
>
>         seq_printf(m, "previous stack depth: %lu\nstack depth: %lu\n",
>                                                         prev_depth, depth);
>         return 0;
> }
>
> ... as I'm not sure of the lifetime of task->stack relative to task. I
> know that with THREAD_INFO_IN_TASK the stack can be freed while the task
> is still live.
>
> Thanks,
> Mark.

FTR, Daniel just mailed:

[PATCH 0/3] kasan: support backing vmalloc space with real shadow memory
https://groups.google.com/forum/#!topic/kasan-dev/YuwLGJYPB4I
Which presumably will supersede this.