[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3d334639-feb9-474d-a4ed-6aa6d2afb33d@lucifer.local>
Date: Wed, 9 Oct 2024 15:53:20 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Hugh Dickins <hughd@...gle.com>,
Oleg Nesterov <oleg@...hat.com>, Michal Hocko <mhocko@...e.com>,
Helge Deller <deller@....de>, Vlastimil Babka <vbabka@...e.cz>,
Ben Hutchings <ben@...adent.org.uk>, Willy Tarreau <w@....eu>,
Rik van Riel <riel@...riel.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org,
Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH] mm: Enforce a minimal stack gap even against
inaccessible VMAs
On Tue, Oct 08, 2024 at 04:20:13PM +0200, Jann Horn wrote:
> On Tue, Oct 8, 2024 at 1:40 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> > This is touching mm/mmap.c, please ensure to cc- the reviewers (me and
> > Liam, I have cc'd Liam here) as listed in MAINTAINERS when submitting
> > patches for this file.
>
> Ah, my bad, sorry about that.
>
> > Also this seems like a really speculative 'please discuss' change so should
> > be an RFC imo.
> >
> > On Tue, Oct 08, 2024 at 12:55:39AM +0200, Jann Horn wrote:
> > > As explained in the comment block this change adds, we can't tell what
> > > userspace's intent is when the stack grows towards an inaccessible VMA.
> > >
> > > I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc
> > > that mixes malloc(), pthread creation, and recursion in just the right way
> > > such that the main stack overflows into malloc() arena memory.
> > > (Let me know if you want me to share that.)
> >
> > You are claiming a fixes from 2017 and cc'ing stable on a non-RFC so
> > yeah... we're going to need more than your word for it :) we will want to
> > be really sure this is a thing before we backport to basically every
> > stable kernel.
> >
> > Surely this isn't that hard to demonstrate though? You mmap something
> > PROT_NONE just stack gap below the stack, then intentionally extend stack
> > to it, before mprotect()'ing the PROT_NONE region?
>
> I've attached my test case that demonstrates this using basically only
> malloc, free, pthread_create() and recursion, plus a bunch of ugly
> read-only gunk and synchronization. It assumes that it runs under
> glibc, as a 32-bit x86 binary. Usage:
Thanks!
>
> $ clang -O2 -fstack-check -m32 -o grow-32 grow-32.c -pthread -ggdb &&
> for i in {0..10}; do ./grow-32; done
> corrupted thread_obj2 at depth 190528
> corrupted thread_obj2 at depth 159517
> corrupted thread_obj2 at depth 209777
> corrupted thread_obj2 at depth 200119
> corrupted thread_obj2 at depth 208093
> corrupted thread_obj2 at depth 167705
> corrupted thread_obj2 at depth 234523
> corrupted thread_obj2 at depth 174528
> corrupted thread_obj2 at depth 223823
> corrupted thread_obj2 at depth 199816
> grow-32: malloc failed: Cannot allocate memory
>
> This demonstrates that it is possible for a userspace program that is
> just using basic libc functionality, and whose only bug is unbounded
> recursion, to corrupt memory despite being built with -fstack-check.
>
> As you said, to just demonstrate the core issue in a more contrived
> way, you can also use a simpler example:
>
> $ cat basic-grow-repro.c
> #include <err.h>
> #include <stdlib.h>
> #include <sys/mman.h>
>
> #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp,
> %0":"=r"(__stack)); __stack; })
>
> int main(void) {
> char *ptr = (char*)( (unsigned long)(STACK_POINTER() -
> (1024*1024*4)/*4MiB*/) & ~0xfffUL );
> if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr)
> err(1, "mmap");
> *(volatile char *)(ptr + 0x1000); /* expand stack */
> if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC))
> err(1, "mprotect");
> system("cat /proc/$PPID/maps");
> }
> $ gcc -o basic-grow-repro basic-grow-repro.c
> $ ./basic-grow-repro
> 5600a0fef000-5600a0ff0000 r--p 00000000 fd:01 28313737
> [...]/basic-grow-repro
> 5600a0ff0000-5600a0ff1000 r-xp 00001000 fd:01 28313737
> [...]/basic-grow-repro
> 5600a0ff1000-5600a0ff2000 r--p 00002000 fd:01 28313737
> [...]/basic-grow-repro
> 5600a0ff2000-5600a0ff3000 r--p 00002000 fd:01 28313737
> [...]/basic-grow-repro
> 5600a0ff3000-5600a0ff4000 rw-p 00003000 fd:01 28313737
> [...]/basic-grow-repro
> 7f9a88553000-7f9a88556000 rw-p 00000000 00:00 0
> 7f9a88556000-7f9a8857c000 r--p 00000000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f9a8857c000-7f9a886d2000 r-xp 00026000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f9a886d2000-7f9a88727000 r--p 0017c000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f9a88727000-7f9a8872b000 r--p 001d0000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f9a8872b000-7f9a8872d000 rw-p 001d4000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f9a8872d000-7f9a8873a000 rw-p 00000000 00:00 0
> 7f9a88754000-7f9a88756000 rw-p 00000000 00:00 0
> 7f9a88756000-7f9a8875a000 r--p 00000000 00:00 0 [vvar]
> 7f9a8875a000-7f9a8875c000 r-xp 00000000 00:00 0 [vdso]
> 7f9a8875c000-7f9a8875d000 r--p 00000000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f9a8875d000-7f9a88782000 r-xp 00001000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f9a88782000-7f9a8878c000 r--p 00026000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f9a8878c000-7f9a8878e000 r--p 00030000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f9a8878e000-7f9a88790000 rw-p 00032000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7fff84664000-7fff84665000 rwxp 00000000 00:00 0
> 7fff84665000-7fff84a67000 rw-p 00000000 00:00 0 [stack]
> $
>
>
> Though, while writing the above reproducer, I noticed another dodgy
> scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently
> ignores the stack guard region, because it only checks for VMA
> intersection, see this example:
Oh my.
>
> $ cat basic-grow-repro-ohno.c
> #include <err.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
>
> #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp,
> %0":"=r"(__stack)); __stack; })
>
> int main(void) {
> setbuf(stdout, NULL);
> char *ptr = (char*)( (unsigned long)(STACK_POINTER() -
> (1024*1024*4)/*4MiB*/) & ~0xfffUL );
> *(volatile char *)(ptr + 0x1000); /* expand stack to just above ptr */
>
> printf("trying to map at %p\n", ptr);
> system("cat /proc/$PPID/maps;echo");
> if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_FIXED_NOREPLACE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr)
> err(1, "mmap");
> system("cat /proc/$PPID/maps");
> }
> $ gcc -o basic-grow-repro-ohno basic-grow-repro-ohno.c
> $ ./basic-grow-repro-ohno
> trying to map at 0x7ffc344ca000
> 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0
> 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0
> 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0
> 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar]
> 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso]
> 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack]
>
> 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842
> [...]/basic-grow-repro-ohno
> 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0
> 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714
> /usr/lib/x86_64-linux-gnu/libc.so.6
> 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0
> 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0
> 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar]
> 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso]
> 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> 7ffc344ca000-7ffc344cb000 rwxp 00000000 00:00 0
> 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack]
> $
>
> That could also be bad: MAP_FIXED_NOREPLACE exists, from what I
> understand, partly so that malloc implementations can use it to grow
> heap memory chunks (though glibc doesn't use it, I'm not sure who
> actually uses it that way). We wouldn't want a malloc implementation
> to grow a heap memory chunk until it is directly adjacent to a stack.
It seems... weird to use it that way as you couldn't be sure you weren't
overwriting another VMA.
>
> > > I don't know of any specific scenario where this is actually exploitable,
> > > but it seems like it could be a security problem for sufficiently unlucky
> > > userspace.
> > >
> > > I believe we should ensure that, as long as code is compiled with something
> > > like -fstack-check, a stack overflow in it can never cause the main stack
> > > to overflow into adjacent heap memory.
> > >
> > > My fix effectively reverts the behavior for !vma_is_accessible() VMAs to
> > > the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap,
> > > between vmas"), so I think it should be a fairly safe change even in
> > > case A.
> > >
> > > Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack")
> > > Cc: stable@...r.kernel.org
> > > Signed-off-by: Jann Horn <jannh@...gle.com>
> > > ---
> > > I have tested that Libreoffice still launches after this change, though
> > > I don't know if that means anything.
> > >
> > > Note that I haven't tested the growsup code.
> > > ---
> > > mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++-------
> > > 1 file changed, 46 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index dd4b35a25aeb..971bfd6c1cea 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > > gap_addr = TASK_SIZE;
> > >
> > > next = find_vma_intersection(mm, vma->vm_end, gap_addr);
> > > - if (next && vma_is_accessible(next)) {
> > > - if (!(next->vm_flags & VM_GROWSUP))
> > > + if (next && !(next->vm_flags & VM_GROWSUP)) {
> > > + /* see comments in expand_downwards() */
> > > + if (vma_is_accessible(prev))
> > > + return -ENOMEM;
> > > + if (address == next->vm_start)
> > > return -ENOMEM;
> > > - /* Check that both stack segments have the same anon_vma? */
> >
> > I hate that we even maintain this for one single arch I believe at this point?
>
> Looks that way, just parisc?
>
> It would be so nice if we could somehow just get rid of this concept
> of growing stacks in general...
>
> > > }
> > >
> > > if (next)
> > > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> > > /* Enforce stack_guard_gap */
> > > prev = vma_prev(&vmi);
> > > /* Check that both stack segments have the same anon_vma? */
> > > - if (prev) {
> > > - if (!(prev->vm_flags & VM_GROWSDOWN) &&
> > > - vma_is_accessible(prev) &&
> > > - (address - prev->vm_end < stack_guard_gap))
> > > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> > > + (address - prev->vm_end < stack_guard_gap)) {
> > > + /*
> > > + * If the previous VMA is accessible, this is the normal case
> > > + * where the main stack is growing down towards some unrelated
> > > + * VMA. Enforce the full stack guard gap.
> > > + */
> > > + if (vma_is_accessible(prev))
> > > + return -ENOMEM;
> > > +
> > > + /*
> > > + * If the previous VMA is not accessible, we have a problem:
> > > + * We can't tell what userspace's intent is.
> > > + *
> > > + * Case A:
> > > + * Maybe userspace wants to use the previous VMA as a
> > > + * "guard region" at the bottom of the main stack, in which case
> > > + * userspace wants us to grow the stack until it is adjacent to
> > > + * the guard region. Apparently some Java runtime environments
> > > + * and Rust do that?
> > > + * That is kind of ugly, and in that case userspace really ought
> > > + * to ensure that the stack is fully expanded immediately, but
> > > + * we have to handle this case.
> >
> > Yeah we can't break userspace on this, no doubt somebody is relying on this
> > _somewhere_.
>
> It would have to be a new user who appeared after commit 1be7107fbe18.
> And they'd have to install a "guard vma" somewhere below the main
> stack, and they'd have to care so much about the size of the stack
> that a single page makes a difference.
You did say 'Apparently some Java runtime environments and Rust do that'
though right? Or am I misunderstanding?
>
> > That said, I wish we disallowed this altogether regardless of accessibility.
> >
> > > + *
> > > + * Case B:
> > > + * But maybe the previous VMA is entirely unrelated to the stack
> > > + * and is only *temporarily* PROT_NONE. For example, glibc
> > > + * malloc arenas create a big PROT_NONE region and then
> > > + * progressively mark parts of it as writable.
> > > + * In that case, we must not let the stack become adjacent to
> > > + * the previous VMA. Otherwise, after the region later becomes
> > > + * writable, a stack overflow will cause the stack to grow into
> > > + * the previous VMA, and we won't have any stack gap to protect
> > > + * against this.
> >
> > Should be careful with terminology here, an mprotect() will not allow a
> > merge so by 'grow into' you mean that a stack VMA could become immediately
> > adjacent to a non-stack VMA prior to it which was later made writable.
> >
> > Perhaps I am being pedantic...
>
> Ah, sorry, I worded that very confusingly. By "a stack overflow will
> cause the stack to grow into the previous VMA", I meant that the stack
> pointer moves into the previous VMA and the program uses part of the
> previous VMA as stack memory, clobbering whatever was stored there
> before. That part was not meant to talk about a change of VMA bounds.
Sure, yes this is what I assumed. Might be worth clarifying that just to be
_totally_ clear.
>
> > > + *
> > > + * As an ugly tradeoff, enforce a single-page gap.
> > > + * A single page will hopefully be small enough to not be
> > > + * noticed in case A, while providing the same level of
> > > + * protection in case B that normal userspace threads get.
> > > + */
> > > + if (address == prev->vm_end)
> > > return -ENOMEM;
> >
> > Ugh, yuck. Not a fan of this at all. Feels like a dreadful hack.
>
> Oh, I agree, I just didn't see a better way to do it.
>
> > You do raise an important point here, but it strikes me that we should be
> > doing this check in the mprotect()/mmap() MAP_FIXED scenarios where it
> > shouldn't be too costly to check against the next VMA (which we will be
> > obtaining anyway for merge checks)?
> >
> > That way we don't need a hack like this, and can just disallow the
> > operation. That'd probably be as liable to break the program as an -ENOMEM
> > on a stack expansion would...
>
> Hmm... yeah, I guess that would work. If someone hits this case, it
> would probably be less obvious to the programmer what went wrong based
> on the error code, but on the other hand, it would give userspace a
> slightly better chance to recover from the issue...
>
> I guess I can see if I can code that up.
Thanks!
Powered by blists - more mailing lists