lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <75ihgmlcmou7yatoeva5sezbf6stow4gtdyurwzj5fxghjq7yw@o3u5wudjdnkk>
Date: Wed, 11 Sep 2024 21:42:42 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Yang Shi <shy828301@...il.com>
Cc: Helge Deller <deller@....de>, Helge Deller <deller@...nel.org>,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-parisc@...r.kernel.org
Subject: Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable
 stack

* Yang Shi <shy828301@...il.com> [240911 21:08]:
> On Wed, Sep 11, 2024 at 5:50 PM Helge Deller <deller@....de> wrote:
> >
> > On 9/12/24 01:05, Liam R. Howlett wrote:
> > > * Yang Shi <shy828301@...il.com> [240911 18:16]:
> > >> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
> > >> <Liam.Howlett@...cle.com> wrote:
> > >>>
> > >>> * Helge Deller <deller@...nel.org> [240911 15:20]:
> > >>>> This is a RFC to change the behaviour of mmap(MAP_STACK) to be
> > >>>> sufficient to map memory for usage as stack on all architectures.
> > >>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
> > >>>> has to be used.
> > >>>> To clarify, here is the relevant info from the mmap() man page:
> > >>>>
> > >>>> MAP_GROWSDOWN
> > >>>>     This flag is used for stacks. It indicates to the kernel virtual
> > >>>>     memory system that the mapping should extend downward in memory.  The
> > >>>>     return address is one page lower than the memory area that is
> > >>>>     actually created in the process's virtual address space.  Touching an
> > >>>>     address in the "guard" page below the mapping will cause the mapping
> > >>>>     to grow by a page. This growth can be repeated until the mapping
> > >>>>     grows to within a page of the high end of the next lower mapping,
> > >>>>     at which point touching the "guard" page will result in a SIGSEGV
> > >>>>     signal.
> > >>>>
> > >>>> MAP_STACK (since Linux 2.6.27)
> > >>>>     Allocate the mapping at an address suitable for a process or thread
> > >>>>     stack.
> > >>>>
> > >>>>     This flag is currently a no-op on Linux. However, by employing this
> > >>>>     flag, applications can ensure that they transparently obtain support
> > >>>>     if the flag is implemented in the future. Thus, it is used in the
> > >>>>     glibc threading implementation to allow for the fact that
> > >>>>     some architectures may (later) require special treatment for
> > >>>>     stack allocations. A further reason to employ this flag is
> > >>>>     portability: MAP_STACK exists (and has an effect) on some
> > >>>>     other systems (e.g., some of the BSDs).
> > >>>>
> > >>>> The reason to suggest this change is, that on the parisc architecture the
> > >>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
> > >>>> work. Note that there exists no MAP_GROWSUP flag.
> > >>>> By changing the behaviour of MAP_STACK to mark the memory area with the
> > >>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
> > >>>> architecture) the MAP_STACK flag does exactly what people would expect on
> > >>>> all platforms.
> > >>>>
> > >>>> This change should have no negative side-effect, as all code which
> > >>>> used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
> > >>>>
> > >>>> Signed-off-by: Helge Deller <deller@....de>
> > >>>>
> > >>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
> > >>>> index bcb201ab7a41..66bc72a0cb19 100644
> > >>>> --- a/include/linux/mman.h
> > >>>> +++ b/include/linux/mman.h
> > >>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > >>>>        return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > >>>>               _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > >>>>               _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > >>>> +            _calc_vm_trans(flags, MAP_STACK,      VM_STACK     ) |
> > >>>
> > >>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
> > >>> change the user interface to create a vma that will grow.  I'm not
> > >>> entirely sure this is okay?
> > >>
> > >> AFAICT, I don't see this is a problem. Currently huge page also skips
> > >> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
> > >> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
> > >> stack.
> > >
> > > If someone is using MAP_STACK to avoid having a huge page, they will
> > > also get a mapping that grows - which is different than what happens
> > > today.
> > >
> > > I'm not saying that's right, but someone could be abusing the existing
> > > flag and this will change the behaviour.
> >
> > Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
> > Why abusing MAP_STACK for that?
> 
> Different sources and reports showed having huge pages for stack
> mapping hurts performance. A lot of applications, for example, pthread
> lib, allocate stack with MAP_STACK and they don't call MADV_NOHUGEPAGE
> on stack mapping.
> 

It makes sense to have a stack with NOHUGEPAGE, but does anyone use
MAP_STACK to avoid the extra syscall to madv to set it on mappings that
are NOT stacks which would now become stack-like with this change?

...
> > >>>>               _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > >>>>               arch_calc_vm_flag_bits(flags);
> > >>>>   }
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ