linux-kernel - Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c226a38a-3612-4949-9da7-8d42f8636b47@gmx.de>
Date: Thu, 12 Sep 2024 04:09:53 +0200
From: Helge Deller <deller@....de>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
 Yang Shi <shy828301@...il.com>, Helge Deller <deller@...nel.org>,
 linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
 linux-mm@...ck.org, linux-parisc@...r.kernel.org
Subject: Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable
 stack

On 9/12/24 03:32, Liam R. Howlett wrote:
> * Helge Deller <deller@....de> [240911 20:51]:
>> On 9/12/24 01:05, Liam R. Howlett wrote:
>>> * Yang Shi <shy828301@...il.com> [240911 18:16]:
>>>> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
>>>> <Liam.Howlett@...cle.com> wrote:
>>>>>
>>>>> * Helge Deller <deller@...nel.org> [240911 15:20]:
>>>>>> This is a RFC to change the behaviour of mmap(MAP_STACK) to be
>>>>>> sufficient to map memory for usage as stack on all architectures.
>>>>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
>>>>>> has to be used.
>>>>>> To clarify, here is the relevant info from the mmap() man page:
>>>>>>
>>>>>> MAP_GROWSDOWN
>>>>>>      This flag is used for stacks. It indicates to the kernel virtual
>>>>>>      memory system that the mapping should extend downward in memory.  The
>>>>>>      return address is one page lower than the memory area that is
>>>>>>      actually created in the process's virtual address space.  Touching an
>>>>>>      address in the "guard" page below the mapping will cause the mapping
>>>>>>      to grow by a page. This growth can be repeated until the mapping
>>>>>>      grows to within a page of the high end of the next lower mapping,
>>>>>>      at which point touching the "guard" page will result in a SIGSEGV
>>>>>>      signal.
>>>>>>
>>>>>> MAP_STACK (since Linux 2.6.27)
>>>>>>      Allocate the mapping at an address suitable for a process or thread
>>>>>>      stack.
>>>>>>
>>>>>>      This flag is currently a no-op on Linux. However, by employing this
>>>>>>      flag, applications can ensure that they transparently obtain support
>>>>>>      if the flag is implemented in the future. Thus, it is used in the
>>>>>>      glibc threading implementation to allow for the fact that
>>>>>>      some architectures may (later) require special treatment for
>>>>>>      stack allocations. A further reason to employ this flag is
>>>>>>      portability: MAP_STACK exists (and has an effect) on some
>>>>>>      other systems (e.g., some of the BSDs).
>>>>>>
>>>>>> The reason to suggest this change is, that on the parisc architecture the
>>>>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
>>>>>> work. Note that there exists no MAP_GROWSUP flag.
>>>>>> By changing the behaviour of MAP_STACK to mark the memory area with the
>>>>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
>>>>>> architecture) the MAP_STACK flag does exactly what people would expect on
>>>>>> all platforms.
>>>>>>
>>>>>> This change should have no negative side-effect, as all code which
>>>>>> used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
>>>>>>
>>>>>> Signed-off-by: Helge Deller <deller@....de>
>>>>>>
>>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>>> index bcb201ab7a41..66bc72a0cb19 100644
>>>>>> --- a/include/linux/mman.h
>>>>>> +++ b/include/linux/mman.h
>>>>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>>         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>>>>>>                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>>>>>>                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
>>>>>> +            _calc_vm_trans(flags, MAP_STACK,      VM_STACK     ) |
>>>>>
>>>>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
>>>>> change the user interface to create a vma that will grow.  I'm not
>>>>> entirely sure this is okay?
>>>>
>>>> AFAICT, I don't see this is a problem. Currently huge page also skips
>>>> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
>>>> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
>>>> stack.
>>>
>>> If someone is using MAP_STACK to avoid having a huge page, they will
>>> also get a mapping that grows - which is different than what happens
>>> today.
>>>
>>> I'm not saying that's right, but someone could be abusing the existing
>>> flag and this will change the behaviour.
>>
>> Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
>> Why abusing MAP_STACK for that?
>
> I can think of two answers:
> 1. An error that has worked without issues so far
> 2. One less system call
>
> I'm not saying this really is a blocker, but the change is not without
> risk as it does change behaviour the user could see.
>
> Interestingly enough, the man page is incorrect as it is written because
> the flag is not strictly a no-op; it ensures no huge pages.  So the
> feature of applying VM_NOHUGEPAGE with the use of MAP_STACK is not
> documented today.

Yes.

> What happens to call that use the mmap(MAP_GROWSDOWN | MAP_STACK) on
> parisc today?

Today, without my patch, on parisc the area is then flagged VM_GROWSDOWN only.
In that case, stack expansion will not work, as it checks for VM_GROWSUP.

> How does this change with your VM_STACK change?  Wouldn't this result
> in failed mappings?
> VM_GROWSDOWN | VM_GROWSUP would fail in do_mmap(), and these would be> set if you map MAP_STACK to VM_STACK which is defined as VM_GROWSUP?

Right, with my change, the area will be tagged VM_GROWSUP and VM_GROWSDOWN.
Due to VM_GROWSUP stack expansion works.
The mapping doesn't fail in do_mmap(), because stacks are not file-mapped
or shared or droppable. They should be mapped with MAP_PRIVATE, right?


Another option is to introduce an alias, e.g.:
#define MAP_STACK_EXPANDABLE	MAP_GROWSDOWN
and then
diff --git a/include/linux/mman.h b/include/linux/mman.h
index bcb201ab7a41..6a7ec3e0078a 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -153,7 +153,7 @@ calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
  static inline unsigned long
  calc_vm_flag_bits(unsigned long flags)
  {
-       return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
+       return _calc_vm_trans(flags, MAP_STACK_EXPANDABLE,  VM_STACK ) |
                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
                _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |

This simply uses the existing MAP_GROWSDOWN flag to mark a stack,
is independend on the stack growth direction and is fully backwards
compatible to existing behaviour.

I prefer my initial proposal to change MAP_STACK though, as it's
simpler and clearer.

Helge