lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <179bfa57-76d0-4c2e-ab2c-4f9856cf488a@gmx.de>
Date: Sun, 15 Sep 2024 22:04:35 +0200
From: Helge Deller <deller@....de>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
 Yang Shi <shy828301@...il.com>, Helge Deller <deller@...nel.org>,
 linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
 linux-mm@...ck.org, linux-parisc@...r.kernel.org
Subject: Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable
 stack

On 9/12/24 19:37, Helge Deller wrote:
> On 9/12/24 17:43, Liam R. Howlett wrote:
>> * Helge Deller <deller@....de> [240911 22:10]:
>>> On 9/12/24 03:32, Liam R. Howlett wrote:
>>>> * Helge Deller <deller@....de> [240911 20:51]:
>>>>> On 9/12/24 01:05, Liam R. Howlett wrote:
>>>>>> * Yang Shi <shy828301@...il.com> [240911 18:16]:
>>>>>>> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
>>>>>>> <Liam.Howlett@...cle.com> wrote:
>>>>>>>>
>>>>>>>> * Helge Deller <deller@...nel.org> [240911 15:20]:
>>>>>>>>> This is a RFC to change the behaviour of mmap(MAP_STACK) to be
>>>>>>>>> sufficient to map memory for usage as stack on all architectures.
>>>>>>>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
>>>>>>>>> has to be used.
>>>>>>>>> To clarify, here is the relevant info from the mmap() man page:
>>>>>>>>>
>>>>>>>>> MAP_GROWSDOWN
>>>>>>>>>       This flag is used for stacks. It indicates to the kernel virtual
>>>>>>>>>       memory system that the mapping should extend downward in memory.  The
>>>>>>>>>       return address is one page lower than the memory area that is
>>>>>>>>>       actually created in the process's virtual address space.  Touching an
>>>>>>>>>       address in the "guard" page below the mapping will cause the mapping
>>>>>>>>>       to grow by a page. This growth can be repeated until the mapping
>>>>>>>>>       grows to within a page of the high end of the next lower mapping,
>>>>>>>>>       at which point touching the "guard" page will result in a SIGSEGV
>>>>>>>>>       signal.
>>>>>>>>>
>>>>>>>>> MAP_STACK (since Linux 2.6.27)
>>>>>>>>>       Allocate the mapping at an address suitable for a process or thread
>>>>>>>>>       stack.
>>>>>>>>>
>>>>>>>>>       This flag is currently a no-op on Linux. However, by employing this
>>>>>>>>>       flag, applications can ensure that they transparently obtain support
>>>>>>>>>       if the flag is implemented in the future. Thus, it is used in the
>>>>>>>>>       glibc threading implementation to allow for the fact that
>>>>>>>>>       some architectures may (later) require special treatment for
>>>>>>>>>       stack allocations. A further reason to employ this flag is
>>>>>>>>>       portability: MAP_STACK exists (and has an effect) on some
>>>>>>>>>       other systems (e.g., some of the BSDs).
>>>>>>>>>
>>>>>>>>> The reason to suggest this change is, that on the parisc architecture the
>>>>>>>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
>>>>>>>>> work. Note that there exists no MAP_GROWSUP flag.
>>>>>>>>> By changing the behaviour of MAP_STACK to mark the memory area with the
>>>>>>>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
>>>>>>>>> architecture) the MAP_STACK flag does exactly what people would expect on
>>>>>>>>> all platforms.
>>>>>>>>>
>>>>>>>>> This change should have no negative side-effect, as all code which
>>>>>>>>> used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Helge Deller <deller@....de>
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>>>>>> index bcb201ab7a41..66bc72a0cb19 100644
>>>>>>>>> --- a/include/linux/mman.h
>>>>>>>>> +++ b/include/linux/mman.h
>>>>>>>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>>>>>          return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>>>>>>>>>                 _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>>>>>>>>>                 _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
>>>>>>>>> +            _calc_vm_trans(flags, MAP_STACK,      VM_STACK     ) |
>>>>>>>>
>>>>>>>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
>>>>>>>> change the user interface to create a vma that will grow.  I'm not
>>>>>>>> entirely sure this is okay?
>>>>>>>
>>>>>>> AFAICT, I don't see this is a problem. Currently huge page also skips
>>>>>>> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
>>>>>>> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
>>>>>>> stack.
>>>>>>
>>>>>> If someone is using MAP_STACK to avoid having a huge page, they will
>>>>>> also get a mapping that grows - which is different than what happens
>>>>>> today.
>>>>>>
>>>>>> I'm not saying that's right, but someone could be abusing the existing
>>>>>> flag and this will change the behaviour.
>>>>>
>>>>> Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
>>>>> Why abusing MAP_STACK for that?
>>>>
>>>> I can think of two answers:
>>>> 1. An error that has worked without issues so far
>>>> 2. One less system call
>>>>
>>>> I'm not saying this really is a blocker, but the change is not without
>>>> risk as it does change behaviour the user could see.
>>>>
>>>> Interestingly enough, the man page is incorrect as it is written because
>>>> the flag is not strictly a no-op; it ensures no huge pages.  So the
>>>> feature of applying VM_NOHUGEPAGE with the use of MAP_STACK is not
>>>> documented today.
>>>
>>> Yes.
>>>
>>>> What happens to call that use the mmap(MAP_GROWSDOWN | MAP_STACK) on
>>>> parisc today?
>>>
>>> Today, without my patch, on parisc the area is then flagged VM_GROWSDOWN only.
>>> In that case, stack expansion will not work, as it checks for VM_GROWSUP.
>>>
>>>> How does this change with your VM_STACK change?  Wouldn't this result
>>>> in failed mappings?
>>>> VM_GROWSDOWN | VM_GROWSUP would fail in do_mmap(), and these would be> set if you map MAP_STACK to VM_STACK which is defined as VM_GROWSUP?
>>>
>>> Right, with my change, the area will be tagged VM_GROWSUP and VM_GROWSDOWN.
>>> Due to VM_GROWSUP stack expansion works.
>>> The mapping doesn't fail in do_mmap(), because stacks are not file-mapped
>>> or shared or droppable. They should be mapped with MAP_PRIVATE, right?
>>
>> Oh my, yes.  So now you will have a stack that can expand in either
>> direction, but it's okay because one direction isn't checked.  I sure
>> hope the rest of the code is correctly #ifdef'ed for this.
>
> Only one direction will be handled (depending on the architecture),
> even if both (VM_GROWSUP and VM_GROWSDOWN) are set.
> So it shouldn't be a problem to have both directions set.
>
>>> Another option is to introduce an alias, e.g.:
>>> #define MAP_STACK_EXPANDABLE    MAP_GROWSDOWN
>>> and then
>>
>> I don't like either of these options.
>>
>> I guess you could also detect the MAP_STACK and MAP_GROWSDOWN and change
>> the flags for parisc, which I also don't like, but since parisc is the
>> only arch using this it's hard to justify a change that may cause issues
>> in other archs.
>
> Well, I still don't actually see real issues with my proposal.
> That's why I proposed to change MAP_STACK generally.
> Just adapting it for parisc was my initial approach which I sent
> to the parisc mailing list prior to this patch:
> https://patchwork.kernel.org/project/linux-parisc/patch/Zt3yJUiczUBmEC3a@p100/
>
> I'd like to wait for some possible further feedback.
> In the end I might end up just changing it for parisc.

No further input, so I will apply a patch which changes the default behavior
on parisc only. Other architectures will stay unchanged.

Thanks!
Helge

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ