Message-ID: <87e019fe-e137-49c9-a414-8f4f373fc4b6@gmx.de>
Date: Thu, 12 Sep 2024 19:37:51 +0200
From: Helge Deller <deller@....de>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Yang Shi <shy828301@...il.com>, Helge Deller <deller@...nel.org>,
linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-mm@...ck.org, linux-parisc@...r.kernel.org
Subject: Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable stack

On 9/12/24 17:43, Liam R. Howlett wrote:
> * Helge Deller <deller@....de> [240911 22:10]:
>> On 9/12/24 03:32, Liam R. Howlett wrote:
>>> * Helge Deller <deller@....de> [240911 20:51]:
>>>> On 9/12/24 01:05, Liam R. Howlett wrote:
>>>>> * Yang Shi <shy828301@...il.com> [240911 18:16]:
>>>>>> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
>>>>>> <Liam.Howlett@...cle.com> wrote:
>>>>>>>
>>>>>>> * Helge Deller <deller@...nel.org> [240911 15:20]:
>>>>>>>> This is an RFC to change the behaviour of mmap(MAP_STACK) to be
>>>>>>>> sufficient to map memory for usage as stack on all architectures.
>>>>>>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
>>>>>>>> has to be used.
>>>>>>>> To clarify, here is the relevant info from the mmap() man page:
>>>>>>>>
>>>>>>>> MAP_GROWSDOWN
>>>>>>>> This flag is used for stacks. It indicates to the kernel virtual
>>>>>>>> memory system that the mapping should extend downward in memory. The
>>>>>>>> return address is one page lower than the memory area that is
>>>>>>>> actually created in the process's virtual address space. Touching an
>>>>>>>> address in the "guard" page below the mapping will cause the mapping
>>>>>>>> to grow by a page. This growth can be repeated until the mapping
>>>>>>>> grows to within a page of the high end of the next lower mapping,
>>>>>>>> at which point touching the "guard" page will result in a SIGSEGV
>>>>>>>> signal.
>>>>>>>>
>>>>>>>> MAP_STACK (since Linux 2.6.27)
>>>>>>>> Allocate the mapping at an address suitable for a process or thread
>>>>>>>> stack.
>>>>>>>>
>>>>>>>> This flag is currently a no-op on Linux. However, by employing this
>>>>>>>> flag, applications can ensure that they transparently obtain support
>>>>>>>> if the flag is implemented in the future. Thus, it is used in the
>>>>>>>> glibc threading implementation to allow for the fact that
>>>>>>>> some architectures may (later) require special treatment for
>>>>>>>> stack allocations. A further reason to employ this flag is
>>>>>>>> portability: MAP_STACK exists (and has an effect) on some
>>>>>>>> other systems (e.g., some of the BSDs).
>>>>>>>>
>>>>>>>> The reason for suggesting this change is that on the parisc architecture the
>>>>>>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
>>>>>>>> work. Note that there exists no MAP_GROWSUP flag.
>>>>>>>> By changing the behaviour of MAP_STACK to mark the memory area with the
>>>>>>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
>>>>>>>> architecture) the MAP_STACK flag does exactly what people would expect on
>>>>>>>> all platforms.
>>>>>>>>
>>>>>>>> This change should have no negative side effects, as all code which
>>>>>>>> uses mmap(MAP_GROWSDOWN | MAP_STACK) still works as before.
>>>>>>>>
>>>>>>>> Signed-off-by: Helge Deller <deller@....de>
>>>>>>>>
>>>>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>>>>> index bcb201ab7a41..66bc72a0cb19 100644
>>>>>>>> --- a/include/linux/mman.h
>>>>>>>> +++ b/include/linux/mman.h
>>>>>>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>>>> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
>>>>>>>> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
>>>>>>>> _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
>>>>>>>> + _calc_vm_trans(flags, MAP_STACK, VM_STACK ) |
>>>>>>>
>>>>>>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
>>>>>>> change the user interface to create a vma that will grow. I'm not
>>>>>>> entirely sure this is okay?
>>>>>>
>>>>>> AFAICT, this is not a problem. Currently huge page also skips
>>>>>> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
>>>>>> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
>>>>>> stack.
>>>>>
>>>>> If someone is using MAP_STACK to avoid having a huge page, they will
>>>>> also get a mapping that grows - which is different than what happens
>>>>> today.
>>>>>
>>>>> I'm not saying that's right, but someone could be abusing the existing
>>>>> flag and this will change the behaviour.
>>>>
>>>> Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
>>>> Why abuse MAP_STACK for that?
>>>
>>> I can think of two answers:
>>> 1. An error that has worked without issues so far
>>> 2. One less system call
>>>
>>> I'm not saying this really is a blocker, but the change is not without
>>> risk as it does change behaviour the user could see.
>>>
>>> Interestingly enough, the man page is incorrect as it is written because
>>> the flag is not strictly a no-op; it ensures no huge pages. So the
>>> feature of applying VM_NOHUGEPAGE with the use of MAP_STACK is not
>>> documented today.
>>
>> Yes.
>>
>>> What happens to calls that use mmap(MAP_GROWSDOWN | MAP_STACK) on
>>> parisc today?
>>
>> Today, without my patch, on parisc the area is then flagged VM_GROWSDOWN only.
>> In that case, stack expansion will not work, as it checks for VM_GROWSUP.
>>
>>> How does this change with your VM_STACK change? Wouldn't this result
>>> in failed mappings?
>>> VM_GROWSDOWN | VM_GROWSUP would fail in do_mmap(), and these would be
>>> set if you map MAP_STACK to VM_STACK which is defined as VM_GROWSUP?
>>
>> Right, with my change, the area will be tagged VM_GROWSUP and VM_GROWSDOWN.
>> Due to VM_GROWSUP stack expansion works.
>> The mapping doesn't fail in do_mmap(), because stacks are not file-mapped
>> or shared or droppable. They should be mapped with MAP_PRIVATE, right?
>
> Oh my, yes. So now you will have a stack that can expand in either
> direction, but it's okay because one direction isn't checked. I sure
> hope the rest of the code is correctly #ifdef'ed for this.

Only one direction will be handled (depending on the architecture),
even if both (VM_GROWSUP and VM_GROWSDOWN) are set.
So it shouldn't be a problem to have both directions set.

>> Another option is to introduce an alias, e.g.:
>> #define MAP_STACK_EXPANDABLE MAP_GROWSDOWN
>> and then
>
> I don't like either of these options.
>
> I guess you could also detect the MAP_STACK and MAP_GROWSDOWN and change
> the flags for parisc, which I also don't like, but since parisc is the
> only arch using this it's hard to justify a change that may cause issues
> in other archs.

Well, I still don't actually see real issues with my proposal.
That's why I proposed to change MAP_STACK generally.
Just adapting it for parisc was my initial approach, which I sent
to the parisc mailing list prior to this patch:
https://patchwork.kernel.org/project/linux-parisc/patch/Zt3yJUiczUBmEC3a@p100/

I'd like to wait for some possible further feedback.
In the end I might end up just changing it for parisc.

> A quick grep shows we set VM_STACK_DEFAULT_FLAGS in x86 and powerpc,
> which could affect what happens here with your change. I am concerned
> about the bleeding of other flags through this change.

Shouldn't be a problem. The VM_STACK in VM_STACK_FLAGS is already either
VM_GROWSUP or VM_GROWSDOWN, depending on the architecture:

include/linux/mm.h:#define VM_STACK_FLAGS (VM_STACK | VM_STACK_DEFAULT_FLAGS | VM_ACCOUNT)
Helge