[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <95c4efe9-e92a-46fe-bf41-9141e125332d@gmx.de>
Date: Thu, 12 Sep 2024 02:50:55 +0200
From: Helge Deller <deller@....de>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Yang Shi <shy828301@...il.com>, Helge Deller <deller@...nel.org>,
linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-mm@...ck.org, linux-parisc@...r.kernel.org
Subject: Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable
stack
On 9/12/24 01:05, Liam R. Howlett wrote:
> * Yang Shi <shy828301@...il.com> [240911 18:16]:
>> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
>> <Liam.Howlett@...cle.com> wrote:
>>>
>>> * Helge Deller <deller@...nel.org> [240911 15:20]:
>>>> This is a RFC to change the behaviour of mmap(MAP_STACK) to be
>>>> sufficient to map memory for usage as stack on all architectures.
>>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
>>>> has to be used.
>>>> To clarify, here is the relevant info from the mmap() man page:
>>>>
>>>> MAP_GROWSDOWN
>>>> This flag is used for stacks. It indicates to the kernel virtual
>>>> memory system that the mapping should extend downward in memory. The
>>>> return address is one page lower than the memory area that is
>>>> actually created in the process's virtual address space. Touching an
>>>> address in the "guard" page below the mapping will cause the mapping
>>>> to grow by a page. This growth can be repeated until the mapping
>>>> grows to within a page of the high end of the next lower mapping,
>>>> at which point touching the "guard" page will result in a SIGSEGV
>>>> signal.
>>>>
>>>> MAP_STACK (since Linux 2.6.27)
>>>> Allocate the mapping at an address suitable for a process or thread
>>>> stack.
>>>>
>>>> This flag is currently a no-op on Linux. However, by employing this
>>>> flag, applications can ensure that they transparently obtain support
>>>> if the flag is implemented in the future. Thus, it is used in the
>>>> glibc threading implementation to allow for the fact that
>>>> some architectures may (later) require special treatment for
>>>> stack allocations. A further reason to employ this flag is
>>>> portability: MAP_STACK exists (and has an effect) on some
>>>> other systems (e.g., some of the BSDs).
>>>>
>>>> The reason to suggest this change is, that on the parisc architecture the
>>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
>>>> work. Note that there exists no MAP_GROWSUP flag.
>>>> By changing the behaviour of MAP_STACK to mark the memory area with the
>>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
>>>> architecture) the MAP_STACK flag does exactly what people would expect on
>>>> all platforms.
>>>>
>>>> This change should have no negative side-effect, as all code which
>>>> used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
>>>>
>>>> Signed-off-by: Helge Deller <deller@....de>
>>>>
>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>> index bcb201ab7a41..66bc72a0cb19 100644
>>>> --- a/include/linux/mman.h
>>>> +++ b/include/linux/mman.h
>>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>>>> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
>>>> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
>>>> _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
>>>> + _calc_vm_trans(flags, MAP_STACK, VM_STACK ) |
>>>
>>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
>>> change the user interface to create a vma that will grow. I'm not
>>> entirely sure this is okay?
>>
>> AFAICT, I don't see this is a problem. Currently huge page also skips
>> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
>> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
>> stack.
>
> If someone is using MAP_STACK to avoid having a huge page, they will
> also get a mapping that grows - which is different than what happens
> today.
>
> I'm not saying that's right, but someone could be abusing the existing
> flag and this will change the behaviour.
Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
Why abusing MAP_STACK for that?
Helge
>>> That is mmap(MAP_STACK) would set VM_NOHUGEPAGE right now, with this
>>> change you'd get VM_NOHUGEPAGE | VM_GROWS<something>
>>>
>>>> _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
>>>> arch_calc_vm_flag_bits(flags);
>>>> }
Powered by blists - more mailing lists