[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <874j8x5t4e.fsf@mail.lhotse>
Date: Wed, 10 Jul 2024 22:28:01 +1000
From: Michael Ellerman <mpe@...erman.id.au>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R. Howlett"
<Liam.Howlett@...cle.com>, linux-mm@...ck.org, Andrew Morton
<akpm@...ux-foundation.org>, Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>, Lorenzo Stoakes <lstoakes@...il.com>,
Matthew Wilcox <willy@...radead.org>, sidhartha.kumar@...cle.com, "Paul E
. McKenney" <paulmck@...nel.org>, Bert Karwatzki <spasswolf@....de>, Jiri
Olsa <olsajiri@...il.com>, linux-kernel@...r.kernel.org, Kees Cook
<kees@...nel.org>, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v3 16/16] mm/mmap: Move may_expand_vm() check in
mmap_region()
Lorenzo Stoakes <lorenzo.stoakes@...cle.com> writes:
> On Mon, Jul 08, 2024 at 04:43:15PM GMT, Liam R. Howlett wrote:
>>
...
>> The functionality here has changed
>> --- from ---
>> may_expand_vm() check
>> can_modify_mm() check
>> arch_unmap()
>> vms_gather_munmap_vmas()
>> ...
>>
>> --- to ---
>> can_modify_mm() check
>> arch_unmap()
>> vms_gather_munmap_vmas()
>> may_expand_vm() check
>> ...
>>
>> vms_gather_munmap_vmas() does nothing but figures out what to do later,
>> but could use memory and can fail.
>>
>> The user implications are:
>>
>> 1. The return type on the error may change to -EPERM from -ENOMEM, if
>> you are not allowed to expand and are trying to overwrite mseal()'ed
>> VMAs. That seems so very rare that I'm not sure it's worth mentioning.
>>
>>
>> 2. arch_unmap() called prior to may_expand_vm().
>> powerpc uses this to set mm->context.vdso = NULL if mm->context.vdso is
>> within the unmap range. User implication of this means that an
>> application my set the vdso to NULL prior to hitting the -ENOMEM case in
>> may_expand_vm() due to the address space limit.
>>
>> Assuming the removal of the vdso does not cause the application to seg
>> fault, then the user visible change is that any vdso call after a failed
>> mmap(MAP_FIXED) call would result in a seg fault. The only reason it
>> would fail is if the mapping process was attempting to map a large
>> enough area over the vdso (which is accounted and in the vma tree,
>> afaict) and ran out of memory. Note that this situation could arise
>> already since we could run out of memory (not accounting) after the
>> arch_unmap() call within the kernel.
>>
>> The code today can suffer the same fate, but not by the accounting
>> failure. It can happen due to failure to allocate a new vma,
>> do_vmi_munmap() failure after the arch_unmap() call, or any of the other
>> failure scenarios later in the mmap_region() function.
>>
>> At the very least, this requires an expanded change log.
>
> Indeed, also (as mentioned on IRC) I feel like we need to look at whether
> we _truly_ need this arch_unmap() call for a single, rather antiquated,
> architecture.
You can call it "niche" or "irrelevant" or "fringe", but "antiquated" is
factually wrong :) Power10 came out of the fab just a few years ago at
7nm.
> I mean why are they unmapping the VDSO, why is that valid, why does it need
> that field to be set to NULL, is it possible to signify that in some other
> way etc.?
It was originally for CRIU. So a niche workload on a niche architecture.
But from the commit that added it, it sounds like CRIU was using mremap,
which should be handled these days by vdso_mremap(). So it could be that
arch_unmap() is not actually needed for CRIU anymore.
Then I guess we have to decide if removing our arch_unmap() would be an
ABI break, regardless of whether CRIU needs it or not.
cheers
Powered by blists - more mailing lists