linux-kernel - Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <lya4vngdchp6g3cjohfxjpbz7ehnmdxhcwfz7qn7ge75e4c7em@somajfleln5a>
Date: Wed, 21 Aug 2024 14:25:43 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Jeff Xu <jeffxu@...omium.org>
Cc: Pedro Falcato <pedro.falcato@...il.com>, rientjes@...gle.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Shuah Khan <shuah@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        oliver.sang@...el.com, torvalds@...ux-foundation.org,
        Michael Ellerman <mpe@...erman.id.au>, Kees Cook <kees@...nel.org>
Subject: Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with
 can_modify_vma

* Jeff Xu <jeffxu@...omium.org> [240821 12:33]:
> On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato <pedro.falcato@...il.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu <jeffxu@...omium.org> wrote:
> > >
> > > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@...il.com> wrote:
> > > >
> > > > We were doing an extra mmap tree traversal just to check if the entire
> > > > range is modifiable. This can be done when we iterate through the VMAs
> > > > instead.
> > > >
> > > > Signed-off-by: Pedro Falcato <pedro.falcato@...il.com>
> > > > ---
> > > >  mm/mmap.c | 11 +----------
> > > >  mm/vma.c  | 19 ++++++++++++-------
> > > >  2 files changed, 13 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 3af256bacef3..30ae4cb5cec9 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >                 unsigned long start, unsigned long end, struct list_head *uf,
> > > >                 bool unlock)
> > > >  {
> > > > -       struct mm_struct *mm = vma->vm_mm;
> > > > -
> > > > -       /*
> > > > -        * Check if memory is sealed, prevent unmapping a sealed VMA.
> > > > -        * can_modify_mm assumes we have acquired the lock on MM.
> > > > -        */
> > > > -       if (unlikely(!can_modify_mm(mm, start, end)))
> > > > -               return -EPERM;
> > > Another approach to improve perf  is to clone the vmi (since it
> > > already point to the first vma), and pass the cloned vmi/vma into
> > > can_modify_mm check, that will remove the cost of re-finding the first
> > > VMA.
> > >
> > > The can_modify_mm then continues from cloned VMI/vma till the end of
> > > address range, there will be some perf cost there.  However,  most
> > > address ranges in the real world are within a single VMA,  in
> > > practice, the perf cost is the same as checking the single VMA, 99.9%
> > > case.
> > >
> > > This will help preserve the nice sealing feature (if one of the vma is
> > > sealed, the entire address range is not modified)
> >
> > Please drop it. No one wants to preserve this. Everyone is in sync
> > when it comes to the solution except you.
> 
> Still, this is another option that will very likely address the perf issue.

The cost of cloning the vmi is a memory copy, while the cost of not
cloning the vmi is a re-walk of the tree.  Neither of these are free.

Both can be avoided by checking the vma flag during the existing loop,
which is what is done in this patch set.  This is obviously lower cost
of either of the above options since they do extra work and still have
to check the vma flag(s).

I think you are confusing the behaviour of the munmap() call when you
state 'partial success' with a potential split operation that may happen
prior to aborting a munmap() call.

What could happen in the failure scenario is that you'd end up with two
vmas instead of one mapping a particular area - but the mseal flag is
checked prior to allowing a split to happen, so it'll only split
non-mseal()'ed vmas.

So what mseal() used to avoid is a call that could potentially split a
vma that isn't mseal()'ed, while this patch will allow it to be split.
This is how it has been for a very long time and it's okay.

Considering how this affects the security model of mseal(), it means the
attacker could still accomplish the same feat of splitting that first
vma by altering the call to munmap() to avoid an mseal()'ed vma, so
there isn't much lost or gained here security wise - if any.

Thanks,
Liam