linux-kernel - Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <012f05c9-a1f3-45cf-b584-897a03cca867@lucifer.local>
Date: Wed, 21 Aug 2024 18:00:54 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jeff Xu <jeffxu@...omium.org>
Cc: Pedro Falcato <pedro.falcato@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Shuah Khan <shuah@...nel.org>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, oliver.sang@...el.com,
        torvalds@...ux-foundation.org, Michael Ellerman <mpe@...erman.id.au>,
        Kees Cook <kees@...nel.org>
Subject: Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with
 can_modify_vma

On Wed, Aug 21, 2024 at 09:15:52AM GMT, Jeff Xu wrote:
> On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@...il.com> wrote:
> >
> > We were doing an extra mmap tree traversal just to check if the entire
> > range is modifiable. This can be done when we iterate through the VMAs
> > instead.
> >
> > Signed-off-by: Pedro Falcato <pedro.falcato@...il.com>
> > ---
> >  mm/mmap.c | 11 +----------
> >  mm/vma.c  | 19 ++++++++++++-------
> >  2 files changed, 13 insertions(+), 17 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 3af256bacef3..30ae4cb5cec9 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >                 unsigned long start, unsigned long end, struct list_head *uf,
> >                 bool unlock)
> >  {
> > -       struct mm_struct *mm = vma->vm_mm;
> > -
> > -       /*
> > -        * Check if memory is sealed, prevent unmapping a sealed VMA.
> > -        * can_modify_mm assumes we have acquired the lock on MM.
> > -        */
> > -       if (unlikely(!can_modify_mm(mm, start, end)))
> > -               return -EPERM;
> Another approach to improve perf  is to clone the vmi (since it
> already point to the first vma), and pass the cloned vmi/vma into
> can_modify_mm check, that will remove the cost of re-finding the first
> VMA.
>
> The can_modify_mm then continues from cloned VMI/vma till the end of
> address range, there will be some perf cost there.  However,  most
> address ranges in the real world are within a single VMA,  in
> practice, the perf cost is the same as checking the single VMA, 99.9%
> case.
>
> This will help preserve the nice sealing feature (if one of the vma is
> sealed, the entire address range is not modified)

This is tantamount to saying 'why not drop the entire series and try a
totally different approach?' and is frankly a little rude and
unprofessional to put as a comment midway through a series like this.

I don't agree this sealed property is 'nice', every single other form of
failure/inability to perform operations on a VMA is permitted to result in
partial operations and an error code.

If a VMA is sealed and causes an operation to fail, that's fine. We run on
arguments and facts in the kernel, not feelings.

Please provide this kind of generalised critique at the 0/7 patch. And try
to be vaguely civil.

>
> > -
> > -       return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > +       return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> >  }
> >
> >  /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 84965f2cd580..5850f7c0949b 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >                 if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> >                         goto map_count_exceeded;
> >
> > +               /* Don't bother splitting the VMA if we can't unmap it anyway */
> > +               if (!can_modify_vma(vma)) {
> > +                       error = -EPERM;
> > +                       goto start_split_failed;
> > +               }
> > +
> >                 error = __split_vma(vmi, vma, start, 1);
> >                 if (error)
> >                         goto start_split_failed;
> > @@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >          */
> >         next = vma;
> >         do {
> > +               if (!can_modify_vma(next)) {
> > +                       error = -EPERM;
> > +                       goto modify_vma_failed;
> > +               }
> > +
> >                 /* Does it split the end? */
> >                 if (next->vm_end > end) {
> >                         error = __split_vma(vmi, next, end, 0);
> > @@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >         __mt_destroy(&mt_detach);
> >         return 0;
> >
> > +modify_vma_failed:
> >  clear_tree_failed:
> >  userfaultfd_error:
> >  munmap_gather_failed:
> > @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> >         if (end == start)
> >                 return -EINVAL;
> >
> > -       /*
> > -        * Check if memory is sealed, prevent unmapping a sealed VMA.
> > -        * can_modify_mm assumes we have acquired the lock on MM.
> > -        */
> > -       if (unlikely(!can_modify_mm(mm, start, end)))
> > -               return -EPERM;
> > -
> >         /* Find the first overlapping VMA */
> >         vma = vma_find(vmi, end);
> >         if (!vma) {
> >
> > --
> > 2.46.0
> >