linux-kernel - Re: [PATCH v2 2/6] mm/munmap: Replace can_modify_mm with can_modify

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALmYWFu5CXiJKict9EqtEbaBNB80rU=p3izWE3Jgd6OWoY0Tig@mail.gmail.com>
Date: Mon, 12 Aug 2024 10:38:11 -0700
From: Jeff Xu <jeffxu@...gle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, Jeff Xu <jeffxu@...gle.com>, 
	Pedro Falcato <pedro.falcato@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Vlastimil Babka <vbabka@...e.cz>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, oliver.sang@...el.com, 
	torvalds@...ux-foundation.org, Michael Ellerman <mpe@...erman.id.au>, 
	Kees Cook <keescook@...omium.org>
Subject: Re: [PATCH v2 2/6] mm/munmap: Replace can_modify_mm with can_modify_vma

On Mon, Aug 12, 2024 at 9:58 AM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
>
> * Jeff Xu <jeffxu@...gle.com> [240812 10:30]:
> > + Kees who commented on mseal() series before. Please keep Kees in the
> > cc for future response to this series.
> >
> > On Fri, Aug 9, 2024 at 12:25 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
> > >
> > > * Pedro Falcato <pedro.falcato@...il.com> [240809 14:53]:
> > > > On Fri, Aug 9, 2024 at 5:48 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
> > > > >
> > > > > * Liam R. Howlett <Liam.Howlett@...cle.com> [240809 12:15]:
> > > > > > * Pedro Falcato <pedro.falcato@...il.com> [240807 17:13]:
> > > > > > > We were doing an extra mmap tree traversal just to check if the entire
> > > > > > > range is modifiable. This can be done when we iterate through the VMAs
> > > > > > > instead.
> > > > > > >
> > > > > > > Signed-off-by: Pedro Falcato <pedro.falcato@...il.com>
> > > > > > > ---
> > > > > > >  mm/mmap.c | 13 +------------
> > > > > > >  mm/vma.c  | 23 ++++++++++++-----------
> > > > > > >  2 files changed, 13 insertions(+), 23 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > > > index 4a9c2329b09..c1c7a7d00f5 100644
> > > > > > > --- a/mm/mmap.c
> > > > > > > +++ b/mm/mmap.c
> > > > > > > @@ -1740,18 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > > > >             unsigned long start, unsigned long end, struct list_head *uf,
> > > > > > >             bool unlock)
> > > > > > >  {
> > > > > > > -   struct mm_struct *mm = vma->vm_mm;
> > > > > > > -
> > > > > > > -   /*
> > > > > > > -    * Check if memory is sealed before arch_unmap.
> > > > > > > -    * Prevent unmapping a sealed VMA.
> > > > > > > -    * can_modify_mm assumes we have acquired the lock on MM.
> > > > > > > -    */
> > > > > > > -   if (unlikely(!can_modify_mm(mm, start, end)))
> > > > > > > -           return -EPERM;
> > > > > > > -
> > > > > > > -   arch_unmap(mm, start, end);
> > > > > > > -   return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > > > > > > +   return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> > > > > > >  }
> > > > > > >
> > > > > > >  /*
> > > > > > > diff --git a/mm/vma.c b/mm/vma.c
> > > > > > > index bf0546fe6ea..7a121bcc907 100644
> > > > > > > --- a/mm/vma.c
> > > > > > > +++ b/mm/vma.c
> > > > > > > @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > > > >             if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> > > > > > >                     goto map_count_exceeded;
> > > > > > >
> > > > > > > +           /* Don't bother splitting the VMA if we can't unmap it anyway */
> > > > > > > +           if (!can_modify_vma(vma)) {
> > > > > > > +                   error = -EPERM;
> > > > > > > +                   goto start_split_failed;
> > > > > > > +           }
> > > > > > > +
> > > > > >
> > > > > > Would this check be better placed in __split_vma()?  It could replace
> > > > > > both this and the next chunk of code.
> > > > >
> > > > > not quite.
> > > >
> > > > Yeah, I was going to say that splitting a sealed VMA is okay (and we
> > > > allow it on mlock and madvise).
> > >
> > > splitting a sealed vma wasn't supposed to be okay.. but it is Jeff's
> > > feature.  Jeff?
> > >
> > Splitting a sealed VMA is wrong.
> > Whoever wants to split a sealed VMA should  answer this question
> > first: what is the use case for splitting a sealed VMA?
>
> If we lower the checks to __split_vma() and anywhere that does entire
> vma modifications, then we would avoid modifications of the vma.  This
> particular loop breaks on the final vma, if it needs splitting, so we'd
> still need the check in the main loop to ensure the full vma isn't
> mseal()'ed.  Splitting the start/end could be covered by the
> __split_vma() function.
>
> >
> > The V2 series doesn't have selftest change which indicates lack of
> > testing. The out-of-loop check is positioned nearer to the API entry
> > point and separated from internal business logic, thereby minimizing
> > the testing requirements. However, as we move the sealing check
> > further inward and intertwine it with business logic, greater test
> > coverage becomes necessary to ensure  the correctness of  sealing
> > is preserved.
>
> I would have expected more complete test coverage and not limited to
> what is expected to happen with an up front test.  Changes may happen
> that you aren't Cc'ed on (or when you are away, etc) that could cause a
> silent failure which could go undetected for a prolonged period of time.
>
> If splitting a vma isn't okay, then it would be safer to test at least
> in some scenarios in the upstream tests.  Ideally, all tests are
> upstream so everyone can run the testing.
>
We will want to be careful about our expectation of  test coverage
offered in selftest. When adding mseal, I added 40+ test cases to
ensure mseal didn't regress on existing mm api, i.e. you can take the
mseal test , make a small modification (removing seal=1) and run on an
old version of kernel and they will pass. I think it is wrong to
expect the selftest is all it takes to find a regression if the dev is
doing a  *** major design/feature change ***. While it is possible to
write test cases to guide all future changes, doing so requires much
bigger effort with diminishing returns, essentially  it is testing an
"impossible to reach cases" in existing code.

Thanks
-Jeff

> Thanks,
> Liam