Message-ID: <01fbc3f2-bccb-4694-99ec-2ee8e9ff6e4e@lucifer.local>
Date: Fri, 15 Nov 2024 19:28:34 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: stable@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Jann Horn <jannh@...gle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Xu <peterx@...hat.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Mark Brown <broonie@...nel.org>,
"David S . Miller" <davem@...emloft.net>,
Andreas Larsson <andreas@...sler.com>,
"James E . J . Bottomley" <James.Bottomley@...senpartnership.com>,
Helge Deller <deller@....de>
Subject: Re: [PATCH 6.1.y 4/4] mm: resolve faulty mmap_region() error path behaviour
On Fri, Nov 15, 2024 at 08:06:05PM +0100, Vlastimil Babka wrote:
> On 11/15/24 13:40, Lorenzo Stoakes wrote:
> > [ Upstream commit 5de195060b2e251a835f622759550e6202167641 ]
> >
> > The mmap_region() function is somewhat terrifying, with spaghetti-like
> > control flow and numerous means by which issues can arise and incomplete
> > state, memory leaks and other unpleasantness can occur.
> >
> > A large amount of the complexity arises from trying to handle errors late
> > in the process of mapping a VMA, which forms the basis of recently
> > observed issues with resource leaks and observable inconsistent state.
> >
> > Taking advantage of previous patches in this series, we move a number of
> > checks earlier in the code, simplifying things by moving the core of the
> > logic into a static internal function __mmap_region().
> >
> > Doing this allows us to perform a number of checks up front before we do
> > any real work, and allows us to unwind the writable unmap check
> > unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
> > validation unconditionally also.
> >
> > We move a number of things here:
> >
> > 1. We preallocate memory for the iterator before we call the file-backed
> > memory hook, allowing us to exit early and avoid having to perform
> > complicated and error-prone close/free logic. We carefully free
> > iterator state on both success and error paths.
> >
> > 2. The enclosing mmap_region() function handles the mapping_map_writable()
> > logic early. Previously the logic had the mapping_map_writable() at the
> > point of mapping a newly allocated file-backed VMA, and a matching
> > mapping_unmap_writable() on success and error paths.
> >
> > We now do this unconditionally if this is a file-backed, shared writable
> > mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however,
> > doing so does not invalidate the seal check we just performed, and we
> > always decrement the counter in the wrapper in any case.
> >
> > We perform a debug assert to ensure a driver does not attempt to do the
> > opposite.
> >
> > 3. We also move arch_validate_flags() up into the mmap_region()
> > function. This is only relevant on arm64 and sparc64, and the check is
> > only meaningful for SPARC with ADI enabled. We explicitly add a warning
> > for this arch if a driver invalidates this check, though the code ought
> > eventually to be fixed to eliminate the need for this.
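A minimal userspace model of points 2 and 3 above, for illustration only. Every name in it (struct model_mapping, model_map_writable(), model_validate_flags(), MODEL_VM_* and so on) is hypothetical and not a kernel API. It assumes a writer count in which a negative value means further writable mappings are denied, loosely mirroring a write-sealed memfd, and shows the wrapper validating flags and taking its temporary writable reference before the inner worker runs, then dropping that reference whether the worker succeeded or failed.

```c
#include <assert.h>   /* used by the usage checks */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: writable < 0 means writers are denied (sealed). */
struct model_mapping {
	int writable;
};

#define MODEL_VM_SHARED  0x1UL
#define MODEL_VM_BADFLAG 0x2UL /* stand-in for a flag an arch would reject */

/* Take a temporary writable reference unless writers are denied. */
static int model_map_writable(struct model_mapping *m)
{
	if (m->writable < 0)
		return -EPERM;
	m->writable++;
	return 0;
}

static void model_unmap_writable(struct model_mapping *m)
{
	m->writable--;
}

/* Stand-in for an arch flag check: rejects one made-up flag. */
static bool model_validate_flags(unsigned long flags)
{
	return !(flags & MODEL_VM_BADFLAG);
}

/* Inner worker; 'fail' simulates any late error inside it. */
static long model_inner(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Wrapper: fallible checks first, then always drop the reference. */
static long model_region(struct model_mapping *m, unsigned long flags,
			 bool inner_fails)
{
	bool writable = false;
	long ret;

	if (!model_validate_flags(flags))
		return -EINVAL;

	if (m && (flags & MODEL_VM_SHARED)) {
		int err = model_map_writable(m);

		if (err)
			return err;
		writable = true;
	}

	ret = model_inner(inner_fails);

	/* Released on success and on error alike. */
	if (writable)
		model_unmap_writable(m);
	return ret;
}
```

With a sealed mapping (writable == -1) the model returns -EPERM before the worker runs and leaves the count untouched; in every other case the count ends where it started.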
> >
> > With all of these measures in place, we no longer need to explicitly close
> > the VMA on error paths, as we place all checks which might fail prior to a
> > call to any driver mmap hook.
> >
> > This eliminates an entire class of errors, makes the code easier to reason
> > about and more robust.
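A rough sketch of the ordering idea in point 1 and the paragraph above, again as a hypothetical userspace model rather than kernel code: prealloc_get(), prealloc_put(), driver_hook() and the two counters are made-up names. Every step that can fail on our side runs before the hook, and a single cleanup label frees the preallocation on both the success and the error path, so no failure after the hook has to ask the driver to undo its work.

```c
#include <assert.h>   /* used by the usage checks */
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counters for observing cleanup; not kernel APIs. */
static int live_prealloc; /* outstanding preallocations */
static int hook_calls;    /* driver hook invocations */

struct prealloc {
	void *mem;
};

static int prealloc_get(struct prealloc *p, bool fail)
{
	if (fail)
		return -ENOMEM;
	p->mem = malloc(64);
	if (!p->mem)
		return -ENOMEM;
	live_prealloc++;
	return 0;
}

/* Free whatever preallocation is left over. */
static void prealloc_put(struct prealloc *p)
{
	if (p->mem) {
		free(p->mem);
		p->mem = NULL;
		live_prealloc--;
	}
}

/* Stand-in driver hook: it has side effects and may itself fail. */
static int driver_hook(bool fail)
{
	hook_calls++;
	return fail ? -EIO : 0;
}

static int map_one(bool alloc_fails, bool hook_fails)
{
	struct prealloc p = { NULL };
	int err;

	/* Our own fallible step happens before the hook. */
	err = prealloc_get(&p, alloc_fails);
	if (err)
		return err; /* the hook never ran: nothing to undo */

	err = driver_hook(hook_fails);
	if (err)
		goto free_iter; /* the hook failed: only undo our own state */

	/* Nothing after the hook can fail; success falls through. */
free_iter:
	prealloc_put(&p); /* reached on success and on error */
	return err;
}
```

Because nothing after the hook can fail in this model, there is no path on which the driver's work would have to be undone through a close callback.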
> >
> > Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com
> > Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > Reported-by: Jann Horn <jannh@...gle.com>
> > Reviewed-by: Liam R. Howlett <Liam.Howlett@...cle.com>
> > Reviewed-by: Vlastimil Babka <vbabka@...e.cz>
> > Tested-by: Mark Brown <broonie@...nel.org>
> > Cc: Andreas Larsson <andreas@...sler.com>
> > Cc: Catalin Marinas <catalin.marinas@....com>
> > Cc: David S. Miller <davem@...emloft.net>
> > Cc: Helge Deller <deller@....de>
> > Cc: James E.J. Bottomley <James.Bottomley@...senPartnership.com>
> > Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> > Cc: Peter Xu <peterx@...hat.com>
> > Cc: Will Deacon <will@...nel.org>
> > Cc: <stable@...r.kernel.org>
> > Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > ---
> > mm/mmap.c | 103 +++++++++++++++++++++++++++++-------------------------
> > 1 file changed, 56 insertions(+), 47 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 322677f61d30..e457169c5cce 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2652,7 +2652,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> > return do_mas_munmap(&mas, mm, start, len, uf, false);
> > }
> >
> > -unsigned long mmap_region(struct file *file, unsigned long addr,
> > +static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > struct list_head *uf)
> > {
> > @@ -2750,26 +2750,28 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > vma->vm_page_prot = vm_get_page_prot(vm_flags);
> > vma->vm_pgoff = pgoff;
> >
> > - if (file) {
> > - if (vm_flags & VM_SHARED) {
> > - error = mapping_map_writable(file->f_mapping);
> > - if (error)
> > - goto free_vma;
> > - }
> > + if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
> > + error = -ENOMEM;
> > + goto free_vma;
> > + }
> >
> > + if (file) {
> > vma->vm_file = get_file(file);
> > error = mmap_file(file, vma);
> > if (error)
> > - goto unmap_and_free_vma;
> > + goto unmap_and_free_file_vma;
> > +
> > + /* Drivers cannot alter the address of the VMA. */
> > + WARN_ON_ONCE(addr != vma->vm_start);
> >
> > /*
> > - * Expansion is handled above, merging is handled below.
> > - * Drivers should not alter the address of the VMA.
> > + * Drivers should not permit writability when previously it was
> > + * disallowed.
> > */
> > - if (WARN_ON((addr != vma->vm_start))) {
> > - error = -EINVAL;
> > - goto close_and_free_vma;
> > - }
> > + VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
> > + !(vm_flags & VM_MAYWRITE) &&
> > + (vma->vm_flags & VM_MAYWRITE));
> > +
> > mas_reset(&mas);
> >
> > /*
> > @@ -2792,7 +2794,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > vma = merge;
> > /* Update vm_flags to pick up the change. */
> > vm_flags = vma->vm_flags;
As far as I can tell we should add:
+ mas_destroy(&mas);
> > - goto unmap_writable;
> > + goto file_expanded;
>
> I think we might need a mas_destroy() somewhere around here otherwise we
> leak the prealloc? In later versions the merge operation takes our vma
> iterator so it handles that if merge succeeds, but here we have to cleanup
> our mas ourselves?
>
Sigh, yup. This code path is SO HORRIBLE. I think simply a
mas_destroy(&mas) here would suffice (see above).
I'm not sure how things work with stable: do we need to respin a v2 just
for one line?
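To make the leak concrete, here is a small hypothetical model (iter_preallocate(), iter_destroy(), iter_store() and live_spares are illustrative names, not kernel APIs). The branch taken when the merge succeeds (vma = merge in the hunk above) jumps straight to file_expanded, past free_iter_vma, so the preallocation has to be released before that jump. In this model the normal store is assumed to consume and release the spare itself.

```c
#include <assert.h>   /* used by the usage checks */
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; none of these names are kernel APIs. */
static int live_spares; /* preallocations not yet released */

struct iter {
	void *spare; /* stands in for preallocated maple nodes */
};

static int iter_preallocate(struct iter *it)
{
	it->spare = malloc(32);
	if (!it->spare)
		return -ENOMEM;
	live_spares++;
	return 0;
}

/* Release whatever is left over (the role mas_destroy() plays). */
static void iter_destroy(struct iter *it)
{
	if (it->spare) {
		free(it->spare);
		it->spare = NULL;
		live_spares--;
	}
}

/* Assumption of this model: the normal store consumes and frees the spare. */
static void iter_store(struct iter *it)
{
	iter_destroy(it);
}

static int region(bool merged)
{
	struct iter it = { NULL };
	int err;

	err = iter_preallocate(&it);
	if (err)
		return err;

	if (merged) {
		/*
		 * The jump below skips the normal store, so the spare must be
		 * released here; omitting this call is the leak in question.
		 */
		iter_destroy(&it);
		goto file_expanded;
	}

	iter_store(&it);
file_expanded:
	return 0;
}
```

Removing the iter_destroy() call from the merged branch would leave live_spares at 1 after region(true).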
> > }
> > }
> >
> > @@ -2800,31 +2802,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > } else if (vm_flags & VM_SHARED) {
> > error = shmem_zero_setup(vma);
> > if (error)
> > - goto free_vma;
> > + goto free_iter_vma;
> > } else {
> > vma_set_anonymous(vma);
> > }
> >
> > - /* Allow architectures to sanity-check the vm_flags */
> > - if (!arch_validate_flags(vma->vm_flags)) {
> > - error = -EINVAL;
> > - if (file)
> > - goto close_and_free_vma;
> > - else if (vma->vm_file)
> > - goto unmap_and_free_vma;
> > - else
> > - goto free_vma;
> > - }
> > -
> > - if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
> > - error = -ENOMEM;
> > - if (file)
> > - goto close_and_free_vma;
> > - else if (vma->vm_file)
> > - goto unmap_and_free_vma;
> > - else
> > - goto free_vma;
> > - }
> > +#ifdef CONFIG_SPARC64
> > + /* TODO: Fix SPARC ADI! */
> > + WARN_ON_ONCE(!arch_validate_flags(vm_flags));
> > +#endif
> >
> > if (vma->vm_file)
> > i_mmap_lock_write(vma->vm_file->f_mapping);
> > @@ -2847,10 +2833,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > */
> > khugepaged_enter_vma(vma, vma->vm_flags);
> >
> > - /* Once vma denies write, undo our temporary denial count */
> > -unmap_writable:
> > - if (file && vm_flags & VM_SHARED)
> > - mapping_unmap_writable(file->f_mapping);
> > +file_expanded:
> > file = vma->vm_file;
> > expanded:
> > perf_event_mmap(vma);
> > @@ -2879,28 +2862,54 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >
> > vma_set_page_prot(vma);
> >
> > - validate_mm(mm);
> > return addr;
> >
> > -close_and_free_vma:
> > - vma_close(vma);
> > -unmap_and_free_vma:
> > +unmap_and_free_file_vma:
> > fput(vma->vm_file);
> > vma->vm_file = NULL;
> >
> > /* Undo any partial mapping done by a device driver. */
> > unmap_region(mm, mas.tree, vma, prev, next, vma->vm_start, vma->vm_end);
> > - if (file && (vm_flags & VM_SHARED))
> > - mapping_unmap_writable(file->f_mapping);
> > +free_iter_vma:
> > + mas_destroy(&mas);
> > free_vma:
> > vm_area_free(vma);
> > unacct_error:
> > if (charged)
> > vm_unacct_memory(charged);
> > - validate_mm(mm);
> > return error;
> > }
> >
> > +unsigned long mmap_region(struct file *file, unsigned long addr,
> > + unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > + struct list_head *uf)
> > +{
> > + unsigned long ret;
> > + bool writable_file_mapping = false;
> > +
> > + /* Allow architectures to sanity-check the vm_flags. */
> > + if (!arch_validate_flags(vm_flags))
> > + return -EINVAL;
> > +
> > + /* Map writable and ensure this isn't a sealed memfd. */
> > + if (file && (vm_flags & VM_SHARED)) {
> > + int error = mapping_map_writable(file->f_mapping);
> > +
> > + if (error)
> > + return error;
> > + writable_file_mapping = true;
> > + }
> > +
> > + ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);
> > +
> > + /* Clear our write mapping regardless of error. */
> > + if (writable_file_mapping)
> > + mapping_unmap_writable(file->f_mapping);
> > +
> > + validate_mm(current->mm);
> > + return ret;
> > +}
> > +
> > static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
> > {
> > int ret;
>