[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <784c68fa023e99c53cd07265f0524e386815b443.camel@huaweicloud.com>
Date: Fri, 18 Oct 2024 17:45:59 +0200
From: Roberto Sassu <roberto.sassu@...weicloud.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com, vbabka@...e.cz,
jannh@...gle.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
ebpqwerty472123@...il.com, paul@...l-moore.com, zohar@...ux.ibm.com,
dmitry.kasatkin@...il.com, eric.snowberg@...cle.com, jmorris@...ei.org,
serge@...lyn.com, linux-integrity@...r.kernel.org,
linux-security-module@...r.kernel.org, bpf@...r.kernel.org,
linux-fsdevel@...r.kernel.org, "Kirill A. Shutemov"
<kirill.shutemov@...ux.intel.com>, stable@...r.kernel.org,
syzbot+91ae49e1c1a2634d20c0@...kaller.appspotmail.com, Roberto Sassu
<roberto.sassu@...wei.com>
Subject: Re: [RFC][PATCH] mm: Split locks in remap_file_pages()
On Fri, 2024-10-18 at 16:42 +0100, Lorenzo Stoakes wrote:
> On Fri, Oct 18, 2024 at 04:47:10PM +0200, Roberto Sassu wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> >
> > Commit ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in
> > remap_file_pages()") fixed a security issue, it added an LSM check when
> > trying to remap file pages, so that LSMs have the opportunity to evaluate
> > such action like for other memory operations such as mmap() and mprotect().
> >
> > However, that commit called security_mmap_file() inside the mmap_lock lock,
> > while the other calls do it before taking the lock, after commit
> > 8b3ec6814c83 ("take security_mmap_file() outside of ->mmap_sem").
> >
> > This caused lock inversion issue with IMA which was taking the mmap_lock
> > and i_mutex lock in the opposite way when the remap_file_pages() system
> > call was called.
> >
> > Solve the issue by splitting the critical region in remap_file_pages() in
> > two regions: the first takes a read lock of mmap_lock and retrieves the VMA
> > and the file associated, and calculate the 'prot' and 'flags' variable; the
> > second takes a write lock on mmap_lock, checks that the VMA flags and the
> > VMA file descriptor are the same as the ones obtained in the first critical
> > region (otherwise the system call fails), and calls do_mmap().
> >
> > In between, after releasing the read lock and taking the write lock, call
> > security_mmap_file(), and solve the lock inversion issue.
>
> Great description!
>
> >
> > Cc: stable@...r.kernel.org
> > Fixes: ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
> > Reported-by: syzbot+91ae49e1c1a2634d20c0@...kaller.appspotmail.com
> > Closes: https://lore.kernel.org/linux-security-module/66f7b10e.050a0220.46d20.0036.GAE@google.com/
> > Reviewed-by: Roberto Sassu <roberto.sassu@...wei.com> (Calculate prot and flags earlier)
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>
> Other than some nits below:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> I think you're definitely good to un-RFC here.
Perfect, will do. Thank you!
Roberto
> > ---
> > mm/mmap.c | 62 ++++++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 45 insertions(+), 17 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 9c0fb43064b5..762944427e03 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1640,6 +1640,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > unsigned long populate = 0;
> > unsigned long ret = -EINVAL;
> > struct file *file;
> > + vm_flags_t vm_flags;
> >
> > pr_warn_once("%s (%d) uses deprecated remap_file_pages() syscall. See Documentation/mm/remap_file_pages.rst.\n",
> > current->comm, current->pid);
> > @@ -1656,12 +1657,53 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> > return ret;
> >
> > - if (mmap_write_lock_killable(mm))
> > + if (mmap_read_lock_killable(mm))
> > + return -EINTR;
>
> I'm kinda verbose generally, but I'd love a comment like:
>
> /*
> * Look up VMA under read lock first so we can perform the security
> * without holding locks (which can be problematic). We reacquire a
> * write lock later and check nothing changed underneath us.
> */
>
> > +
> > + vma = vma_lookup(mm, start);
> > +
> > + if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > + mmap_read_unlock(mm);
> > + return -EINVAL;
> > + }
> > +
> > + prot |= vma->vm_flags & VM_READ ? PROT_READ : 0;
> > + prot |= vma->vm_flags & VM_WRITE ? PROT_WRITE : 0;
> > + prot |= vma->vm_flags & VM_EXEC ? PROT_EXEC : 0;
> > +
> > + flags &= MAP_NONBLOCK;
> > + flags |= MAP_SHARED | MAP_FIXED | MAP_POPULATE;
> > + if (vma->vm_flags & VM_LOCKED)
> > + flags |= MAP_LOCKED;
> > +
> > + /* Save vm_flags used to calculate prot and flags, and recheck later. */
> > + vm_flags = vma->vm_flags;
> > + file = get_file(vma->vm_file);
> > +
> > + mmap_read_unlock(mm);
> > +
>
> Maybe worth adding a comment to explain why you're doing this without the
> lock so somebody looking at this later can understand the dance?
>
> > + ret = security_mmap_file(file, prot, flags);
> > + if (ret) {
> > + fput(file);
> > + return ret;
> > + }
> > +
> > + ret = -EINVAL;
> > +
>
> Again, being verbose, I'd put something here like:
>
> /* OK security check passed, take write lock + let it rip */
>
> > + if (mmap_write_lock_killable(mm)) {
> > + fput(file);
> > return -EINTR;
> > + }
> >
> > vma = vma_lookup(mm, start);
> >
> > - if (!vma || !(vma->vm_flags & VM_SHARED))
> > + if (!vma)
> > + goto out;
> > +
>
> I'd also add something like:
>
> /* Make sure things didn't change under us. */
>
> > + if (vma->vm_flags != vm_flags)
> > + goto out;
> > +
>
> And drop this newline to group them together (super nitty I know, sorry!)
>
> > + if (vma->vm_file != file)
> > goto out;
> >
> > if (start + size > vma->vm_end) {
> > @@ -1689,25 +1731,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > goto out;
> > }
> >
> > - prot |= vma->vm_flags & VM_READ ? PROT_READ : 0;
> > - prot |= vma->vm_flags & VM_WRITE ? PROT_WRITE : 0;
> > - prot |= vma->vm_flags & VM_EXEC ? PROT_EXEC : 0;
> > -
> > - flags &= MAP_NONBLOCK;
> > - flags |= MAP_SHARED | MAP_FIXED | MAP_POPULATE;
> > - if (vma->vm_flags & VM_LOCKED)
> > - flags |= MAP_LOCKED;
> > -
> > - file = get_file(vma->vm_file);
> > - ret = security_mmap_file(vma->vm_file, prot, flags);
> > - if (ret)
> > - goto out_fput;
> > ret = do_mmap(vma->vm_file, start, size,
> > prot, flags, 0, pgoff, &populate, NULL);
> > -out_fput:
> > - fput(file);
> > out:
> > mmap_write_unlock(mm);
> > + fput(file);
> > if (populate)
> > mm_populate(ret, populate);
> > if (!IS_ERR_VALUE(ret))
> > --
> > 2.34.1
> >
>
> These are just nits, this looks good to me!
Powered by blists - more mailing lists