linux-kernel - Re: [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call


Message-ID: <20231012083857.ty66retpyhxkaem3@quack3>
Date:   Thu, 12 Oct 2023 10:38:57 +0200
From:   Jan Kara <jack@...e.cz>
To:     Lorenzo Stoakes <lstoakes@...il.com>
Cc:     Jan Kara <jack@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Hugh Dickins <hughd@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        linux-fsdevel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v3 3/3] mm: enforce the mapping_map_writable() check
 after call_mmap()

On Wed 11-10-23 19:14:10, Lorenzo Stoakes wrote:
> On Wed, Oct 11, 2023 at 11:46:27AM +0200, Jan Kara wrote:
> > On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote:
> > > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
> > > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either
> > > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so.
> > >
> > > We would otherwise fail the mapping_map_writable() check before we had
> > > the opportunity to clear VM_MAYWRITE.
> > >
> > > However, the existing logic in mmap_region() performs this check BEFORE
> > > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce
> > > this check AFTER the function call.
> > >
> > > In order to avoid any risk of breaking call_mmap() handlers which assume
> > > this will have been done first, we continue to mark the file writable
> > > first, simply deferring enforcement of it failing until afterwards.
> > >
> > > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's
> > > sealed via F_SEAL_WRITE to succeed, whereas previously they were not
> > > permitted.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@...il.com>
> >
> > ...
> >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 6f6856b3267a..9fbee92aaaee 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > >  	vma->vm_pgoff = pgoff;
> > >
> > >  	if (file) {
> > > -		if (is_shared_maywrite(vm_flags)) {
> > > -			error = mapping_map_writable(file->f_mapping);
> > > -			if (error)
> > > -				goto free_vma;
> > > -		}
> > > +		int writable_error = 0;
> > > +
> > > +		if (vma_is_shared_maywrite(vma))
> > > +			writable_error = mapping_map_writable(file->f_mapping);
> > >
> > >  		vma->vm_file = get_file(file);
> > >  		error = call_mmap(file, vma);
> > >  		if (error)
> > >  			goto unmap_and_free_vma;
> > >
> > > +		/*
> > > +		 * call_mmap() may have changed VMA flags, so retry this check
> > > +		 * if it failed before.
> > > +		 */
> > > +		if (writable_error && vma_is_shared_maywrite(vma)) {
> > > +			error = writable_error;
> > > +			goto close_and_free_vma;
> > > +		}
> >
> > Hum, this doesn't quite give me peace of mind ;). One bug I can see is
> > that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop
> > the i_mmap_writable counter here?
> 
> This wouldn't be applicable in the F_SEAL_WRITE case, as the
> i_mmap_writable counter would already have been decremented, and thus an
> error would arise causing no further decrement, and everything would work
> fine.
> 
> It'd be very odd for something to be writable here but the driver to make
> it not writable. But we do need to account for this.

Yeah, it may be odd but this is indeed what the i915 driver appears to be
doing in i915_gem_object_mmap():

        if (i915_gem_object_is_readonly(obj)) {
                if (vma->vm_flags & VM_WRITE) {
                        i915_gem_object_put(obj);
                        return -EINVAL;
                }
                vm_flags_clear(vma, VM_MAYWRITE);
        }
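
To make the accounting problem concrete, here is a minimal userspace sketch
(not kernel code) of the v3 flow when a handler clears VM_MAYWRITE as above.
All toy_* names are hypothetical stand-ins, and the teardown rule (drop the
count only if the VMA is still shared and may-write) is an assumption based
on how this series tests vma_is_shared_maywrite():

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values mirror the kernel's, but treat them as illustrative here. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

struct toy_mapping { int i_mmap_writable; };	/* stands in for address_space */
struct toy_vma { unsigned long vm_flags; };	/* stands in for vm_area_struct */

/* Modeled on mapping_map_writable(): refuse if the counter is negative. */
static int toy_map_writable(struct toy_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

static void toy_unmap_writable(struct toy_mapping *m)
{
	m->i_mmap_writable--;
}

static bool toy_is_shared_maywrite(const struct toy_vma *vma)
{
	return (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
	       (VM_SHARED | VM_MAYWRITE);
}

/* Handler for a read-only object, shaped like the i915 snippet above. */
static int toy_readonly_mmap(struct toy_vma *vma)
{
	if (vma->vm_flags & VM_WRITE)
		return -EINVAL;
	vma->vm_flags &= ~VM_MAYWRITE;
	return 0;
}

/* v3-style ordering: count first, call the handler, enforce afterwards. */
int toy_mmap_v3(struct toy_mapping *m, struct toy_vma *vma)
{
	int writable_error = 0;
	bool counted = false;
	int error;

	if (toy_is_shared_maywrite(vma)) {
		writable_error = toy_map_writable(m);
		counted = !writable_error;
	}

	error = toy_readonly_mmap(vma);
	if (error) {
		if (counted)
			toy_unmap_writable(m);
		return error;
	}

	/* Recheck, since the handler may have changed the flags. */
	if (writable_error && toy_is_shared_maywrite(vma))
		return writable_error;

	return 0;	/* if the handler cleared VM_MAYWRITE, the count stays */
}

/* Assumed teardown rule: only shared, may-write VMAs drop the count. */
void toy_munmap(struct toy_mapping *m, struct toy_vma *vma)
{
	if (toy_is_shared_maywrite(vma))
		toy_unmap_writable(m);
}
```

Starting from a zeroed toy_mapping and a VMA with VM_SHARED | VM_MAYWRITE,
toy_mmap_v3() succeeds with the counter at 1, and toy_munmap() leaves it at 1
because the flag test no longer matches. That is the leak in question.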

> > I've checked why your v2 version broke i915 and I think the reason may
> > have nothing to do with i915. When call_mmap() failed, it jumped to
> > unmap_and_free_vma, which calls mapping_unmap_writable(), but we had not
> > called mapping_map_writable() yet, so the counter became imbalanced.
> 
> Yeah, that must be the cause. I thought perhaps somehow
> __remove_shared_vm_struct() got invoked by i915_gem_mmap() but I didn't
> trace it through to see if it was possible.
> 
> Looking at it again, I don't think that is possible, as we hold a mmap/vma
> write lock, and the only operations that can cause
> __remove_shared_vm_struct() to run are things that would not be able to do
> so with this lock held.
> 
> > So I'd be for returning to the v2 version and just fixing up the error
> > handling paths...
> 
> So in conclusion, I agree, this is the better approach. Will respin in v4.
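
For illustration, the v2 imbalance described above can be reproduced in a
small userspace model. This is only a sketch under assumptions: the toy_*
names are hypothetical, handler_error stands in for the return value of
call_mmap(), and late_error stands in for any failure after the count is
taken. The "balanced" variant shows one possible shape of the error-path
fixup, not the actual v4 patch:

```c
#include <errno.h>
#include <stdbool.h>

struct toy_mapping { int i_mmap_writable; };	/* stands in for address_space */

/* Modeled on mapping_map_writable(): refuse if the counter is negative. */
static int toy_map_writable(struct toy_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

static void toy_unmap_writable(struct toy_mapping *m)
{
	m->i_mmap_writable--;
}

/*
 * v2-like ordering: the count is taken only after the handler runs, but
 * the shared unwind label still drops it unconditionally.
 */
int toy_mmap_unbalanced(struct toy_mapping *m, int handler_error)
{
	int error = handler_error;

	if (error)
		goto unmap_and_free;

	error = toy_map_writable(m);
	if (error)
		goto free_only;

	return 0;

unmap_and_free:
	toy_unmap_writable(m);	/* bug: nothing was counted on this path */
free_only:
	return error;
}

/* Same ordering, but the unwind only drops a count that was taken. */
int toy_mmap_balanced(struct toy_mapping *m, int handler_error,
		      int late_error)
{
	bool writable_taken = false;
	int error = handler_error;

	if (error)
		goto unwind;

	error = toy_map_writable(m);
	if (error)
		goto unwind;
	writable_taken = true;

	error = late_error;
	if (error)
		goto unwind;

	return 0;

unwind:
	if (writable_taken)
		toy_unmap_writable(m);
	return error;
}
```

With a zeroed toy_mapping, a failing handler drives the unbalanced variant's
counter to -1, after which even a successful handler gets -EPERM from
toy_map_writable(). The balanced variant leaves the counter at 0 on every
failure path.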

Thanks!
								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR