linux-kernel - Re: [PATCH] userfaultfd: release page in error path to avoid BUG_ON

Message-ID: <alpine.LSU.2.11.2104281338380.9009@eggly.anvils>
Date:   Wed, 28 Apr 2021 14:03:05 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Axel Rasmussen <axelrasmussen@...gle.com>
cc:     Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Lokesh Gidra <lokeshgidra@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] userfaultfd: release page in error path to avoid
 BUG_ON

On Wed, 28 Apr 2021, Peter Xu wrote:
> On Wed, Apr 28, 2021 at 11:01:09AM -0700, Axel Rasmussen wrote:
> > Consider the following sequence of events (described from the point of
> > view of the commit that introduced the bug - see "Fixes:" below):
> > 
> > 1. Userspace issues a UFFD ioctl, which ends up calling into
> >    shmem_mcopy_atomic_pte(). We successfully account the blocks and call
> >    shmem_alloc_page(), but then the copy_from_user() fails. We return
> >    -EFAULT. We don't release the page we allocated.
> > 2. Our caller detects this error code, tries the copy_from_user() after
> >    dropping the mmap_sem, and retries, calling back into
> >    shmem_mcopy_atomic_pte().
> > 3. Meanwhile, let's say another process filled up the tmpfs being used.
> > 4. So shmem_mcopy_atomic_pte() fails to account blocks this time, and
> >    immediately returns - without releasing the page. This triggers a
> >    BUG_ON in our caller, which asserts that the page should always be
> >    consumed, unless -EFAULT is returned.
> > 
> > (Later on in the commit history, -EFAULT became -ENOENT, mmap_sem became
> > mmap_lock, and shmem_inode_acct_block() was added.)
> 
> I suggest you do s/EFAULT/ENOENT/ directly in above.

I haven't looked into the history, but it would be best for this to
describe the situation in v5.12, never mind the details which were
different at the time of the commit tagged with Fixes.  But we stay
alert that when it's backported to stable, we may need to adjust
something to suit those releases (which will depend on how much
else has been backported to them meanwhile).
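
The v5.12-style sequence above can be modeled in userspace as a minimal sketch (not kernel code). model_page, model_alloc_page(), model_put_page(), model_mfill(), model_caller() and the MODEL_ENOENT/MODEL_ENOMEM values are hypothetical stand-ins for struct page, shmem_mfill_atomic_pte(), its caller, and -ENOENT/-ENOMEM. The caller's BUG_ON is modeled as a false return, so both the buggy and the fixed error paths can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
#define MODEL_ENOENT 2
#define MODEL_ENOMEM 12

struct model_page {
	int refcount;
};

static struct model_page *model_alloc_page(void)
{
	struct model_page *p = malloc(sizeof(*p));

	if (p)
		p->refcount = 1;
	return p;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/*
 * Models the fill step. The first attempt's atomic copy always "fails",
 * so the page is handed back via *pagep with -ENOENT. The retry either
 * consumes that page or, if the filesystem is now full, fails block
 * accounting; `fix` selects whether the dangling page is released there.
 */
static int model_mfill(struct model_page **pagep, bool fs_full, bool fix)
{
	struct model_page *page;

	if (fs_full) {
		if (fix && *pagep) {
			model_put_page(*pagep);
			*pagep = NULL;
		}
		return -MODEL_ENOMEM;
	}

	if (!*pagep) {
		page = model_alloc_page();
		if (!page)
			return -MODEL_ENOMEM;
		*pagep = page;	/* keep it for the caller's retry */
		return -MODEL_ENOENT;
	}

	/* Retry with a filled page: consume it (here, just drop it). */
	model_put_page(*pagep);
	*pagep = NULL;
	return 0;
}

/*
 * Models the caller's loop. Returns false where the kernel would hit
 * its BUG_ON: a page still held after any error other than -ENOENT.
 */
static bool model_caller(bool fill_fs_during_retry, bool fix)
{
	struct model_page *page = NULL;
	bool fs_full = false;
	int err;

retry:
	err = model_mfill(&page, fs_full, fix);
	if (err == -MODEL_ENOENT) {
		/* Real caller: drop the lock, copy_from_user(), retry. */
		if (fill_fs_during_retry)
			fs_full = true;	/* another process fills the tmpfs */
		goto retry;
	}
	if (page) {
		model_put_page(page);	/* tidy up so the model itself doesn't leak */
		return false;
	}
	return true;
}
```

In this model, model_caller(true, false) reports the dangling page that would trip the BUG_ON, while model_caller(true, true) and the no-race case model_caller(false, false) both return true.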

> 
> > 
> > A malicious user (even an unprivileged one) could trigger this
> > intentionally without too much trouble.

I regret having suggested that. Maybe. Opinions differ as to whether
it's helpful to call attention like that. I'd say delete that paragraph.

> > 
> > To fix this, detect if we have a "dangling" page when accounting fails,
> > and if so, release it before returning.
> > 
> > Fixes: cb658a453b93 ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY")
> > Reported-by: Hugh Dickins <hughd@...gle.com>
> > Signed-off-by: Axel Rasmussen <axelrasmussen@...gle.com>

Thanks for getting on to this so quickly, Axel.
But Peter is right, that unlock_page() needs removing.

> > ---
> >  mm/shmem.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 26c76b13ad23..46766c9d7151 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2375,8 +2375,19 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> >  	pgoff_t offset, max_off;
> >  
> >  	ret = -ENOMEM;
> > -	if (!shmem_inode_acct_block(inode, 1))
> > +	if (!shmem_inode_acct_block(inode, 1)) {
> > +		/*
> > +		 * We may have got a page, returned -ENOENT triggering a retry,
> > +		 * and now we find ourselves with -ENOMEM. Release the page, to
> > +		 * avoid a BUG_ON in our caller.
> > +		 */
> > +		if (unlikely(*pagep)) {
> > +			unlock_page(*pagep);
> 
> Not necessary?

Worse than not necessary: would trigger a VM_BUG_ON_PAGE(). Delete!
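
A hedged illustration of why: unlock_page() checks VM_BUG_ON_PAGE(!PageLocked(page), page), which is fatal on CONFIG_DEBUG_VM kernels. Assuming, as the comment above implies, that the page handed back across the -ENOENT retry is not locked at this point, the posted error path trips that check. The sketch below is a userspace model only; model_page, model_unlock_page(), release_as_posted() and release_as_reviewed() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct page; not a kernel API. */
struct model_page {
	int refcount;
	bool locked;
};

static struct model_page *model_alloc_page(void)
{
	/* Freshly allocated and, in this model, never locked. */
	struct model_page *p = calloc(1, sizeof(*p));

	if (p)
		p->refcount = 1;
	return p;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/*
 * Mirrors unlock_page()'s precondition. Returns false where a
 * CONFIG_DEBUG_VM kernel would hit VM_BUG_ON_PAGE(!PageLocked(page), page).
 */
static bool model_unlock_page(struct model_page *p)
{
	if (!p->locked)
		return false;
	p->locked = false;
	return true;
}

/* Error path as posted: unlock_page() then put_page(). */
static bool release_as_posted(struct model_page **pagep)
{
	bool ok = model_unlock_page(*pagep);

	model_put_page(*pagep);
	*pagep = NULL;
	return ok;
}

/* Error path after review: only drop the reference. */
static bool release_as_reviewed(struct model_page **pagep)
{
	model_put_page(*pagep);
	*pagep = NULL;
	return true;
}
```

With a page from model_alloc_page(), release_as_posted() reports the failed lock check, while release_as_reviewed() releases the page cleanly and clears *pagep.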

> 
> > +			put_page(*pagep);
> > +			*pagep = NULL;
> > +		}
> >  		goto out;
> 
> All the "goto out"s in this function look weird, since "out" just returns... so if
> you're touching this after all, I suggest we do "return -ENOMEM" directly and
> drop the "ret = -ENOMEM".

No strong feeling either way from me on that: whichever looks best
to you.  But I suspect the "ret = -ENOMEM" cannot be dropped,
because it's relied on further down too?

> 
> Thanks,
> 
> > +	}
> >  
> >  	if (!*pagep) {
> >  		page = shmem_alloc_page(gfp, info, pgoff);
> > -- 
> > 2.31.1.498.g6c1eba8ee3d-goog
> > 
> 
> -- 
> Peter Xu