Message-ID: <16c50bb0-431d-5bfb-7b80-a8af0b4da90f@nvidia.com>
Date: Fri, 23 Oct 2020 21:44:17 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>, <linux-kernel@...r.kernel.org>
CC: Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Hellwig <hch@....de>,
Hugh Dickins <hughd@...gle.com>, Jan Kara <jack@...e.cz>,
Jann Horn <jannh@...gle.com>,
Kirill Shutemov <kirill@...temov.name>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Linux-MM <linux-mm@...ck.org>, Michal Hocko <mhocko@...e.com>,
Oleg Nesterov <oleg@...hat.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()
On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
> The next patch in this series makes the lockless flow a little more
> complex, so move the entire block into a new function and remove a level
> of indentation. Tidy a bit of cruft:
>
> - addr is always the same as start, so use start
> - Use the modern check_add_overflow() for computing end = start + len
> - nr_pinned << PAGE_SHIFT needs an unsigned long cast, like nr_pages
> - The handling of ret and nr_pinned can be streamlined a bit
>
> No functional change.
Looks nice. Because this is quite a bit more than pure code movement, I
had to rake through it with a fine-toothed comb to be sure it's OK. I
think it is. But as a side effect, a few tiny polishing ideas popped up;
see below.
>
> Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
> ---
> mm/gup.c | 88 +++++++++++++++++++++++++++++---------------------------
> 1 file changed, 46 insertions(+), 42 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 102877ed77a4b4..ecbe1639ea2af7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2671,13 +2671,42 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
> return ret;
> }
>
> +static unsigned int lockless_pages_from_mm(unsigned long addr,
It would be slightly more consistent to use "start" here, too, instead of addr.
Separately, I'm not joyful about the change to unsigned int for the
return type. I understand why you did it and that's perfectly sound
reasoning: there is no -ERRNO possible here, and nr_pinned will always
be >= 0. And it's correct, although there is a type mismatch: nr_pinned
is declared as an int inside the function, while the return type is
unsigned int.
However, I'd argue, mildly, that it's better to just leave everything as
"int" for nr_pages and nr_pinned, in gup.c, and just keep all the types
matched perfectly. This helps avoid bugs. And if we want to meaningfully
upgrade that, then either:
a) change all the nr_pages and nr_pinned throughout to "long", or
b) change all the nr_pages and nr_pinned into function args (making
nr_pinned an output parameter), and use int return types throughout,
as a "0 or -ERRNO, only" return value convention. A rough sketch of
(b) follows below.
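Here's that sketch: a hypothetical status-only signature (illustrative
only, not a proposal for this exact series):

	/*
	 * Hypothetical: returns 0 or -ERRNO, and reports the pin count
	 * via the *nr_pinned output argument, so the return value never
	 * doubles as a count.
	 */
	static int lockless_pages_from_mm(unsigned long start, unsigned long end,
					  unsigned int gup_flags,
					  struct page **pages, long *nr_pinned);

With that shape, every nr_* variable can share one type, and callers
never have to guess whether a return value is a count or an error code.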
> + unsigned long end,
> + unsigned int gup_flags,
> + struct page **pages)
> +{
> + unsigned long flags;
> + int nr_pinned = 0;
> +
> + if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
> + !gup_fast_permitted(addr, end))
That actually fits on a single 80-col line, so let's write it that way:
if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) || !gup_fast_permitted(addr, end))
> + return 0;
> +
> + /*
> + * Disable interrupts. The nested form is used, in order to allow full,
> + * general purpose use of this routine.
> + *
> + * With interrupts disabled, we block page table pages from being freed
> + * from under us. See struct mmu_table_batch comments in
> + * include/asm-generic/tlb.h for more details.
> + *
> + * We do not adopt an rcu_read_lock(.) here as we also want to block
> + * IPIs that come from THPs splitting.
> + */
> + local_irq_save(flags);
> + gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
> + local_irq_restore(flags);
> + return nr_pinned;
> +}
> +
> static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
> unsigned int gup_flags,
> struct page **pages)
> {
> - unsigned long addr, len, end;
> - unsigned long flags;
> - int nr_pinned = 0, ret = 0;
> + unsigned long len, end;
> + unsigned int nr_pinned;
If you choose to take the advice above, then this should be left as "int
nr_pinned".
> + int ret;
>
> if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
> FOLL_FORCE | FOLL_PIN | FOLL_GET |
> @@ -2691,53 +2720,28 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
>   	might_lock_read(&current->mm->mmap_lock);
>
> start = untagged_addr(start) & PAGE_MASK;
> - addr = start;
> len = (unsigned long) nr_pages << PAGE_SHIFT;
> - end = start + len;
> -
> - if (end <= start)
> + if (check_add_overflow(start, len, &end))
> return 0;
Very nice.
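For anyone following along who hasn't used it: check_add_overflow() is
in include/linux/overflow.h, and it returns true if the addition wraps
(the sum is written to the third argument in either case). A minimal
sketch of how it behaves, outside of gup.c:

	u32 a = 4000000000u, b = 1000000000u, sum;

	/* a + b is 5e9, which exceeds U32_MAX, so this branch is taken: */
	if (check_add_overflow(a, b, &sum))
		pr_warn("a + b overflows u32\n");
	else
		pr_info("sum = %u\n", sum);

So the single call replaces both the manual addition and the old
"if (end <= start)" wraparound test.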
> if (unlikely(!access_ok((void __user *)start, len)))
> return -EFAULT;
>
> - /*
> - * Disable interrupts. The nested form is used, in order to allow
> - * full, general purpose use of this routine.
> - *
> - * With interrupts disabled, we block page table pages from being
> - * freed from under us. See struct mmu_table_batch comments in
> - * include/asm-generic/tlb.h for more details.
> - *
> - * We do not adopt an rcu_read_lock(.) here as we also want to
I wish I had not copied that comment verbatim when I moved it here. Can
you please delete the weird ".", so that it just reads:
* We do not adopt an rcu_read_lock() here, because we also want to
> - * block IPIs that come from THPs splitting.
> - */
> - if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) {
> - unsigned long fast_flags = gup_flags;
> -
> - local_irq_save(flags);
> - gup_pgd_range(addr, end, fast_flags, pages, &nr_pinned);
> - local_irq_restore(flags);
> - ret = nr_pinned;
> - }
> -
> - if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
> - /* Try to get the remaining pages with get_user_pages */
> - start += nr_pinned << PAGE_SHIFT;
> - pages += nr_pinned;
> -
> - ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
> - gup_flags, pages);
> + nr_pinned = lockless_pages_from_mm(start, end, gup_flags, pages);
> + if (nr_pinned == nr_pages || gup_flags & FOLL_FAST_ONLY)
> + return nr_pinned;
>
> + /* Try to get the remaining pages with get_user_pages */
Could we tweak that to this, as long as we're here:
/* Slow path: Try to get the remaining pages with get_user_pages */
> + start += (unsigned long)nr_pinned << PAGE_SHIFT;
> + pages += nr_pinned;
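While we're here, the new cast deserves a comment for future readers:
with 4 KiB pages, PAGE_SHIFT is 12, so shifting a plain int nr_pinned
hits signed overflow (undefined behavior) once nr_pinned reaches 2^19
pages, i.e. 2 GiB worth of pins. A quick sketch of the failure mode,
assuming a 32-bit int:

	int nr_pinned = 1 << 19;	/* 524288 pages == 2 GiB of 4 KiB pages */

	/* int << 12 overflows at 2^31: undefined behavior */
	unsigned long bad = nr_pinned << PAGE_SHIFT;

	/* widen first, then shift: yields 0x80000000 as intended */
	unsigned long good = (unsigned long)nr_pinned << PAGE_SHIFT;

So the cast matches what this function already does for nr_pages when
computing len, a few lines up.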
> + ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
> + pages);
> + if (ret < 0) {
> /* Have to be a bit careful with return values */
...and can we move that comment up one level, so that it reads:
/* Have to be a bit careful with return values */
if (ret < 0) {
if (nr_pinned)
return nr_pinned;
return ret;
}
return ret + nr_pinned;
Thinking about this longer term, it would be nice if the whole gup/pup API
set just stopped pretending that anyone cares about partial success, because
they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
additional set of API wrappers that did some sort of limited retry just like
some of the callers do, that would be a happier story.
That has nothing to do with your patch, I'm just baiting Matthew Wilcox and
linux-mm to tell me that they love the idea, or hate it, or don't care. :)
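To make the bait concrete, something of this shape (name and details
invented, just a sketch built on today's pin_user_pages_fast()):

	/* Hypothetical wrapper: pin all nr_pages or fail. Returns 0 or -ERRNO. */
	static int pin_user_pages_all(unsigned long start, int nr_pages,
				      unsigned int gup_flags, struct page **pages)
	{
		int pinned = 0;

		while (pinned < nr_pages) {
			int rc = pin_user_pages_fast(start +
					((unsigned long)pinned << PAGE_SHIFT),
					nr_pages - pinned, gup_flags,
					pages + pinned);

			if (rc < 0) {
				unpin_user_pages(pages, pinned);
				return rc;
			}
			if (rc == 0)
				break;	/* no forward progress, give up */
			pinned += rc;
		}
		if (pinned != nr_pages) {
			unpin_user_pages(pages, pinned);
			return -EFAULT;
		}
		return 0;
	}

Callers that can genuinely use partial pins keep the raw APIs; everyone
else gets "all or nothing" and never interprets a partial count again.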
> - if (nr_pinned > 0) {
> - if (ret < 0)
> - ret = nr_pinned;
> - else
> - ret += nr_pinned;
> - }
> + if (nr_pinned)
> + return nr_pinned;
> + return ret;
> }
> -
> - return ret;
> + return ret + nr_pinned;
> }
> /**
> * get_user_pages_fast_only() - pin user pages in memory
>
thanks,
--
John Hubbard
NVIDIA