linux-kernel - Re: [PATCH v2 5/8] mm/gup: Accelerate thp gup even for "pages != NULL"

Message-ID: <956f7c72-4c7d-43a5-8786-5fdaa9010f7b@lucifer.local>
Date:   Tue, 20 Jun 2023 22:43:39 +0100
From:   Lorenzo Stoakes <lstoakes@...il.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        John Hubbard <jhubbard@...dia.com>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        James Houghton <jthoughton@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH v2 5/8] mm/gup: Accelerate thp gup even for "pages !=
 NULL"

On Mon, Jun 19, 2023 at 07:10:41PM -0400, Peter Xu wrote:
> The acceleration of THP was done with ctx.page_mask, however it'll be
> ignored if **pages is non-NULL.
>
> The old optimization was introduced in 2013 in 240aadeedc4a ("mm:
> accelerate mm_populate() treatment of THP pages").  It didn't explain why
> we can't optimize the **pages non-NULL case.  It's possible that at that
> time the major goal was for mm_populate() which should be enough back then.
>
> Optimize thp for all cases, by properly looping over each subpage, doing
> cache flushes, and boost refcounts / pincounts where needed in one go.
>
> This can be verified using gup_test below:
>
>   # chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10
>
> Before:    13992.50 ( +-8.75%)
> After:       378.50 (+-69.62%)
>
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
>  mm/gup.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 4a00d609033e..b50272012e49 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1199,16 +1199,53 @@ static long __get_user_pages(struct mm_struct *mm,
>  			goto out;
>  		}
>  next_page:
> -		if (pages) {
> -			pages[i] = page;
> -			flush_anon_page(vma, page, start);
> -			flush_dcache_page(page);
> -			ctx.page_mask = 0;
> -		}
> -
>  		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
>  		if (page_increm > nr_pages)
>  			page_increm = nr_pages;
> +
> +		if (pages) {
> +			struct page *subpage;
> +			unsigned int j;
> +
> +			/*
> +			 * This must be a large folio (and doesn't need to
> +			 * be the whole folio; it can be part of it), do
> +			 * the refcount work for all the subpages too.
> +			 *
> +			 * NOTE: here the page may not be the head page
> +			 * e.g. when start addr is not thp-size aligned.
> +			 * try_grab_folio() should have taken care of tail
> +			 * pages.
> +			 */
> +			if (page_increm > 1) {
> +				struct folio *folio;
> +
> +				/*
> +				 * Since we already hold refcount on the
> +				 * large folio, this should never fail.
> +				 */
> +				folio = try_grab_folio(page, page_increm - 1,
> +						       foll_flags);
> +				if (WARN_ON_ONCE(!folio)) {
> +					/*
> +					 * Release the 1st page ref if the
> +					 * folio is problematic, fail hard.
> +					 */
> +					gup_put_folio(page_folio(page), 1,
> +						      foll_flags);
> +					ret = -EFAULT;
> +					goto out;
> +				}

Thanks, this looks good to me. I agree it would be quite surprising for us
not to retrieve the folio here, and something has probably gone wrong if
so, so it's not unreasonable to warn, as long as we error out.

> +			}
> +
> +			for (j = 0; j < page_increm; j++) {
> +				subpage = nth_page(page, j);
> +				pages[i+j] = subpage;
> +				flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
> +				flush_dcache_page(subpage);
> +			}
> +		}
> +
>  		i += page_increm;
>  		start += page_increm * PAGE_SIZE;
>  		nr_pages -= page_increm;
> --
> 2.40.1
>
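
For readers following the page_increm arithmetic in the hunk above, here is
a minimal userspace sketch of it. It is not kernel code: the helper name
gup_page_increm() is hypothetical, and the constants assume 4 KiB base
pages and a 2 MiB THP (512 subpages, so a page_mask of 511). It only shows
how many subpages one loop iteration would cover, including the case where
start is not THP-aligned.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_SHIFT 12UL
/* Assumption: 2 MiB THP = 512 subpages, so the mask is 512 - 1. */
#define THP_PAGE_MASK 511U

/*
 * Hypothetical helper mirroring the arithmetic in __get_user_pages():
 * count the pages from 'start' up to the end of the huge page described
 * by 'page_mask', clamped to the number of pages still requested.
 * A page_mask of 0 means a normal page, which yields 1.
 */
static unsigned long gup_page_increm(unsigned long start,
				     unsigned int page_mask,
				     unsigned long nr_pages)
{
	unsigned long page_increm;

	/* The gup loop only runs while pages remain to be processed. */
	assert(nr_pages > 0);

	page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
	if (page_increm > nr_pages)
		page_increm = nr_pages;

	return page_increm;
}
```

Under these assumptions, a THP-aligned start such as 0x200000 with 1024
pages requested covers the whole huge page (512), while a start at subpage
509 of the same huge page covers only the remaining 3 subpages. That is the
"page may not be the head page" case the comment in the patch mentions.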

Looks good to me overall,

Reviewed-by: Lorenzo Stoakes <lstoakes@...il.com>