Message-Id: <20250603204414.f2963e4a094e360cad7f966e@linux-foundation.org>
Date: Tue, 3 Jun 2025 20:44:14 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: lizhe.67@...edance.com
Cc: david@...hat.com, jgg@...pe.ca, jhubbard@...dia.com, peterx@...hat.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, dev.jain@....com,
muchun.song@...ux.dev
Subject: Re: [PATCH v2] gup: optimize longterm pin_user_pages() for large folio

On Wed, 4 Jun 2025 11:15:36 +0800 lizhe.67@...edance.com wrote:
> From: Li Zhe <lizhe.67@...edance.com>
>
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function pin_user_pages() by reducing the frequency of
> memory accesses using READ_ONCE. This patch leverages this approach to
> achieve performance improvements.
>
> The performance test results obtained through the gup_test tool from the
> kernel source tree are as follows. We achieve an improvement of over 70%
> for large folio with pagesize=2M. For normal page, we have only observed
> a very slight degradation in performance.
>
> Without this patch:
>
> [root@...alhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@...alhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> With this patch:
>
> [root@...alhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:4075 put:10792 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@...alhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:130727 put:31763 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

I see no READ_ONCE()s in the patch, and I had to go off and read the v1
review to discover that the READ_ONCE() is invoked in
page_folio()->_compound_head(). Please help us out by including such
details in the changelogs.
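For readers without the v1 thread to hand, here is a minimal userspace
sketch of where that load sits and how a per-folio walk could avoid
repeating it. It is a model under stated assumptions, not the kernel code
and not necessarily what the v2 patch does: struct model_page,
model_compound_head(), model_init(), walk_per_page() and walk_per_folio()
are hypothetical names, a volatile read stands in for READ_ONCE(), and the
folio size is passed in by the caller rather than read from the folio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a page; NOT the kernel's struct page. */
struct model_page {
	/* Bit 0 set: tail page, the remaining bits point at the head page. */
	volatile uintptr_t compound_head;
};

/* Counts the modelled READ_ONCE() loads so the two walks can be compared. */
static unsigned long head_loads;

/* Models _compound_head(): exactly one forced load per call. */
static struct model_page *model_compound_head(struct model_page *page)
{
	uintptr_t head = page->compound_head;	/* volatile read ~ READ_ONCE() */

	head_loads++;
	if (head & 1)
		return (struct model_page *)(head - 1);
	return page;
}

/* Lay out nr_folios folios of pages_per_folio pages each in pages[]. */
static void model_init(struct model_page *pages, size_t nr_folios,
		       size_t pages_per_folio)
{
	assert(pages_per_folio > 0);
	for (size_t f = 0; f < nr_folios; f++) {
		struct model_page *head = &pages[f * pages_per_folio];

		head->compound_head = 0;	/* head page: bit 0 clear */
		for (size_t i = 1; i < pages_per_folio; i++)
			head[i].compound_head = (uintptr_t)head | 1;
	}
}

/* Per-page walk: one head lookup for every page. Returns folios seen. */
static size_t walk_per_page(struct model_page *pages, size_t nr_pages)
{
	struct model_page *prev = NULL;
	size_t folios = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		struct model_page *head = model_compound_head(&pages[i]);

		if (head != prev) {
			folios++;
			prev = head;
		}
	}
	return folios;
}

/* Per-folio walk: one head lookup, then skip the rest of that folio. */
static size_t walk_per_folio(struct model_page *pages, size_t nr_pages,
			     size_t pages_per_folio)
{
	size_t folios = 0;

	assert(pages_per_folio > 0);
	for (size_t i = 0; i < nr_pages; ) {
		struct model_page *head = model_compound_head(&pages[i]);

		folios++;
		/* Advance past the pages remaining in this folio. */
		i += pages_per_folio - (size_t)(&pages[i] - head);
	}
	return folios;
}
```

Calling model_init() on an array of 4 * 512 pages and then running both
walks over it finds the same 4 folios either way; the per-page walk does
one modelled load per page, the per-folio walk one per folio.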

Is it credible that a humble READ_ONCE() could yield a 3x improvement in
one case? Why would this happen?
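For reference, the "over 70%" in the changelog and the "3x" above describe
the same pair of "get" timings. The sketch below only restates the
arithmetic on the quoted figures; speedup() and pct_change() are
hypothetical helper names, and nothing here explains where the time goes.
For the -HL runs, 13623 us -> 4075 us is roughly a 3.34x speedup, or about
a 70.1% reduction in time. For the -LT runs, 129733 us -> 130727 us is
about 0.77% slower.

```c
#include <assert.h>

/* Ratio of old time to new time; a value above 1 means the new run is faster. */
static double speedup(double before_us, double after_us)
{
	assert(after_us > 0.0);
	return before_us / after_us;
}

/* Percentage change in time relative to the old run; negative means faster. */
static double pct_change(double before_us, double after_us)
{
	assert(before_us > 0.0);
	return (after_us - before_us) / before_us * 100.0;
}
```

With the quoted numbers, speedup(13623, 4075) and pct_change(13623, 4075)
give the large-folio figures, and pct_change(129733, 130727) gives the
small regression reported for the other case.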