Message-Id: <20250603204414.f2963e4a094e360cad7f966e@linux-foundation.org>
Date: Tue, 3 Jun 2025 20:44:14 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: lizhe.67@...edance.com
Cc: david@...hat.com, jgg@...pe.ca, jhubbard@...dia.com, peterx@...hat.com,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, dev.jain@....com,
 muchun.song@...ux.dev
Subject: Re: [PATCH v2] gup: optimize longterm pin_user_pages() for large
 folio

On Wed,  4 Jun 2025 11:15:36 +0800 lizhe.67@...edance.com wrote:

> From: Li Zhe <lizhe.67@...edance.com>
> 
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function pin_user_pages() by reducing the frequency of
> memory accesses using READ_ONCE. This patch leverages this approach to
> achieve performance improvements.
> 
> The performance test results obtained through the gup_test tool from the
> kernel source tree are as follows. We achieve an improvement of over 70%
> for large folio with pagesize=2M. For normal page, we have only observed
> a very slight degradation in performance.
> 
> Without this patch:
> 
>     [root@...alhost ~]# ./gup_test -HL -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>     [root@...alhost ~]# ./gup_test -LT -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> With this patch:
> 
>     [root@...alhost ~]# ./gup_test -HL -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:4075 put:10792 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>     [root@...alhost ~]# ./gup_test -LT -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:130727 put:31763 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

I see no READ_ONCE()s in the patch and I had to go off and read the v1
review to discover that the READ_ONCE is invoked in
page_folio()->_compound_head().  Please help us out by including such
details in the changelogs.

Is it credible that a humble READ_ONCE could yield a 3x improvement in
one case?  Why would this happen?
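
For reference, the quoted -HL numbers go from get:13623 us to get:4075 us,
a ratio of about 3.3x and a reduction of about 70%, so "over 70%" and
"3x" describe the same measurement.

The sketch below is a userspace model only, not code from the patch. It
counts how many head lookups (the step that costs one READ_ONCE per call
in page_folio()->_compound_head()) a per-page scan performs compared with
a scan that skips ahead by the folio size. All names in it (page_model,
folio_model, lookup_folio, scan_per_page, scan_per_folio) are
hypothetical, and it assumes every folio's pages are adjacent in the
array and start at the folio head. It says nothing about why each lookup
is expensive in the real kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NR_FOLIOS       8
#define PAGES_PER_FOLIO 512	/* 2M folio made of 4K base pages */
#define NR_PAGES        (NR_FOLIOS * PAGES_PER_FOLIO)

/* Hypothetical stand-ins, NOT the kernel's struct folio / struct page. */
struct folio_model { size_t nr_pages; };
struct page_model  { struct folio_model *head; };

static struct folio_model folios[NR_FOLIOS];
static struct page_model  pages[NR_PAGES];
static size_t head_lookups;	/* number of modelled head lookups so far */

/* Lay out NR_FOLIOS large folios back to back and reset the counter. */
static void build_pages(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		struct folio_model *f = &folios[i / PAGES_PER_FOLIO];

		f->nr_pages = PAGES_PER_FOLIO;
		pages[i].head = f;
	}
	head_lookups = 0;
}

/* Models one page-to-folio conversion; each call counts as one lookup. */
static struct folio_model *lookup_folio(const struct page_model *p)
{
	head_lookups++;
	return p->head;
}

/* Per-page scan: one lookup for every page, even within the same folio. */
static size_t scan_per_page(void)
{
	size_t seen = 0;
	struct folio_model *prev = NULL;

	for (size_t i = 0; i < NR_PAGES; i++) {
		struct folio_model *f = lookup_folio(&pages[i]);

		if (f != prev) {
			seen++;
			prev = f;
		}
	}
	return seen;
}

/* Per-folio scan: one lookup, then skip the remaining pages of the folio. */
static size_t scan_per_folio(void)
{
	size_t seen = 0;

	for (size_t i = 0; i < NR_PAGES; ) {
		struct folio_model *f = lookup_folio(&pages[i]);

		assert(f->nr_pages > 0);	/* guarantees forward progress */
		seen++;
		i += f->nr_pages;
	}
	return seen;
}
```

In this model, calling build_pages() and then scan_per_page() performs
NR_PAGES lookups, while build_pages() followed by scan_per_folio()
performs NR_FOLIOS lookups; both visit the same number of folios.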

