Message-ID: <ecf02bf0-a107-4e84-93f0-48277fd4ba7c@kernel.org>
Date: Thu, 18 Dec 2025 08:22:37 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, x86@...nel.org
Cc: akpm@...ux-foundation.org, bp@...en8.de, dave.hansen@...ux.intel.com,
 hpa@...or.com, mingo@...hat.com, mjguzik@...il.com, luto@...nel.org,
 peterz@...radead.org, tglx@...utronix.de, willy@...radead.org,
 raghavendra.kt@....com, chleroy@...nel.org, ioworker0@...il.com,
 boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v10 6/8] x86/clear_page: Introduce clear_pages()

On 12/15/25 21:49, Ankur Arora wrote:
> Performance when clearing with string instructions (x86-64-stosq and
> similar) can vary significantly based on the chunk-size used.
> 
>    $ perf bench mem memset -k 4KB -s 4GB -f x86-64-stosq
>    # Running 'mem/memset' benchmark:
>    # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>    # Copying 4GB bytes ...
> 
>        13.748208 GB/sec
> 
>    $ perf bench mem memset -k 2MB -s 4GB -f x86-64-stosq
>    # Running 'mem/memset' benchmark:
>    # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>    # Copying 4GB bytes ...
> 
>        15.067900 GB/sec
> 
>    $ perf bench mem memset -k 1GB -s 4GB -f x86-64-stosq
>    # Running 'mem/memset' benchmark:
>    # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>    # Copying 4GB bytes ...
> 
>        38.104311 GB/sec
> 
> (All on AMD Milan.)
> 
> With a change in chunk-size from 4KB to 1GB, we see the performance go
> from 13.7 GB/sec to 38.1 GB/sec. For the chunk-size of 2MB the change isn't
> quite as drastic but it is worth adding a clear_page() variant that can
> handle contiguous page-extents.
> 
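
To illustrate the chunk-size effect described above, here is a minimal
userspace sketch. It is not the kernel's clear_pages() and not code from
this patch: the helper name clear_in_chunks() and its parameters are
hypothetical. It zeroes a buffer with one memset() call per chunk and
returns the number of calls, so a larger chunk gives each call a longer
contiguous extent to work on.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not from the patch):
 * zero `total` bytes starting at `buf`, issuing one memset() per `chunk`
 * bytes. Returns how many memset() calls were made.
 *
 * A chunk of 0 is treated as "clear everything in a single call".
 */
static size_t clear_in_chunks(unsigned char *buf, size_t total, size_t chunk)
{
	size_t off = 0;
	size_t calls = 0;

	if (chunk == 0)
		chunk = total;

	while (off < total) {
		size_t n = total - off;

		/* The last chunk may be shorter than the others. */
		if (n > chunk)
			n = chunk;

		memset(buf + off, 0, n);
		off += n;
		calls++;
	}

	return calls;
}
```

With a 4KB chunk, an 8MB buffer takes 2048 calls; with a 2MB chunk it
takes 4. Whether fewer, longer calls are actually faster depends on the
CPU and on how memset() is implemented, which is what the perf bench
numbers above measure.
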
> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> Tested-by: Raghavendra K T <raghavendra.kt@....com>

Nothing jumped out at me.

Reviewed-by: David Hildenbrand (Red Hat) <david@...nel.org>

-- 
Cheers

David
