lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z_ys4jJ8MQ4-kW8P@gmail.com>
Date: Mon, 14 Apr 2025 08:36:18 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
	mingo@...hat.com, luto@...nel.org, peterz@...radead.org,
	paulmck@...nel.org, rostedt@...dmis.org, tglx@...utronix.de,
	willy@...radead.org, jon.grimm@....com, bharata@....com,
	raghavendra.kt@....com, boris.ostrovsky@...cle.com,
	konrad.wilk@...cle.com
Subject: Re: [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing


* Ankur Arora <ankur.a.arora@...cle.com> wrote:

> We also see performance improvement for cases where this optimization is
> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded which can now be amortized over
> larger regions and the hint allows the hardware prefetcher to do a
> better job.
> 
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
> 
>                  mm/folio_zero_user    x86/folio_zero_user     change
>                   (GB/s  +- stddev)      (GB/s  +- stddev)
> 
>   pg-sz=1GB       16.51  +- 0.54%        42.80  +-  3.48%    + 159.2%
>   pg-sz=2MB       11.89  +- 0.78%        16.12  +-  0.12%    +  35.5%
> 
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
> 
>                  mm/folio_zero_user    x86/folio_zero_user     change
>                   (GB/s +- stddev)      (GB/s +- stddev)
> 
>   pg-sz=1GB       8.01  +- 0.24%        11.26 +- 0.48%       + 40.57%
>   pg-sz=2MB       7.95  +- 0.30%        10.90 +- 0.26%       + 37.10%

How was this measured? Could you integrate this measurement as a new 
tools/perf/bench/ subcommand so that people can try it on different 
systems, etc.? There's already a 'perf bench mem' subcommand space 
where this feature could be added to.

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ