Message-ID: <Z_ys4jJ8MQ4-kW8P@gmail.com>
Date: Mon, 14 Apr 2025 08:36:18 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
mingo@...hat.com, luto@...nel.org, peterz@...radead.org,
paulmck@...nel.org, rostedt@...dmis.org, tglx@...utronix.de,
willy@...radead.org, jon.grimm@....com, bharata@....com,
raghavendra.kt@....com, boris.ostrovsky@...cle.com,
konrad.wilk@...cle.com
Subject: Re: [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing

* Ankur Arora <ankur.a.arora@...cle.com> wrote:

> We also see performance improvement for cases where this optimization is
> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded which can now be amortized over
> larger regions and the hint allows the hardware prefetcher to do a
> better job.
>
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>
>              mm/folio_zero_user    x86/folio_zero_user      change
>               (GB/s +- stddev)      (GB/s +- stddev)
>
>  pg-sz=1GB    16.51 +- 0.54%        42.80 +- 3.48%        + 159.2%
>  pg-sz=2MB    11.89 +- 0.78%        16.12 +- 0.12%        +  35.5%
>
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>
>              mm/folio_zero_user    x86/folio_zero_user      change
>               (GB/s +- stddev)      (GB/s +- stddev)
>
>  pg-sz=1GB     8.01 +- 0.24%        11.26 +- 0.48%        +  40.57%
>  pg-sz=2MB     7.95 +- 0.30%        10.90 +- 0.26%        +  37.10%

How was this measured? Could you integrate this measurement as a new
tools/perf/bench/ subcommand so that people can try it on different
systems, etc.? There's already a 'perf bench mem' subcommand space
where this feature could be added.
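
For illustration only, and not necessarily how the numbers quoted above
were obtained: a minimal user-space sketch of the kind of measurement
such a subcommand could wrap. It maps fresh anonymous memory, touches
one byte per page so that the kernel has to fault in (and zero) each
page, and reports the observed rate in GB/s. The helper name
fault_in_gbps() is hypothetical, and the sketch assumes base pages or
transparent hugepages rather than explicit 2MB/1GB hugetlb mappings.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of perf or the patch series): fault in
 * `len` bytes of fresh anonymous memory and return the observed rate
 * in GB/s, or a negative value on failure.
 */
static double fault_in_gbps(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	volatile unsigned char *p;
	double secs;

	if (page <= 0)
		return -1.0;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* One write per base page; the first touch triggers the page fault. */
	for (size_t off = 0; off < len; off += (size_t)page)
		p[off] = 1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap((void *)p, len);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (secs <= 0.0)
		return -1.0;

	/* Bytes faulted in per second, scaled to GB/s (10^9 bytes). */
	return (double)len / secs / 1e9;
}
```

Calling fault_in_gbps((size_t)1 << 30) from a small main() and printing
the result gives one sample; the timed region includes the whole fault
path and not just the clearing, so repeated runs on a pinned CPU would
be needed before comparing against a table like the one above.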
Thanks,
Ingo