Message-ID: <fffd4dad-2cb9-4bc9-8a80-a70be687fd54@amd.com>
Date: Fri, 4 Jul 2025 13:45:13 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org
Cc: akpm@...ux-foundation.org, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, mingo@...hat.com, mjguzik@...il.com, luto@...nel.org,
peterz@...radead.org, acme@...nel.org, namhyung@...nel.org,
tglx@...utronix.de, willy@...radead.org, jon.grimm@....com, bharata@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v4 00/13] x86/mm: Add multi-page clearing
On 6/16/2025 10:52 AM, Ankur Arora wrote:
> This series adds multi-page clearing for hugepages, improving on the
> current page-at-a-time approach in two ways:
>
> - amortizes the per-page setup cost over a larger extent
> - when using string instructions, exposes the real region size to the
> processor. A processor could use that as a hint to optimize based
> on the full extent size. AMD Zen uarchs, as an example, elide
> allocation of cachelines for regions larger than L3-size.
>
> Demand faulting a 64GB region shows good performance improvements:
>
> $ perf bench mem map -p $page-size -f demand -s 64GB -l 5
>
>                mm/folio_zero_user    x86/folio_zero_user    change
>                (GB/s +- %stdev)      (GB/s +- %stdev)
>
> pg-sz=2MB      11.82 +- 0.67%        16.48 +- 0.30%         + 39.4%
> pg-sz=1GB      17.51 +- 1.19%        40.03 +- 7.26% [#]     +129.9%
>
> [#] Only with preempt=full|lazy because cooperatively preempted models
> need regular invocations of cond_resched(). This limits the extent
> sizes that can be cleared as a unit.
>
> Raghavendra also tested on AMD Genoa and that shows similar
> improvements [1].
>
[...]
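The two clearing shapes described in the cover letter can be illustrated with the user-space C sketch below. It is not the kernel code from this series. One loop clears a region 4KB at a time. The other clears up to max_chunk pages per call and then reaches a reschedule point. In this sketch memset() stands in for the architecture's clearing primitive, and it does not model any hardware behavior such as eliding cacheline allocation. The names clear_page_at_a_time(), clear_pages_chunked(), is_zeroed() and PAGE_SZ are hypothetical, and max_chunk is assumed to model the extent limit that cooperatively preempted kernels impose through cond_resched().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096UL	/* hypothetical: base page size for this sketch */

/* Page-at-a-time: one clearing call per 4KB page, so any per-call
 * setup cost is paid npages times. */
static void clear_page_at_a_time(unsigned char *base, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		memset(base + i * PAGE_SZ, 0, PAGE_SZ);
}

/* Multi-page: clear up to max_chunk pages per call so the clearing
 * primitive sees the larger extent. Between chunks, call resched()
 * (a hypothetical stand-in for cond_resched(); may be NULL).
 * Returns the number of clearing calls made. */
static size_t clear_pages_chunked(unsigned char *base, size_t npages,
				  size_t max_chunk, void (*resched)(void))
{
	size_t calls = 0;

	assert(max_chunk > 0);	/* a zero chunk size would never finish */
	while (npages) {
		size_t n = npages < max_chunk ? npages : max_chunk;

		memset(base, 0, n * PAGE_SZ);
		base += n * PAGE_SZ;
		npages -= n;
		calls++;
		if (resched && npages)
			resched();
	}
	return calls;
}

/* Helper for checking results: returns 1 if all len bytes are zero. */
static int is_zeroed(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (p[i])
			return 0;
	return 1;
}
```

With max_chunk equal to npages, the chunked variant makes a single call over the whole extent. According to the cover letter, preempt=full|lazy allows this case. Under cooperative preemption, max_chunk has to stay small enough that the reschedule point is reached regularly.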
Sorry for getting back to this late.
It was nice to have it integrated into perf bench mem (easy to test :)).
I again see a similar (almost identical) improvement with the rebased kernel
and patchset.
Tested only with preempt=lazy and boost=1.
base    = 6.16-rc4 + patches 1-9 of this series
patched = 6.16-rc4 + all patches
SUT: Genoa+ AMD EPYC 9B24
$ perf bench mem map -p $page-size -f populate -s 64GB -l 10
               base                patched             change
pg-sz=2MB      12.731939 GB/sec    26.304263 GB/sec    106.6%
pg-sz=1GB      26.232423 GB/sec    61.174836 GB/sec    133.2%
For the 4KB page size there is a slight improvement (mostly noise).
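The change column above matches the relative throughput gain, (patched / base - 1) x 100. The small C helper below reproduces it from the GB/sec figures. The function name pct_change() is illustrative only.

```c
#include <assert.h>

/* Relative throughput change in percent, as in the "change" column:
 * (patched / base - 1) * 100. */
static double pct_change(double base_gbps, double patched_gbps)
{
	assert(base_gbps > 0.0);	/* a zero baseline has no defined ratio */
	return (patched_gbps / base_gbps - 1.0) * 100.0;
}
```

For example, pct_change(12.731939, 26.304263) evaluates to about 106.60 and pct_change(26.232423, 61.174836) to about 133.20, which round to the 106.6% and 133.2% shown for the 2MB and 1GB page sizes.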
Thanks and Regards
- Raghu