linux-kernel - Re: [PATCH v11 8/8] mm: folio_zero

Message-ID: <d6f84476-e407-4d6b-a892-493c4359f86f@kernel.org>
Date: Wed, 7 Jan 2026 23:18:36 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, x86@...nel.org
Cc: akpm@...ux-foundation.org, bp@...en8.de, dave.hansen@...ux.intel.com,
 hpa@...or.com, mingo@...hat.com, mjguzik@...il.com, luto@...nel.org,
 peterz@...radead.org, tglx@...utronix.de, willy@...radead.org,
 raghavendra.kt@....com, chleroy@...nel.org, ioworker0@...il.com,
 lizhe.67@...edance.com, boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v11 8/8] mm: folio_zero_user: cache neighbouring pages

On 1/7/26 08:20, Ankur Arora wrote:
> folio_zero_user() does straight zeroing without caring about
> temporal locality for caches.
> 
> This replaced commit c6ddfb6c5890 ("mm, clear_huge_page: move order
> algorithm into a separate function") where we cleared a page at a
> time converging to the faulting page from the left and the right.
> 
> To retain limited temporal locality, split the clearing in three
> parts: the faulting page and its immediate neighbourhood, and the
> regions on its left and right. We clear the local neighbourhood last
> to maximize chances of it sticking around in the cache.
> 
> Performance
> ===
> 
> AMD Genoa (EPYC 9J14, cpus=2 sockets * 96 cores * 2 threads,
>             memory=2.2 TB, L1d=16K/thread, L2=512K/thread, L3=2MB/thread)
> 
> vm-scalability/anon-w-seq-hugetlb: this workload runs with 384 processes
> (one for each CPU) each zeroing anonymously mapped hugetlb memory which
> is then accessed sequentially.
>                                  stime                utime
> 
>    discontiguous-page      1739.93 ( +- 6.15% )  1016.61 ( +- 4.75% )
>    contiguous-page         1853.70 ( +- 2.51% )  1187.13 ( +- 3.50% )
>    batched-pages           1756.75 ( +- 2.98% )  1133.32 ( +- 4.89% )
>    neighbourhood-last      1725.18 ( +- 4.59% )  1123.78 ( +- 7.38% )
> 
> Both stime and utime respond largely as expected. There is a
> fair amount of run-to-run variation, but the general trend is that
> stime drops and utime increases. There are a few oddities, like
> contiguous-page performing very differently from batched-pages.
> 
> As such this is likely an uncommon pattern where we saturate the memory
> bandwidth (since all CPUs are running the test) and at the same time
> are cache constrained because we access the entire region.
> 
> Kernel make (make -j 12 bzImage):
> 
>                                stime                  utime
> 
>    discontiguous-page      199.29 ( +- 0.63% )   1431.67 ( +- .04% )
>    contiguous-page         193.76 ( +- 0.58% )   1433.60 ( +- .05% )
>    batched-pages           193.92 ( +- 0.76% )   1431.04 ( +- .08% )
>    neighbourhood-last      194.46 ( +- 0.68% )   1431.51 ( +- .06% )
> 
> For make the utime stays relatively flat with a fairly small (-2.4%)
> improvement in the stime.
> 
> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> Reviewed-by: Raghavendra K T <raghavendra.kt@....com>
> Tested-by: Raghavendra K T <raghavendra.kt@....com>
> ---

Nothing jumped out at me, thanks!

Acked-by: David Hildenbrand (Red Hat) <david@...nel.org>

-- 
Cheers

David
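
For readers following the thread, the three-part split described in the
quoted commit message (left region, right region, and the faulting page's
neighbourhood cleared last) can be sketched in userspace C roughly as
below. This is only an illustration, not the code from the patch: the
names PAGE_SZ, LOCALITY_RADIUS, struct pg_range, split_ranges() and
zero_region_neighbourhood_last() are all hypothetical, the radius of 2
pages is an arbitrary assumption, and the choice to clear the right
region before the left one is likewise an assumption (the commit message
only says the neighbourhood goes last).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u        /* assumed page size, for illustration only */
#define LOCALITY_RADIUS 2    /* hypothetical: pages kept on each side of the fault */

/* Inclusive range of page indices; the range is empty when start > end. */
struct pg_range {
	long start;
	long end;
};

static long clamp_long(long v, long lo, long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Zero pages [r.start, r.end] of buf; does nothing for an empty range. */
static void zero_range(unsigned char *buf, struct pg_range r)
{
	if (r.start > r.end)
		return;
	memset(buf + (size_t)r.start * PAGE_SZ, 0,
	       (size_t)(r.end - r.start + 1) * PAGE_SZ);
}

/*
 * Split page indices [0, nr_pages - 1] into three ranges, stored in the
 * order they should be cleared: right region, left region, and finally
 * the neighbourhood around fault_idx.
 */
static void split_ranges(long nr_pages, long fault_idx, struct pg_range out[3])
{
	long last = nr_pages - 1;

	assert(nr_pages > 0 && fault_idx >= 0 && fault_idx <= last);

	/* Neighbourhood of the faulting page, clamped to the region. */
	struct pg_range hood = {
		clamp_long(fault_idx - LOCALITY_RADIUS, 0, last),
		clamp_long(fault_idx + LOCALITY_RADIUS, 0, last),
	};

	out[0] = (struct pg_range){ hood.end + 1, last };  /* right of the neighbourhood */
	out[1] = (struct pg_range){ 0, hood.start - 1 };   /* left of the neighbourhood */
	out[2] = hood;                                     /* cleared last */
}

/* Zero the whole region, touching the faulting neighbourhood last. */
static void zero_region_neighbourhood_last(unsigned char *buf, long nr_pages,
					   long fault_idx)
{
	struct pg_range r[3];

	split_ranges(nr_pages, fault_idx, r);
	for (int i = 0; i < 3; i++)
		zero_range(buf, r[i]);
}
```

With a fault at index 0, the left range comes out empty and the right
range covers everything past the neighbourhood, so the sketch degrades to
"clear the tail first, then the first few pages".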