Message-Id: <20251027143309.4331a65f38f05ea95d9e46ad@linux-foundation.org>
Date: Mon, 27 Oct 2025 14:33:09 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
 david@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
 mingo@...hat.com, mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
 acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
 willy@...radead.org, raghavendra.kt@....com, boris.ostrovsky@...cle.com,
 konrad.wilk@...cle.com
Subject: Re: [PATCH v8 0/7] mm: folio_zero_user: clear contiguous pages

On Mon, 27 Oct 2025 13:21:02 -0700 Ankur Arora <ankur.a.arora@...cle.com> wrote:

> This series adds clearing of contiguous page ranges for hugepages,
> improving on the current page-at-a-time approach in two ways:
> 
>  - amortizes the per-page setup cost over a larger extent
>  - when using string instructions, exposes the real region size
>    to the processor.
> 
> A processor could use knowledge of the extent to optimize the
> clearing. AMD Zen uarchs, as an example, elide allocation of
> cachelines for regions larger than L3-size.
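
To make the string-instruction point concrete, here is a minimal sketch
(not the patch's code; clear_extent() is a made-up helper) of handing
the hardware the whole extent at once instead of one page at a time:

#define PAGE_SIZE	4096UL

/* Clear nr_pages contiguous pages with a single "rep stosb". */
static void clear_extent(void *addr, unsigned long nr_pages)
{
	unsigned long bytes = nr_pages * PAGE_SIZE;

	/*
	 * The CPU sees the full byte count up front, so a uarch like
	 * AMD Zen can elide cacheline allocation for large extents.
	 */
	asm volatile("rep stosb"
		     : "+D" (addr), "+c" (bytes)
		     : "a" (0)
		     : "memory");
}
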
> 
> Demand faulting a 64GB region shows performance improvements:
> 
>  $ perf bench mem map -p $pg-sz -f demand -s 64GB -l 5
> 
>                        baseline              +series             change
> 
>                   (GB/s  +- %stdev)     (GB/s  +- %stdev)
> 
>    pg-sz=2MB       12.92  +- 2.55%        17.03  +-  0.70%       + 31.8%	preempt=*
> 
>    pg-sz=1GB       17.14  +- 2.27%        18.04  +-  1.05% [#]   +  5.2%	preempt=none|voluntary
>    pg-sz=1GB       17.26  +- 1.24%        42.17  +-  4.21%       +144.3%	preempt=full|lazy
> 
> [#] Milan uses a threshold of LLC-size (~32MB) for eliding cacheline
> allocation, which is higher than the maximum extent used on x86
> (ARCH_CONTIG_PAGE_NR=8MB), so preempt=none|voluntary sees no improvement
> with pg-sz=1GB.

I wasn't understanding this preemption thing at all, but then I saw this
in the v4 series changelogging:

: [#] Only with preempt=full|lazy because cooperatively preempted models
: need regular invocations of cond_resched(). This limits the extent
: sizes that can be cleared as a unit.

Please put this back in!!

It's possible that we're being excessively aggressive with those
cond_resched()s.  Have you investigated tuning their frequency so we
can use larger extent sizes with these preemption models?
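
For reference, the loop under discussion has roughly the following
shape (a rough sketch only; the names are illustrative, not the
series' actual helpers):

static void clear_huge_extent(struct page *page, unsigned long nr_pages)
{
	unsigned long i, chunk = ARCH_CONTIG_PAGE_NR;	/* 8MB worth on x86 */

	for (i = 0; i < nr_pages; i += chunk) {
		unsigned long n = min(chunk, nr_pages - i);

		clear_contig_pages(page + i, n);	/* hypothetical helper */
		cond_resched();	/* bounds latency, but caps the extent size */
	}
}

Raising the chunk size lets the hardware see a larger extent per
iteration, at the cost of a longer stretch between reschedule points.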

> The anon-w-seq test in the vm-scalability benchmark, however, does show
> worse performance with utime increasing by ~9%:
> 
>                          stime                  utime
> 
>   baseline         1654.63 ( +- 3.84% )     811.00 ( +- 3.84% )
>   +series          1630.32 ( +- 2.73% )     886.37 ( +- 5.19% )
> 
> In part this is because anon-w-seq runs with 384 processes zeroing
> anonymously mapped memory which they then access sequentially. As
> such this is a likely uncommon pattern where the memory bandwidth
> is saturated while also being cache limited because we access the
> entire region.
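
The access pattern being described is roughly the per-process sketch
below (sizes and the loop are illustrative, not vm-scalability's code):
every byte the kernel has just zeroed is touched again almost
immediately, so zeroing that bypasses the cache turns those touches
into misses.

#include <stddef.h>
#include <sys/mman.h>

int main(void)
{
	size_t sz = 1UL << 30;	/* 1GB per process, illustrative */
	char *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* sequential writes: each one lands on a freshly zeroed page */
	for (size_t i = 0; i < sz; i++)
		p[i] = 1;

	return 0;
}
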
> 
> Raghavendra also tested a previous version of the series on AMD Genoa [1].

I suggest you paste Raghavendra's results into this [0/N] - it's
important material.  

> 
> ...
>
>  arch/alpha/include/asm/page.h      |  1 -
>  arch/arc/include/asm/page.h        |  2 +
>  arch/arm/include/asm/page-nommu.h  |  1 -
>  arch/arm64/include/asm/page.h      |  1 -
>  arch/csky/abiv1/inc/abi/page.h     |  1 +
>  arch/csky/abiv2/inc/abi/page.h     |  7 ---
>  arch/hexagon/include/asm/page.h    |  1 -
>  arch/loongarch/include/asm/page.h  |  1 -
>  arch/m68k/include/asm/page_mm.h    |  1 +
>  arch/m68k/include/asm/page_no.h    |  1 -
>  arch/microblaze/include/asm/page.h |  1 -
>  arch/mips/include/asm/page.h       |  1 +
>  arch/nios2/include/asm/page.h      |  1 +
>  arch/openrisc/include/asm/page.h   |  1 -
>  arch/parisc/include/asm/page.h     |  1 -
>  arch/powerpc/include/asm/page.h    |  1 +
>  arch/riscv/include/asm/page.h      |  1 -
>  arch/s390/include/asm/page.h       |  1 -
>  arch/sparc/include/asm/page_32.h   |  2 +
>  arch/sparc/include/asm/page_64.h   |  1 +
>  arch/um/include/asm/page.h         |  1 -
>  arch/x86/include/asm/page.h        |  6 ---
>  arch/x86/include/asm/page_32.h     |  6 +++
>  arch/x86/include/asm/page_64.h     | 64 ++++++++++++++++++-----
>  arch/x86/lib/clear_page_64.S       | 39 +++-----------
>  arch/xtensa/include/asm/page.h     |  1 -
>  include/linux/highmem.h            | 29 +++++++++++
>  include/linux/mm.h                 | 69 +++++++++++++++++++++++++
>  mm/memory.c                        | 82 ++++++++++++++++++++++--------
>  mm/util.c                          | 13 +++++
>  30 files changed, 247 insertions(+), 91 deletions(-)

I guess this is an mm.git thing, with x86 acks (please).

The documented review activity is rather thin at this time so I'll sit
this out for a while.  Please ping me next week and we can reassess.

Thanks.
