Message-Id: <20260107100948.a059084c9f8dd8cbaf864c57@linux-foundation.org>
Date: Wed, 7 Jan 2026 10:09:48 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
 david@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
 mingo@...hat.com, mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
 tglx@...utronix.de, willy@...radead.org, raghavendra.kt@....com,
 chleroy@...nel.org, ioworker0@...il.com, lizhe.67@...edance.com,
 boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v11 0/8] mm: folio_zero_user: clear page ranges

On Tue,  6 Jan 2026 23:20:01 -0800 Ankur Arora <ankur.a.arora@...cle.com> wrote:

> Hi,
> 
> This series adds clearing of contiguous page ranges for hugepages.

Thanks, I updated mm.git to this version.

I have a new toy.  For every file that was altered in a patch series,
look up (in MAINTAINERS) all the people who have declared an interest
in that file.  Add all those people to cc for every patch.  Also add
all the people whom the sender cc'ed.  For this series I ended up with
70+ cc's, which seems excessive, so I trimmed it to just your chosen
cc's.  I'm not sure what to do about this at present.
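
(For illustration only, not the actual tool: a rough sketch of how such a
cc-gathering pass might be put together, assuming it runs from a kernel
tree and leans on scripts/get_maintainer.pl, the tree's own MAINTAINERS
lookup helper.  The header parsing below is simplified and ignores folded
Cc: continuation lines.)

#!/usr/bin/env python3
# Sketch: build a cc list for a patch series by combining MAINTAINERS
# hits (via scripts/get_maintainer.pl) with the addresses the sender
# already put on To:/Cc: in each patch file.
import re
import subprocess
import sys

def maintainers_for(patch):
    """People who declared an interest (in MAINTAINERS) in the files touched by @patch."""
    out = subprocess.run(
        ["./scripts/get_maintainer.pl", "--no-rolestats", patch],
        capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if "@" in line}

def sender_ccs(patch):
    """Addresses the sender already listed on To:/Cc: headers (folded headers not handled)."""
    addrs = set()
    with open(patch, errors="replace") as f:
        for line in f:
            if line.startswith(("To:", "Cc:")):
                addrs.update(a.strip() for a in line.split(":", 1)[1].split(",") if "@" in a)
    return addrs

if __name__ == "__main__":
    ccs = set()
    for patch in sys.argv[1:]:          # e.g. ./ccs.py patches/*.patch, run from the kernel tree
        ccs |= maintainers_for(patch) | sender_ccs(patch)
    for addr in sorted(ccs):
        print(addr)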

> v11:
>   - folio_zero_user(): unified the special casing of the gigantic page
>     with the hugetlb handling. Plus cleanups.
>   - highmem: unify clear_user_highpages() changes.
> 
>    (Both suggested by David Hildenbrand).
> 
>   - split patch "mm, folio_zero_user: support clearing page ranges"
>     from v10 into two separate patches:
> 
>       - patch-6 "mm: folio_zero_user: clear pages sequentially", which
>         switches to doing sequential clearing from process_huge_pages().
> 
>       - patch-7: "mm: folio_zero_user: clear page ranges", which
>         switches to clearing in batches.
> 
>   - PROCESS_PAGES_NON_PREEMPT_BATCH: define it as 32MB instead of the
>     earlier 8MB.
> 
>     (Both of these came out of a discussion with Andrew Morton.)
> 
>   (https://lore.kernel.org/lkml/20251215204922.475324-1-ankur.a.arora@oracle.com/)
> 

For those who invested time in v10, here's the overall v10->v11 diff:


 include/linux/highmem.h |   11 +++---
 include/linux/mm.h      |   13 +++----
 mm/memory.c             |   65 ++++++++++++++++----------------------
 3 files changed, 41 insertions(+), 48 deletions(-)

--- a/include/linux/highmem.h~b
+++ a/include/linux/highmem.h
@@ -205,11 +205,12 @@ static inline void invalidate_kernel_vma
  * @vaddr: the address of the user mapping
  * @page: the page
  *
- * We condition the definition of clear_user_page() on the architecture not
- * having a custom clear_user_highpage(). That's because if there is some
- * special flushing needed for clear_user_highpage() then it is likely that
- * clear_user_page() also needs some magic. And, since our only caller
- * is the generic clear_user_highpage(), not defining is not much of a loss.
+ * We condition the definition of clear_user_page() on the architecture
+ * not having a custom clear_user_highpage(). That's because if there
+ * is some special flushing needed for clear_user_highpage() then it
+ * is likely that clear_user_page() also needs some magic. And, since
+ * our only caller is the generic clear_user_highpage(), not defining
+ * is not much of a loss.
  */
 static inline void clear_user_page(void *addr, unsigned long vaddr, struct page *page)
 {
--- a/include/linux/mm.h~b
+++ a/include/linux/mm.h
@@ -4194,6 +4194,7 @@ static inline void clear_page_guard(stru
 				unsigned int order) {}
 #endif	/* CONFIG_DEBUG_PAGEALLOC */
 
+#ifndef clear_pages
 /**
  * clear_pages() - clear a page range for kernel-internal use.
  * @addr: start address
@@ -4209,12 +4210,10 @@ static inline void clear_page_guard(stru
  * instructions, might not be able to) call cond_resched() to check if
  * rescheduling is required.
  *
- * When running under preemptible models this is fine, since clear_pages(),
- * even when reduced to long-running instructions, is preemptible.
- * Under cooperatively scheduled models, however, the caller is expected to
+ * When running under preemptible models this is not a problem. Under
+ * cooperatively scheduled models, however, the caller is expected to
  * limit @npages to no more than PROCESS_PAGES_NON_PREEMPT_BATCH.
  */
-#ifndef clear_pages
 static inline void clear_pages(void *addr, unsigned int npages)
 {
 	do {
@@ -4233,13 +4232,13 @@ static inline void clear_pages(void *add
  * reasonable preemption latency for when this optimization is not possible
  * (ex. slow microarchitectures, memory bandwidth saturation.)
  *
- * With a value of 8MB and assuming a memory bandwidth of ~10GBps, this should
- * result in worst case preemption latency of around 1ms when clearing pages.
+ * With a value of 32MB and assuming a memory bandwidth of ~10GBps, this should
+ * result in worst case preemption latency of around 3ms when clearing pages.
  *
  * (See comment above clear_pages() for why preemption latency is a concern
  * here.)
  */
-#define PROCESS_PAGES_NON_PREEMPT_BATCH		(8 << (20 - PAGE_SHIFT))
+#define PROCESS_PAGES_NON_PREEMPT_BATCH		(32 << (20 - PAGE_SHIFT))
 #else /* !clear_pages */
 /*
  * The architecture does not provide a clear_pages() implementation. Assume
--- a/mm/memory.c~b
+++ a/mm/memory.c
@@ -7238,10 +7238,11 @@ static inline int process_huge_page(
 }
 
 static void clear_contig_highpages(struct page *page, unsigned long addr,
-				   unsigned int npages)
+				   unsigned int nr_pages)
 {
-	unsigned int i, count, unit;
+	unsigned int i, unit, count;
 
+	might_sleep();
 	/*
 	 * When clearing we want to operate on the largest extent possible since
 	 * that allows for extent based architecture specific optimizations.
@@ -7251,69 +7252,61 @@ static void clear_contig_highpages(struc
 	 * limit the batch size when running under non-preemptible scheduling
 	 * models.
 	 */
-	unit = preempt_model_preemptible() ? npages : PROCESS_PAGES_NON_PREEMPT_BATCH;
+	unit = preempt_model_preemptible() ? nr_pages : PROCESS_PAGES_NON_PREEMPT_BATCH;
 
-	for (i = 0; i < npages; i += count) {
+	for (i = 0; i < nr_pages; i += count) {
 		cond_resched();
 
-		count = min(unit, npages - i);
-		clear_user_highpages(page + i,
-				     addr + i * PAGE_SIZE, count);
+		count = min(unit, nr_pages - i);
+		clear_user_highpages(page + i, addr + i * PAGE_SIZE, count);
 	}
 }
 
+/*
+ * When zeroing a folio, we want to differentiate between pages in the
+ * vicinity of the faulting address where we have spatial and temporal
+ * locality, and those far away where we don't.
+ *
+ * Use a radius of 2 for determining the local neighbourhood.
+ */
+#define FOLIO_ZERO_LOCALITY_RADIUS	2
+
 /**
  * folio_zero_user - Zero a folio which will be mapped to userspace.
  * @folio: The folio to zero.
  * @addr_hint: The address accessed by the user or the base address.
- *
- * Uses architectural support to clear page ranges.
- *
- * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split in three parts:
- * pages in the immediate locality of the faulting page, and its left, right
- * regions; the local neighbourhood is cleared last in order to keep cache
- * lines of the faulting region hot.
- *
- * For larger folios we assume that there is no expectation of cache locality
- * and just do a straight zero.
  */
 void folio_zero_user(struct folio *folio, unsigned long addr_hint)
 {
-	unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));
+	const unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));
 	const long fault_idx = (addr_hint - base_addr) / PAGE_SIZE;
 	const struct range pg = DEFINE_RANGE(0, folio_nr_pages(folio) - 1);
-	const int width = 2; /* number of pages cleared last on either side */
+	const int radius = FOLIO_ZERO_LOCALITY_RADIUS;
 	struct range r[3];
 	int i;
 
-	if (folio_nr_pages(folio) > MAX_ORDER_NR_PAGES) {
-		clear_contig_highpages(folio_page(folio, 0),
-				       base_addr, folio_nr_pages(folio));
-		return;
-	}
-
 	/*
-	 * Faulting page and its immediate neighbourhood. Cleared at the end to
-	 * ensure it sticks around in the cache.
+	 * Faulting page and its immediate neighbourhood. Will be cleared at the
+	 * end to keep its cachelines hot.
 	 */
-	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
-			    clamp_t(s64, fault_idx + width, pg.start, pg.end));
+	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - radius, pg.start, pg.end),
+			    clamp_t(s64, fault_idx + radius, pg.start, pg.end));
 
 	/* Region to the left of the fault */
 	r[1] = DEFINE_RANGE(pg.start,
-			    clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));
+			    clamp_t(s64, r[2].start - 1, pg.start - 1, r[2].start));
 
 	/* Region to the right of the fault: always valid for the common fault_idx=0 case. */
-	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
+	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end + 1, r[2].end, pg.end + 1),
 			    pg.end);
 
-	for (i = 0; i <= 2; i++) {
-		unsigned int npages = range_len(&r[i]);
+	for (i = 0; i < ARRAY_SIZE(r); i++) {
+		const unsigned long addr = base_addr + r[i].start * PAGE_SIZE;
+		const unsigned int nr_pages = range_len(&r[i]);
 		struct page *page = folio_page(folio, r[i].start);
-		unsigned long addr = base_addr + folio_page_idx(folio, page) * PAGE_SIZE;
 
-		if (npages > 0)
-			clear_contig_highpages(page, addr, npages);
+		if (nr_pages > 0)
+			clear_contig_highpages(page, addr, nr_pages);
 	}
 }
 
_
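
(Aside: a back-of-the-envelope check of the PROCESS_PAGES_NON_PREEMPT_BATCH
comment above, assuming 4K pages and the ~10GBps bandwidth figure the
comment itself uses; nothing here is kernel-specific.)

PAGE_SHIFT = 12                                 # assumes x86-64's 4KB base pages
batch_pages = 32 << (20 - PAGE_SHIFT)           # the macro's value: 8192 pages
batch_bytes = batch_pages << PAGE_SHIFT         # 32 MiB per non-preemptible batch
bandwidth = 10e9                                # ~10 GB/s, as in the comment
print(batch_pages, batch_bytes / bandwidth * 1e3)   # -> 8192 pages, ~3.4 ms

which lines up with the "around 3ms" worst-case preemption latency quoted
in the comment.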

