Message-Id: <20250710005926.1159009-14-ankur.a.arora@oracle.com>
Date: Wed, 9 Jul 2025 17:59:25 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Cc: akpm@...ux-foundation.org, david@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
willy@...radead.org, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
ankur.a.arora@...cle.com
Subject: [PATCH v5 13/14] mm: memory: support clearing page-extents

folio_zero_user() is constrained to clear in a page-at-a-time fashion
because it supports CONFIG_HIGHMEM, under which the kernel mappings
for the pages in a folio are not guaranteed to be contiguous.
Configurations with CONFIG_CLEAR_PAGE_EXTENT (which implies
!CONFIG_HIGHMEM) do not have this problem, so zero in longer
page-extents there.

This is expected to be faster because the processor can optimize the
clearing based on its knowledge of the extent.
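
For reference, a rough sketch of the difference (illustration only, not
part of this patch; clear_pages() and the contiguous mapping assumption
come from earlier patches in this series):

    /* CONFIG_HIGHMEM: map and clear one page at a time. */
    for (i = 0; i < nr_pages; i++)
            clear_user_highpage(folio_page(folio, i), addr + i * PAGE_SIZE);

    /*
     * !CONFIG_HIGHMEM: the folio is contiguously mapped in the kernel,
     * so the whole extent can be handed to the architecture at once.
     */
    clear_pages(page_address(folio_page(folio, 0)), nr_pages);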

However, clearing in larger chunks can have two other problems:

- cache locality when clearing small folios (< MAX_ORDER_NR_PAGES);
  larger folios don't have any expectation of cache locality.

- preemption latency when clearing large folios.

Handle the first by splitting the clearing into three parts: the
faulting page and its immediate neighbourhood, and the regions to its
left and right; the local neighbourhood is cleared last.
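
For illustration (sizes assumed): with a 512-page folio, a faulting
page index of 100, and a local neighbourhood of 2 pages on either side,
the split is [98, 102] (faulting page plus neighbourhood), [0, 97]
(left region) and [103, 511] (right region); they are cleared in the
order right, left, and finally the local neighbourhood.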

The second problem is only relevant when running under cooperative
preemption models. Limit the worst-case preemption latency by clearing
in architecture-specified ARCH_CLEAR_PAGE_EXTENT units.
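
As an example (the extent size is architecture defined; the value here
is only assumed): with ARCH_CLEAR_PAGE_EXTENT = 2048 pages (8MB with
4KB pages), clearing a 1GB folio under a cooperative preemption model
becomes 128 clear_pages() calls with a cond_resched() after each,
instead of a single uninterruptible 1GB clear.
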
Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
---
mm/memory.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 85 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index b0cda5aab398..c52806270375 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7034,6 +7034,7 @@ static inline int process_huge_page(
return 0;
}
+#ifndef CONFIG_CLEAR_PAGE_EXTENT
static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
unsigned int nr_pages)
{
@@ -7058,7 +7059,10 @@ static int clear_subpage(unsigned long addr, int idx, void *arg)
/**
* folio_zero_user - Zero a folio which will be mapped to userspace.
* @folio: The folio to zero.
- * @addr_hint: The address will be accessed or the base address if uncelar.
+ * @addr_hint: The address accessed by the user or the base address.
+ *
+ * folio_zero_user() uses clear_gigantic_page() or process_huge_page() to
+ * do page-at-a-time zeroing because it needs to handle CONFIG_HIGHMEM.
*/
void folio_zero_user(struct folio *folio, unsigned long addr_hint)
{
@@ -7070,6 +7074,86 @@ void folio_zero_user(struct folio *folio, unsigned long addr_hint)
process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
}
+#else /* CONFIG_CLEAR_PAGE_EXTENT */
+
+static void clear_pages_resched(void *addr, int npages)
+{
+ int i, remaining;
+
+ if (preempt_model_preemptible()) {
+ clear_pages(addr, npages);
+ goto out;
+ }
+
+ for (i = 0; i < npages/ARCH_CLEAR_PAGE_EXTENT; i++) {
+ clear_pages(addr + i * ARCH_CLEAR_PAGE_EXTENT * PAGE_SIZE,
+ ARCH_CLEAR_PAGE_EXTENT);
+ cond_resched();
+ }
+
+ remaining = npages % ARCH_CLEAR_PAGE_EXTENT;
+
+ if (remaining)
+ clear_pages(addr + i * ARCH_CLEAR_PAGE_EXTENT * PAGE_SIZE,
+ remaining);
+out:
+ cond_resched();
+}
+
+/*
+ * folio_zero_user - Zero a folio which will be mapped to userspace.
+ * @folio: The folio to zero.
+ * @addr_hint: The address accessed by the user or the base address.
+ *
+ * Uses architectural support for clear_pages() to zero page extents
+ * instead of clearing page-at-a-time.
+ *
+ * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split into three parts:
+ * the pages in the immediate neighbourhood of the faulting page, and the
+ * regions to its left and right; the local neighbourhood is cleared last
+ * to keep its cache lines hot.
+ *
+ * For larger folios we assume that there is no expectation of cache locality
+ * and just do a straight zero.
+ */
+void folio_zero_user(struct folio *folio, unsigned long addr_hint)
+{
+ unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));
+ const long fault_idx = (addr_hint - base_addr) / PAGE_SIZE;
+ const struct range pg = DEFINE_RANGE(0, folio_nr_pages(folio) - 1);
+ const int width = 2; /* number of pages cleared last on either side */
+ struct range r[3];
+ int i;
+
+ if (folio_nr_pages(folio) > MAX_ORDER_NR_PAGES) {
+ clear_pages_resched(page_address(folio_page(folio, 0)), folio_nr_pages(folio));
+ return;
+ }
+
+ /*
+ * Faulting page and its immediate neighbourhood. Cleared at the end to
+ * ensure it sticks around in the cache.
+ */
+ r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
+ clamp_t(s64, fault_idx + width, pg.start, pg.end));
+
+ /* Region to the left of the fault */
+ r[1] = DEFINE_RANGE(pg.start,
+ clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));
+
+ /* Region to the right of the fault: always valid for the common fault_idx=0 case. */
+ r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
+ pg.end);
+
+ for (i = 0; i <= 2; i++) {
+ int npages = range_len(&r[i]);
+
+ if (npages > 0)
+ clear_pages_resched(page_address(folio_page(folio, r[i].start)), npages);
+ }
+}
+#endif /* CONFIG_CLEAR_PAGE_EXTENT */
+
static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
unsigned long addr_hint,
struct vm_area_struct *vma,
--
2.43.5