Message-ID: <20250905141137.3529867-1-david@redhat.com>
Date: Fri,  5 Sep 2025 16:11:37 +0200
From: David Hildenbrand <david@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org,
	David Hildenbrand <david@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Zi Yan <ziy@...dia.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Nico Pache <npache@...hat.com>,
	Ryan Roberts <ryan.roberts@....com>,
	Dev Jain <dev.jain@....com>,
	Barry Song <baohua@...nel.org>,
	Usama Arif <usamaarif642@...il.com>
Subject: [PATCH v1] mm/huge_memory: fix shrinking of all-zero THPs with max_ptes_none default

We added an early exit in thp_underused(), probably to avoid scanning
pages when there is no chance for success.

However, assume we have max_ptes_none = 511 (default).

Nothing should stop us from freeing all pages that are part of a THP
that is completely zero (512 zero-filled pages), and khugepaged will
certainly not try to instantiate a THP in that case (512 shared
zeropages).

This can just trivially happen if someone writes a single 0 byte into a
PMD area, or of course, when data ends up being zero later.

So let's remove that early exit.
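
To make the threshold argument concrete, here is a minimal user-space
sketch. It assumes that a THP counts as underused once more than
max_ptes_none of its base pages are zero-filled; that comparison, the
4 KiB / 512-page geometry, and all model_* names are assumptions made
for illustration. The kernel function operates on a struct folio via
kmap_local_folio() and memchr_inv() instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096	/* assumed base page size */
#define MODEL_PMD_NR    512	/* assumed base pages per THP */

/* Returns true if the base page consists only of zero bytes. */
static bool model_page_is_zero(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

/*
 * Hypothetical model of the check: the THP is considered underused
 * once more than max_ptes_none base pages are zero-filled (assumed
 * threshold). early_exit selects whether the removed check is applied.
 */
static bool model_thp_underused(const unsigned char *thp, int max_ptes_none,
				bool early_exit)
{
	int i, num_zero = 0;

	/* The early exit that this patch removes. */
	if (early_exit && max_ptes_none == MODEL_PMD_NR - 1)
		return false;

	for (i = 0; i < MODEL_PMD_NR; i++) {
		if (model_page_is_zero(thp + (size_t)i * MODEL_PAGE_SIZE) &&
		    ++num_zero > max_ptes_none)
			return true;
	}
	return false;
}
```

In this model, an all-zero buffer with max_ptes_none = 511 is reported
as underused only when early_exit is false: 512 zero pages exceed the
threshold of 511, but the early exit returns before counting them.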

Do we want to CC stable? Hm, not sure. Probably not urgent.

Note that, by default, the THP shrinker is active
(/sys/kernel/mm/transparent_hugepage/shrink_underused = 1), and all
THPs are added to the deferred split lists. However, with the
max_ptes_none default we would never scan them. If scanning them is
not desirable, we should just disable the shrinker by default and also
stop adding all THPs to the deferred split lists.

Easy to reproduce:

1) Allocate some THPs filled with 0s

<prog.c>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <sys/mman.h>

 const size_t size = 1024*1024*1024;

 int main(void)
 {
         size_t offs;
         char *area;

         area = mmap(0, size, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
         if (area == MAP_FAILED) {
                 perror("mmap");
                 exit(EXIT_FAILURE);
         }
         /* Ask for THPs; without them there is nothing to reproduce. */
         if (madvise(area, size, MADV_HUGEPAGE)) {
                 perror("madvise");
                 exit(EXIT_FAILURE);
         }

         /* Write a 0 byte into every base page so the THPs stay all-zero. */
         for (offs = 0; offs < size; offs += getpagesize())
                 area[offs] = 0;
         /* Keep the mapping alive while the shrinker runs. */
         pause();
         return 0;
 }
</prog.c>

2) Trigger the shrinker

E.g., memory pressure through memhog

3) Observe that THPs are not getting reclaimed

$ cat /proc/`pgrep prog`/smaps_rollup

Without this fix, this would list ~1GiB of AnonHugePages. With this
fix, they would get reclaimed as expected.
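
For scripted checks, the AnonHugePages value can be pulled out of the
smaps_rollup text. The helper below is only a sketch: the name
anon_huge_kb is hypothetical, and it assumes the usual
"AnonHugePages:   <n> kB" line format.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns the AnonHugePages value in kB found in smaps-style text,
 * or -1 if the field is missing or cannot be parsed.
 */
static long anon_huge_kb(const char *text)
{
	const char *key = "AnonHugePages:";
	const char *p = strstr(text, key);
	long kb;

	if (!p || sscanf(p + strlen(key), "%ld", &kb) != 1)
		return -1;
	return kb;
}
```

Feeding it the contents of /proc/<pid>/smaps_rollup before and after
triggering the shrinker shows whether the value dropped.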

Fixes: dafff3f4c850 ("mm: split underused THPs")
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Zi Yan <ziy@...dia.com>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
Cc: Nico Pache <npache@...hat.com>
Cc: Ryan Roberts <ryan.roberts@....com>
Cc: Dev Jain <dev.jain@....com>
Cc: Barry Song <baohua@...nel.org>
Cc: Usama Arif <usamaarif642@...il.com>
Signed-off-by: David Hildenbrand <david@...hat.com>
---
 mm/huge_memory.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 26cedfcd74189..aa3ed7a86435b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4110,9 +4110,6 @@ static bool thp_underused(struct folio *folio)
 	void *kaddr;
 	int i;
 
-	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
-		return false;
-
 	for (i = 0; i < folio_nr_pages(folio); i++) {
 		kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
 		if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
-- 
2.50.1

