Message-ID: <20250516113552.17648-1-xuwenjie04@baidu.com>
Date: Fri, 16 May 2025 19:35:52 +0800
From: Wenjie Xu <xuwenjie04@...du.com>
To: <muchun.song@...ux.dev>, <osalvador@...e.de>, <akpm@...ux-foundation.org>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>, Wenjie Xu
	<xuwenjie04@...du.com>, Li RongQing <lirongqing@...du.com>
Subject: [PATCH] hugetlb: two-phase hugepage allocation when reservation is high

When reserved hugepages account for 95% or more of system RAM (a common
configuration on physical servers in cloud environments), allocating them
all in a single pass can starve the rest of the kernel of memory and lead
to OOM during early boot.

Commit 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages") makes this
worse: deferring the freeing of vmemmap pages raises peak memory pressure
under these conditions and exacerbates allocation failures. To avoid this,
split the allocation into two near-equal batches whenever

	huge_reserved_pages >= total_base_pages * 95ULL / 100ULL

that is, whenever the reservation would consume at least 95% of RAM.
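
For illustration, here is a minimal userspace sketch of the split
arithmetic (the machine size, hugepage count, and variable names are
illustrative; the kernel code below works on struct hstate instead):

  /* Sketch only: illustrative values, simplified names. */
  #include <stdio.h>

  int main(void)
  {
          unsigned long total_base_pages = 256UL << 20; /* 1 TiB of 4 KiB pages */
          unsigned long max_huge_pages = 510000;        /* 2 MiB pages, ~97% of RAM */
          unsigned long reserved = max_huge_pages << 9; /* order 9 = 2 MiB on x86-64 */
          int iters = (reserved >= total_base_pages * 95ULL / 100ULL) ? 2 : 1;
          unsigned long batch = max_huge_pages / iters;

          for (int i = 0; i < iters; i++) {
                  unsigned long start = batch * i;
                  unsigned long size = (i + 1 == iters)
                                     ? max_huge_pages - batch * i : batch;
                  printf("batch %d: start=%lu size=%lu\n", i, start, size);
          }
          return 0; /* prints two batches of 255000 hugepages each */
  }

The last batch absorbs any remainder of the division, so an odd
max_huge_pages still allocates exactly max_huge_pages pages.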

This change does not alter the number of padata worker threads per batch;
it merely invokes padata_do_multithreaded() a second time. The overhead of
restarting the worker threads for the extra round is minimal.

Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
Co-developed-by: Li RongQing <lirongqing@...du.com>
Signed-off-by: Li RongQing <lirongqing@...du.com>
Signed-off-by: Wenjie Xu <xuwenjie04@...du.com>
---
 mm/hugetlb.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6ea1be71aa42..7bdcaab6f7ec 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3616,12 +3616,21 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 		.numa_aware	= true
 	};
 
+	unsigned long total_base_pages = totalram_pages();
+	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
+	unsigned long huge_pages;
+	int i, hugetlb_page_alloc_iter;
 	unsigned long jiffies_start;
 	unsigned long jiffies_end;
 
+	/*
+	 * Use two batches when reserved hugepages consume >= 95% of RAM.
+	 */
+	hugetlb_page_alloc_iter = (huge_reserved_pages >=
+				   total_base_pages * 95ULL / 100ULL) ? 2 : 1;
+	huge_pages = h->max_huge_pages / hugetlb_page_alloc_iter;
+
 	job.thread_fn	= hugetlb_pages_alloc_boot_node;
-	job.start	= 0;
-	job.size	= h->max_huge_pages;
 
 	/*
 	 * job.max_threads is 25% of the available cpu threads by default.
@@ -3645,10 +3654,17 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 	}
 
 	job.max_threads	= hugepage_allocation_threads;
-	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
+	job.min_chunk	= huge_pages / hugepage_allocation_threads;
 
 	jiffies_start = jiffies;
-	padata_do_multithreaded(&job);
+	for (i = 0; i < hugetlb_page_alloc_iter; i++) {
+		job.start = huge_pages * i;
+		job.size = (i + 1 == hugetlb_page_alloc_iter)
+			 ? h->max_huge_pages - huge_pages * i
+			 : huge_pages;
+		padata_do_multithreaded(&job);
+	}
+
 	jiffies_end = jiffies;
 
 	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
-- 
2.41.0

