Message-ID: <4391e3f5-e0a5-4920-bd50-05337b7764e7@gmail.com>
Date: Fri, 22 Aug 2025 17:50:47 +0400
From: Giorgi Tchankvetadze <giorgitchankvetadze1997@...il.com>
To: lirongqing@...du.com
Cc: akpm@...ux-foundation.org, david@...hat.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, muchun.song@...ux.dev,
osalvador@...e.de, xuwenjie04@...du.com
Subject: Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when
reservation is high
Hi there. The 90% split is solid. Would it make sense to (a) log a
one-time warning when the second pass is triggered, so operators know why
boot slowed, and (b) make the 90% cap a Kconfig default ratio, so
distros can lower it without patching? Both are low-risk and don't
change the ABI.
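
For concreteness, a rough sketch of what (a) and (b) could look like on
top of your hunk; the CONFIG_HUGETLB_ALLOC_SPLIT_RATIO symbol and its
Kconfig entry are made-up names for illustration, not existing options:

	/*
	 * Hypothetical Kconfig entry, e.g. in mm/Kconfig:
	 *
	 *   config HUGETLB_ALLOC_SPLIT_RATIO
	 *           int "Percent of RAM above which boot-time hugepage allocation is split"
	 *           range 50 100
	 *           default 90
	 */
	total_pages = totalram_pages() * CONFIG_HUGETLB_ALLOC_SPLIT_RATIO / 100;
	if (huge_reserved_pages > total_pages) {
		/* (a) one-time hint so operators know why early boot slowed down */
		pr_warn_once("HugeTLB: reservation exceeds %d%% of RAM, splitting boot-time allocation into two passes\n",
			     CONFIG_HUGETLB_ALLOC_SPLIT_RATIO);
		huge_pages = h->max_huge_pages * CONFIG_HUGETLB_ALLOC_SPLIT_RATIO / 100;
		remaining = h->max_huge_pages - huge_pages;
	} else {
		huge_pages = h->max_huge_pages;
		remaining = 0;
	}

Keeping the ratio a plain Kconfig integer leaves the threshold a
boot-time constant with no new ABI; a cmdline override could be added
later if anyone asks for it.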
Thanks
On 8/22/2025 3:28 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@...du.com>
>
> When the total reserved hugepages account for 95% or more of system RAM
> (common in cloud computing on physical servers), allocating them all in one
> go can lead to OOM or failure to allocate huge pages during early boot.
>
> The previous hugetlb vmemmap batching change (91f386bf0772) can worsen
> peak memory pressure under these conditions by deferring page frees,
> exacerbating allocation failures. To prevent this, split the allocation
> into two batches (90% first, then the remaining 10%) whenever
> huge_reserved_pages > totalram_pages() * 90 / 100.
>
> This change does not alter the number of padata worker threads per batch;
> it merely introduces a second round of padata_do_multithreaded(). The added
> overhead of restarting the worker threads is minimal.
>
> Before:
> [ 8.423187] HugeTLB: allocation took 1584ms with hugepage_allocation_threads=48
> [ 8.431189] HugeTLB: allocating 385920 of page size 2.00 MiB failed. Only allocated 385296 hugepages.
>
> After:
> [ 8.740201] HugeTLB: allocation took 1900ms with hugepage_allocation_threads=48
> [ 8.748266] HugeTLB: registered 2.00 MiB page size, pre-allocated 385920 pages
>
> Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
>
> Co-developed-by: Wenjie Xu <xuwenjie04@...du.com>
> Signed-off-by: Wenjie Xu <xuwenjie04@...du.com>
> Signed-off-by: Li RongQing <lirongqing@...du.com>
> ---
> mm/hugetlb.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b..a86d3a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3587,12 +3587,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  		.numa_aware	= true
>  	};
>  
> +	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
> +	unsigned long huge_pages, remaining, total_pages;
>  	unsigned long jiffies_start;
>  	unsigned long jiffies_end;
>  
> +	total_pages = totalram_pages() * 90 / 100;
> +	if (huge_reserved_pages > total_pages) {
> +		huge_pages = h->max_huge_pages * 90 / 100;
> +		remaining = h->max_huge_pages - huge_pages;
> +	} else {
> +		huge_pages = h->max_huge_pages;
> +		remaining = 0;
> +	}
> +
>  	job.thread_fn	= hugetlb_pages_alloc_boot_node;
>  	job.start	= 0;
> -	job.size	= h->max_huge_pages;
> +	job.size	= huge_pages;
>  
>  	/*
>  	 * job.max_threads is 25% of the available cpu threads by default.
> @@ -3616,10 +3627,16 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  	}
>  
>  	job.max_threads	= hugepage_allocation_threads;
> -	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
> +	job.min_chunk	= huge_pages / hugepage_allocation_threads;
>  
>  	jiffies_start = jiffies;
>  	padata_do_multithreaded(&job);
> +	if (remaining) {
> +		job.start	= huge_pages;
> +		job.size	= remaining;
> +		job.min_chunk	= remaining / hugepage_allocation_threads;
> +		padata_do_multithreaded(&job);
> +	}
>  	jiffies_end = jiffies;
>  
>  	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
> --
> 2.9.4
>
>