Message-ID: <Z_Uqiu75bXhqpwm4@localhost.localdomain>
Date: Tue, 8 Apr 2025 15:54:18 +0200
From: Oscar Salvador <osalvador@...e.de>
To: Frank van der Linden <fvdl@...gle.com>
Cc: akpm@...ux-foundation.org, muchun.song@...ux.dev, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, david@...hat.com, luizcap@...hat.com
Subject: Re: [PATCH] mm/hugetlb: use separate nodemask for bootmem allocations
On Wed, Apr 02, 2025 at 08:56:13PM +0000, Frank van der Linden wrote:
> Hugetlb boot allocation has used online nodes for allocation since
> commit de55996d7188 ("mm/hugetlb: use online nodes for bootmem
> allocation"). This was needed to be able to do the allocations
> earlier in boot, before N_MEMORY was set.
>
> This might lead to a different distribution of gigantic hugepages
> across NUMA nodes if there are memoryless nodes in the system.
>
> What happens is that the memoryless nodes are tried, but then
> the memblock allocation fails and falls back, which usually means
> that the node that has the highest physical address available
> will be used (top-down allocation). While this will end up
> getting the same number of hugetlb pages, they might not be
> distributed the same way. The fallback for each memoryless
> node might not end up coming from the same node as the
> successful round-robin allocation from N_MEMORY nodes.
>
> While administrators that rely on having a specific number of
> hugepages per node should use the hugepages=N:X syntax, it's
> better not to change the old behavior for the plain hugepages=N
> case.
>
> To do this, construct a nodemask for hugetlb bootmem purposes
> only, containing nodes that have memory. Then use that
> for round-robin bootmem allocations.
>
> This saves some cycles, and the added advantage here is that
> hugetlb_cma can use it too, avoiding the older issue of
> pointless attempts to create a CMA area for memoryless nodes
> (which will also cause the per-node CMA area size to be too
> small).
Hi Frank,
Makes sense.
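Just to make sure I am reading the approach right, I picture the
construction of that nodemask roughly like the sketch below.
hugetlb_bootmem_nodes is the nodemask the patch introduces; the helper
name and the memblock walk are my own guesses, not necessarily what the
patch actually does:

/* assumes <linux/memblock.h>, <linux/nodemask.h>, <linux/numa.h> */
static nodemask_t hugetlb_bootmem_nodes __initdata;

/* sketch only: collect every node that memblock reports memory on */
static void __init hugetlb_bootmem_set_nodes(void)
{
        struct memblock_region *r;
        int nid;

        nodes_clear(hugetlb_bootmem_nodes);
        for_each_mem_region(r) {
                nid = memblock_get_region_node(r);
                if (nid != NUMA_NO_NODE)
                        node_set(nid, hugetlb_bootmem_nodes);
        }
}

That way the round-robin can never land on a memoryless node and fall
back top-down in the first place.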
There is something I do not quite understand, though:
> @@ -5012,7 +5039,6 @@ void __init hugetlb_bootmem_alloc(void)
>
> for_each_hstate(h) {
> h->next_nid_to_alloc = first_online_node;
> - h->next_nid_to_free = first_online_node;
Why are you dropping the initialization of next_nid_to_free here? I guess
it is because we do not use it during boot time, and you already set it to
first_memory_node further down the road in hugetlb_init_hstates.
And is the reason you keep initializing next_nid_to_alloc here that
first_online_node may still be part of hugetlb_bootmem_nodes?
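In other words, restating my understanding of where the two cursors end
up being initialized, heavily simplified (only the two assignments,
everything else elided):

        struct hstate *h;

        /* early, in hugetlb_bootmem_alloc(): N_MEMORY is not populated yet */
        for_each_hstate(h)
                h->next_nid_to_alloc = first_online_node;

        /* later, on the hugetlb_init_hstates() path: N_MEMORY is populated */
        for_each_hstate(h)
                h->next_nid_to_free = first_memory_node;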
--
Oscar Salvador
SUSE Labs