Message-ID: <3fcbad05-bef2-486a-8d9b-7010a91c85b8@kernel.org>
Date: Fri, 6 Feb 2026 10:36:24 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: Kiryl Shutsemau <kas@...nel.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Muchun Song <muchun.song@...ux.dev>, Matthew Wilcox <willy@...radead.org>,
 Usama Arif <usamaarif642@...il.com>, Frank van der Linden <fvdl@...gle.com>
Cc: Oscar Salvador <osalvador@...e.de>, Mike Rapoport <rppt@...nel.org>,
 Vlastimil Babka <vbabka@...e.cz>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
 Baoquan He <bhe@...hat.com>, Michal Hocko <mhocko@...e.com>,
 Johannes Weiner <hannes@...xchg.org>, Jonathan Corbet <corbet@....net>,
 Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
 Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley
 <paul.walmsley@...ive.com>, Albert Ou <aou@...s.berkeley.edu>,
 Alexandre Ghiti <alex@...ti.fr>, kernel-team@...a.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 loongarch@...ts.linux.dev, linux-riscv@...ts.infradead.org
Subject: Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages

On 2/2/26 16:56, Kiryl Shutsemau wrote:
> HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
> vmemmap pages for huge pages and remapping the freed range to a single
> page containing the struct page metadata.
> 
> With the new mask-based compound_info encoding (for power-of-2 struct
> page sizes), all tail pages of the same order are now identical
> regardless of which compound page they belong to. This means the tail
> pages can be truly shared without fake heads.
> 
> Allocate a single page of initialized tail struct pages per NUMA node
> per order in the vmemmap_tails[] array in pglist_data. All huge pages of
> that order on the node share this tail page, mapped read-only into their
> vmemmap. The head page remains unique per huge page.
> 
> Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
> compile-time constant as it is used to specify the vmemmap_tails array
> size. For some reason, the compiler is not able to evaluate get_order()
> at compile time, but ilog2() works.
> 
> Avoid using PUD_ORDER to define MAX_FOLIO_ORDER, as it adds a
> dependency on <linux/pgtable.h>, which creates a hard-to-break include
> loop.
> 
> This eliminates fake heads while maintaining the same memory savings,
> and simplifies compound_head() by removing fake head detection.
> 
> Signed-off-by: Kiryl Shutsemau <kas@...nel.org>
> ---

[...]

>   #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a39a301e08b9..688764c52c72 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -19,6 +19,7 @@
>   
>   #include <asm/tlbflush.h>
>   #include "hugetlb_vmemmap.h"
> +#include "internal.h"
>   
>   /**
>    * struct vmemmap_remap_walk - walk vmemmap page table
> @@ -505,6 +506,32 @@ static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *
>   	return true;
>   }
>   
> +static struct page *vmemmap_get_tail(unsigned int order, int node)
> +{
> +	struct page *tail, *p;
> +	unsigned int idx;
> +
> +	idx = order - VMEMMAP_TAIL_MIN_ORDER;

Could do

const unsigned int idx = order - VMEMMAP_TAIL_MIN_ORDER;

above.

> +	tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> +	if (tail)

Wondering if a likely() would be a good idea here. I guess we'll usually 
go through that fast path on a system that has been running for a bit.

> +		return tail;
> +
> +	tail = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
> +	if (!tail)
> +		return NULL;
> +
> +	p = page_to_virt(tail);
> +	for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> +		prep_compound_tail(p + i, NULL, order);

This leaves all pageflags, refcount etc. set to 0, which is mostly 
expected for tail pages.

But I would have expected something a bit closer to 
__init_single_page(), which initializes the page properly.

In particular:
* set_page_node(page, node), or how is page_to_nid() handled?
* atomic_set(&page->_mapcount, -1), to not indicate something odd to
   core-mm where we would suddenly have a page mapping for a hugetlb
   folio.
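For illustration, here is a minimal userspace sketch of the two initializations listed above, using a mocked-up struct page. The field layout, the MOCK_NODES_SHIFT encoding, and the helper names are simplified stand-ins for illustration, not the kernel's actual definitions:

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-in for the kernel's struct page; the real node id
 * lives in the upper bits of page->flags, which we mimic here. */
struct mock_page {
	unsigned long flags;     /* node id encoded in the upper bits */
	atomic_int _mapcount;    /* -1 means "no mappings", as in the kernel */
};

#define MOCK_NODES_SHIFT 56	/* illustrative, not the kernel's layout */

static void mock_set_page_node(struct mock_page *page, int node)
{
	page->flags |= (unsigned long)node << MOCK_NODES_SHIFT;
}

static int mock_page_to_nid(const struct mock_page *page)
{
	return (int)(page->flags >> MOCK_NODES_SHIFT);
}

/* The extra per-tail initialization suggested above: record the node
 * so page_to_nid() works, and set _mapcount to -1 so core-mm does not
 * see a stale "mapped" state on the shared tail page. */
static void mock_init_tail(struct mock_page *page, int node)
{
	mock_set_page_node(page, node);
	atomic_store(&page->_mapcount, -1);
}
```
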

> +
> +	if (cmpxchg(&NODE_DATA(node)->vmemmap_tails[idx], NULL, tail)) {
> +		__free_page(tail);
> +		tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> +	}
> +
> +	return tail;
> +}
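The allocate-then-cmpxchg publish in the quoted hunk is a standard once-only initialization pattern: whoever loses the race frees their copy and adopts the winner's. A minimal userspace sketch using C11 atomics, where slot, get_or_create(), and the kernel-style likely() macro are illustrative names rather than the patch's API:

```c
#include <stdatomic.h>
#include <stdlib.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* One shared slot, lazily allocated and published exactly once. */
static _Atomic(int *) slot;

static int *get_or_create(void)
{
	int *cur = atomic_load(&slot);
	int *fresh, *expected = NULL;

	if (likely(cur))	/* fast path once the slot is populated */
		return cur;

	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;

	/* Try to publish; if another thread beat us to it, free ours
	 * and return the already-published copy instead. */
	if (!atomic_compare_exchange_strong(&slot, &expected, fresh)) {
		free(fresh);
		return atomic_load(&slot);
	}
	return fresh;
}
```

Every caller ends up with the same pointer, and at most one allocation survives, which is exactly the property the vmemmap_tails[] lookup relies on.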

[...]

> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -378,16 +378,44 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsigned long end,
>   	}
>   }
>   
> -/*
> - * Populate vmemmap pages HVO-style. The first page contains the head
> - * page and needed tail pages, the other ones are mirrors of the first
> - * page.
> - */
> +static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
> +{
> +	struct page *p, *tail;
> +	unsigned int idx;
> +
> +	BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
> +	BUG_ON(order > MAX_FOLIO_ORDER);
> +
> +	idx = order - VMEMMAP_TAIL_MIN_ORDER;
> +	tail = NODE_DATA(node)->vmemmap_tails[idx];
> +	if (tail)
> +		return page_to_pfn(tail);
> +
> +	p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> +	if (!p)
> +		return 0;
> +
> +	for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> +		prep_compound_tail(p + i, NULL, order);
> +
> +	tail = virt_to_page(p);
> +	NODE_DATA(node)->vmemmap_tails[idx] = tail;
> +
> +	return page_to_pfn(tail);
> +}
> +
>   int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
>   				       int node, unsigned long headsize)
>   {
> +	unsigned long maddr, len, tail_pfn;
> +	unsigned int order;
>   	pte_t *pte;
> -	unsigned long maddr;
> +
> +	len = end - addr;
> +	order = ilog2(len * sizeof(struct page) / PAGE_SIZE);


Could initialize them as const above.

But I am wondering whether it shouldn't be the caller that provides this 
to us? After all, it's all hugetlb code that allocates and prepares that.

Then we could maybe change

#ifdef CONFIG_SPARSEMEM_VMEMMAP
	struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
#endif

to be HVO-only.
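Concretely, the pglist_data member could then be guarded by the HVO Kconfig symbol rather than by CONFIG_SPARSEMEM_VMEMMAP; a sketch only, assuming the existing CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP symbol:

```c
#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
	struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
#endif
```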

-- 
Cheers,

David
