Message-ID: <3fcbad05-bef2-486a-8d9b-7010a91c85b8@kernel.org>
Date: Fri, 6 Feb 2026 10:36:24 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: Kiryl Shutsemau <kas@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>, Matthew Wilcox <willy@...radead.org>,
Usama Arif <usamaarif642@...il.com>, Frank van der Linden <fvdl@...gle.com>
Cc: Oscar Salvador <osalvador@...e.de>, Mike Rapoport <rppt@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
Baoquan He <bhe@...hat.com>, Michal Hocko <mhocko@...e.com>,
Johannes Weiner <hannes@...xchg.org>, Jonathan Corbet <corbet@....net>,
Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley
<paul.walmsley@...ive.com>, Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alex@...ti.fr>, kernel-team@...a.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
loongarch@...ts.linux.dev, linux-riscv@...ts.infradead.org
Subject: Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
> vmemmap pages for huge pages and remapping the freed range to a single
> page containing the struct page metadata.
>
> With the new mask-based compound_info encoding (for power-of-2 struct
> page sizes), all tail pages of the same order are now identical
> regardless of which compound page they belong to. This means the tail
> pages can be truly shared without fake heads.
>
> Allocate a single page of initialized tail struct pages per NUMA node
> per order in the vmemmap_tails[] array in pglist_data. All huge pages of
> that order on the node share this tail page, mapped read-only into their
> vmemmap. The head page remains unique per huge page.
>
> Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
> compile-constant as it is used to specify vmemmap_tail array size.
> For some reason, compiler is not able to solve get_order() at
> compile-time, but ilog2() works.
>
> Avoid PUD_ORDER to define MAX_FOLIO_ORDER as it adds dependency to
> <linux/pgtable.h> which generates hard-to-break include loop.
>
> This eliminates fake heads while maintaining the same memory savings,
> and simplifies compound_head() by removing fake head detection.
>
> Signed-off-by: Kiryl Shutsemau <kas@...nel.org>
> ---
[...]
> #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a39a301e08b9..688764c52c72 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -19,6 +19,7 @@
>
> #include <asm/tlbflush.h>
> #include "hugetlb_vmemmap.h"
> +#include "internal.h"
>
> /**
> * struct vmemmap_remap_walk - walk vmemmap page table
> @@ -505,6 +506,32 @@ static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *
> return true;
> }
>
> +static struct page *vmemmap_get_tail(unsigned int order, int node)
> +{
> + struct page *tail, *p;
> + unsigned int idx;
> +
> + idx = order - VMEMMAP_TAIL_MIN_ORDER;
Could do
const unsigned int idx = order - VMEMMAP_TAIL_MIN_ORDER;
above.
> + tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> + if (tail)
Wondering if a likely() would be a good idea here. I guess we'll usually
go through that fast path on a system that has been running for a bit.
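For illustration, a minimal userspace sketch of that fast path with a
likely() annotation (not kernel code; likely() here is just
__builtin_expect, and the names are made up):

```c
#include <stdlib.h>

/* Userspace stand-in for the kernel's likely() annotation. */
#define likely(x) __builtin_expect(!!(x), 1)

static void *cached;

/*
 * Return the cached object, allocating it on the first call only.
 * After warm-up the likely() branch is the one always taken, which is
 * the case the annotation optimizes for.
 */
static void *get_cached(void)
{
	void *p = cached;	/* kernel code would use READ_ONCE() */

	if (likely(p))
		return p;

	p = calloc(1, 64);
	cached = p;		/* single-threaded sketch; the patch uses cmpxchg */
	return p;
}
```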
> + return tail;
> +
> + tail = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
> + if (!tail)
> + return NULL;
> +
> + p = page_to_virt(tail);
> + for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> + prep_compound_tail(p + i, NULL, order);
This leaves all pageflags, refcount etc. set to 0, which is mostly
expected for tail pages.
But I would have expected something a bit closer to
__init_single_page(), which initializes the page properly.
In particular:
* set_page_node(page, node), or how is page_to_nid() handled?
* atomic_set(&page->_mapcount, -1), to not indicate something odd to
core-mm where we would suddenly have a page mapping for a hugetlb
folio.
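As a sketch of the kind of per-tail initialization meant here (a
userspace model with a made-up struct, not the kernel's struct page;
TAILS_PER_PAGE assumes PAGE_SIZE / sizeof(struct page) == 64):

```c
#include <stddef.h>

/* Toy model of just the fields the two points above are about. */
struct tail_page {
	int nid;	/* what set_page_node() would record */
	int mapcount;	/* the kernel keeps _mapcount at -1 for unmapped pages */
};

#define TAILS_PER_PAGE 64	/* assumed: PAGE_SIZE / sizeof(struct page) */

/* Initialize every tail so node lookup and mapcount checks behave. */
static void init_tails(struct tail_page *p, int node)
{
	for (size_t i = 0; i < TAILS_PER_PAGE; i++) {
		p[i].nid = node;
		p[i].mapcount = -1;
	}
}
```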
> +
> + if (cmpxchg(&NODE_DATA(node)->vmemmap_tails[idx], NULL, tail)) {
> + __free_page(tail);
> + tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> + }
> +
> + return tail;
> +}
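The publish-or-discard pattern above can be sketched in userspace with
the GCC __atomic builtins standing in for cmpxchg() (names are
hypothetical; on failure the builtin writes the current value back into
expected, which mirrors the patch's re-read of the slot):

```c
#include <stdlib.h>

static void *slot;	/* stands in for NODE_DATA(node)->vmemmap_tails[idx] */

/*
 * Try to publish p into the shared slot. If another thread won the
 * race, free our copy and return the winner's pointer instead.
 */
static void *publish_or_discard(void *p)
{
	void *expected = NULL;

	if (!__atomic_compare_exchange_n(&slot, &expected, p, 0,
					 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
		free(p);		/* lost the race: drop our allocation */
		p = expected;		/* CAS stored the current value here */
	}
	return p;
}
```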
[...]
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -378,16 +378,44 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsigned long end,
> }
> }
>
> -/*
> - * Populate vmemmap pages HVO-style. The first page contains the head
> - * page and needed tail pages, the other ones are mirrors of the first
> - * page.
> - */
> +static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
> +{
> + struct page *p, *tail;
> + unsigned int idx;
> +
> + BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
> + BUG_ON(order > MAX_FOLIO_ORDER);
> +
> + idx = order - VMEMMAP_TAIL_MIN_ORDER;
> + tail = NODE_DATA(node)->vmemmap_tails[idx];
> + if (tail)
> + return page_to_pfn(tail);
> +
> + p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> + if (!p)
> + return 0;
> +
> + for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> + prep_compound_tail(p + i, NULL, order);
> +
> + tail = virt_to_page(p);
> + NODE_DATA(node)->vmemmap_tails[idx] = tail;
> +
> + return page_to_pfn(tail);
> +}
> +
> int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
> int node, unsigned long headsize)
> {
> + unsigned long maddr, len, tail_pfn;
> + unsigned int order;
> pte_t *pte;
> - unsigned long maddr;
> +
> + len = end - addr;
> + order = ilog2(len * sizeof(struct page) / PAGE_SIZE);
Could initialize them as const above.
But I am wondering whether it shouldn't be the caller that provides this
to us? After all, it's all hugetlb code that allocates and prepares that.
Then we could maybe change
#ifdef CONFIG_SPARSEMEM_VMEMMAP
	struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
#endif
to be HVO-only.
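For reference, the order computed from len above works out as follows
under common assumptions (sizeof(struct page) == 64 and PAGE_SIZE ==
4096; helper names are illustrative, not from the patch):

```c
/* Assumed values; both are config/arch dependent. */
#define PAGE_SIZE	4096UL
#define STRUCT_PAGE_SZ	64UL

/* Integer log2 of a known-positive value, like the kernel's ilog2(). */
static unsigned int ilog2u(unsigned long v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* len is the vmemmap range in bytes covering one folio's struct pages. */
static unsigned int folio_order_from_len(unsigned long len)
{
	return ilog2u(len * STRUCT_PAGE_SZ / PAGE_SIZE);
}
```

E.g. a 2 MiB folio has 512 struct pages, i.e. len = 512 * 64 = 32 KiB
of vmemmap, giving order 9.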
--
Cheers,
David