Message-Id: <BAF36B4D-0047-48C4-9CB8-C8566722A79B@linux.dev>
Date: Thu, 11 Dec 2025 11:45:13 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Kiryl Shutsemau <kas@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...nel.org>,
Oscar Salvador <osalvador@...e.de>,
Mike Rapoport <rppt@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Matthew Wilcox <willy@...radead.org>,
Zi Yan <ziy@...dia.com>,
Baoquan He <bhe@...hat.com>,
Michal Hocko <mhocko@...e.com>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Usama Arif <usamaarif642@...il.com>,
kernel-team@...a.com,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap
optimization
> On Dec 10, 2025, at 11:39, Muchun Song <muchun.song@...ux.dev> wrote:
>
>> On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@...nel.org> wrote:
>>
>> On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
>>> The prerequisite is that the starting address of vmemmap must be aligned
>>> to a 16MB boundary (for 1GB huge pages), right? We should add a check
>>> somewhere to guarantee this (not at compile time but at runtime, e.g.
>>> because of KASLR).
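>>>
>>> For reference (assuming 4KB base pages and a 64-byte struct page):
>>> a 1GB huge page covers 262144 base pages, so its vmemmap spans
>>> 262144 * 64 bytes = 16MB, hence the 16MB alignment.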
>>
>> I have a hard time finding the right spot to put the check.
>>
>> I considered something like the patch below, but it is probably too late
>> if huge pages are preallocated at boot.
>>
>> I will dig more later, but if you have any suggestions, I would
>> appreciate them.
>
> If you opt to record the mask information, then compound_head() will
> compute the head-page address from the mask even when HVO is
> disabled. Consequently this constraint must hold for **every**
> compound page.
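>
> To illustrate (a hypothetical sketch, not the actual code from the
> series; "mask" stands for whatever per-order mask gets recorded):
>
>	/* derive the head page address purely from the mask */
>	head = (struct page *)((unsigned long)tail & mask);
>
> This only yields the correct head page when the head's vmemmap
> address is aligned to the folio's vmemmap span.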
>
> Therefore adding your check to hugetlb_vmemmap.c is not appropriate:
> that file can only turn HVO off, while the calculation would remain
> broken for all other large compound pages.
>
> From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
> at most 16 GB of physically contiguous memory. We must therefore
> guarantee that the vmemmap area starts on an address aligned to at
> least 256 MB.
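>
> Concretely (assuming 4KB base pages and a 64-byte struct page):
> 16GB / 4KB = 4194304 struct pages, and 4194304 * 64 bytes = 256MB of
> vmemmap, hence the 256MB alignment.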
>
> When KASLR is disabled the vmemmap base is normally fixed by a
> macro, so the check can be done at compile time; when KASLR is enabled
> we have to ensure that the randomly chosen offset is a multiple
> of 256 MB. These two spots are, in my view, the places that need
> to be changed.
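>
> For the KASLR side, something along these lines (a sketch only; the
> names here are illustrative, not actual arch code):
>
>	/* keep the randomized vmemmap base 256MB-aligned */
>	offset = get_random_long() % entropy_range;
>	vmemmap_base += ALIGN_DOWN(offset, SZ_256M);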
>
> Moreover, this approach requires the virtual addresses of the struct
> pages (which may span sections) to be contiguous, so the method is
> valid **only** under CONFIG_SPARSEMEM_VMEMMAP.
This is no longer an issue: with nth_page() removed (which I only
just found out), a folio can no longer span multiple sections, even
when !CONFIG_SPARSEMEM_VMEMMAP.
>
> Also, when I skimmed through the overall patch yesterday, one detail
> caught my eye: the shared tail page is **not** "per hstate"; it is
> "per hstate, per zone, per node", because the zone and node
> information is encoded in the tail page’s flags field. We should make
> sure both page_to_nid() and page_zone() work properly.
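>
> For reference, both helpers decode page->flags; page_to_nid() is
> roughly the following (simplified from include/linux/mm.h, with the
> poison check omitted):
>
>	static inline int page_to_nid(const struct page *page)
>	{
>		return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
>	}
>
> A tail page shared across zones or nodes would therefore report the
> wrong nid/zone unless one instance exists per zone and per node.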
>
> Muchun,
> Thanks.
>
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 04a211a146a0..971558184587 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -886,6 +886,14 @@ static int __init hugetlb_vmemmap_init(void)
>>  	BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
>>  
>>  	for_each_hstate(h) {
>> +		unsigned long size = pages_per_huge_page(h) * sizeof(struct page);
>> +
>> +		/* vmemmap must be naturally aligned to the hstate's vmemmap size */
>> +		if (WARN_ON_ONCE(!IS_ALIGNED((unsigned long)vmemmap, size))) {
>> +			vmemmap_optimize_enabled = false;
>> +			continue;
>> +		}
>> +
>>  		if (hugetlb_vmemmap_optimizable(h)) {
>>  			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
>>  			break;
>> --
>> Kiryl Shutsemau / Kirill A. Shutemov