Message-ID: <5twlonzi3rooao7gyp5g4tyaeevemcx6qhuf4xvdtsi2cykuo4@wrhxmxz63wvn>
Date: Thu, 11 Dec 2025 15:08:13 +0000
From: Kiryl Shutsemau <kas@...nel.org>
To: Muchun Song <muchun.song@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...nel.org>, Oscar Salvador <osalvador@...e.de>,
Mike Rapoport <rppt@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Matthew Wilcox <willy@...radead.org>, Zi Yan <ziy@...dia.com>,
Baoquan He <bhe@...hat.com>, Michal Hocko <mhocko@...e.com>,
Johannes Weiner <hannes@...xchg.org>, Jonathan Corbet <corbet@....net>,
Usama Arif <usamaarif642@...il.com>, kernel-team@...a.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap
optimization
On Wed, Dec 10, 2025 at 11:39:24AM +0800, Muchun Song wrote:
>
>
> > On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@...nel.org> wrote:
> >
> > On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
> >> The prerequisite is that the starting address of vmemmap must be aligned to
> >> 16MB boundaries (for 1GB huge pages). Right? We should add some checks
> >> somewhere to guarantee this (not at compile time but at runtime, because of KASLR).
> >
> > I have a hard time finding the right spot to put the check.
> >
> > I considered something like the patch below, but it is probably too late
> > if huge pages are preallocated at boot.
> >
> > I will dig into this more later, but if you have any suggestions, I
> > would appreciate them.
>
> If you opt to record the mask information, then even when HVO is
> disabled, compound_head() will still compute the head-page address
> by means of the mask. Consequently this constraint must hold for
> *every* compound page.
>
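
For the record, the mask-based lookup amounts to roughly this
(folio_head_mask() is just a placeholder name here, not something
from the series):

	static inline struct page *mask_compound_head(const struct page *page)
	{
		/* Mask recorded for the folio (hypothetical helper). */
		unsigned long mask = folio_head_mask(page);

		/*
		 * Masking the vmemmap virtual address only yields the head
		 * page if the folio's vmemmap is naturally aligned.
		 */
		return (struct page *)((unsigned long)page & mask);
	}
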
> Therefore adding your code in hugetlb_vmemmap.c is not appropriate:
> that file only turns HVO off, yet the calculation remains broken
> for all other large compound pages.
>
> From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
> at most 16 GB of physically contiguous memory. We must therefore
> guarantee that the vmemmap area starts on an address aligned to at
> least 256 MB.
>
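
(Spelling out the arithmetic, assuming 4K base pages and
sizeof(struct page) == 64:

	16 GB / 4 KB  = 4M struct pages for the largest folio
	4M * 64 bytes = 256 MB of vmemmap

so that vmemmap only occupies a naturally aligned 256 MB window if the
vmemmap base itself is aligned to 256 MB.)
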
> When KASLR is disabled the vmemmap base is normally fixed by a
> macro, so the check can be done at compile time; when KASLR is enabled
> we have to ensure that the randomly chosen offset is a multiple
> of 256 MB. These two spots are, in my view, the places that need
> to be changed.
>
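
A rough sketch of what the two checks could look like; the macro name
and placement are placeholders, and the real KASLR code is per-arch:

	/* Required vmemmap alignment for the largest folio. */
	#define VMEMMAP_FOLIO_ALIGN \
		((1UL << MAX_FOLIO_ORDER) * sizeof(struct page))

	/* Fixed layout: check at build time. */
	BUILD_BUG_ON(!IS_ALIGNED(VMEMMAP_START, VMEMMAP_FOLIO_ALIGN));

	/* KASLR: keep the randomized base a multiple of the alignment. */
	vmemmap_base = ALIGN_DOWN(vmemmap_base, VMEMMAP_FOLIO_ALIGN);
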
> Moreover, this approach requires the virtual addresses of struct
> page (possibly spanning sections) to be contiguous, so the method is
> valid *only* under CONFIG_SPARSEMEM_VMEMMAP.
>
> Also, when I skimmed through the overall patch yesterday, one detail
> caught my eye: the shared tail page is *not* "per hstate"; it is
> "per hstate, per zone, per node", because the zone and node
> information is encoded in the tail page’s flags field. We should make
> sure both page_to_nid() and page_zone() work properly.

Right. Or we can slap compound_head() inside them.

I hit the VM_BUG_ON_PAGE() in get_pfnblock_bitmap_bitidx() and worked
around it with compound_head() for now.
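
For page_to_nid() the "compound_head() inside" variant would look
roughly like this (a sketch, not the actual mm.h definition):

	static inline int page_to_nid(const struct page *page)
	{
		/* Shared tails resolve to a head page with valid flags. */
		page = compound_head(page);
		return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
	}

page_zone() could be handled the same way.
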
I am not sure if we want to allocate them per-zone. Seems excessive.
But per-node is reasonable.
--
Kiryl Shutsemau / Kirill A. Shutemov