Message-ID: <5twlonzi3rooao7gyp5g4tyaeevemcx6qhuf4xvdtsi2cykuo4@wrhxmxz63wvn>
Date: Thu, 11 Dec 2025 15:08:13 +0000
From: Kiryl Shutsemau <kas@...nel.org>
To: Muchun Song <muchun.song@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...nel.org>, Oscar Salvador <osalvador@...e.de>,
Mike Rapoport <rppt@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Matthew Wilcox <willy@...radead.org>, Zi Yan <ziy@...dia.com>,
Baoquan He <bhe@...hat.com>, Michal Hocko <mhocko@...e.com>,
Johannes Weiner <hannes@...xchg.org>, Jonathan Corbet <corbet@....net>,
Usama Arif <usamaarif642@...il.com>, kernel-team@...a.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap
optimization
On Wed, Dec 10, 2025 at 11:39:24AM +0800, Muchun Song wrote:
>
>
> > On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@...nel.org> wrote:
> >
> > On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
> >> The prerequisite is that the starting address of vmemmap must be aligned to
> >> 16MB boundaries (for 1GB huge pages). Right? We should add some checks
> >> somewhere to guarantee this (not at compile time but at runtime, because of KASLR).
> >
> > I have a hard time finding the right spot to put the check.
> >
> > I considered something like the patch below, but it is probably too late
> > if huge pages are preallocated at boot.
> >
> > I will dig into this more later, but if you have any suggestions, I
> > would appreciate them.
>
> If you opt to record the mask information, then even when HVO is
> disabled, compound_head() will still compute the head-page address
> by means of the mask. Consequently this constraint must hold for
> *every* compound page.
>
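
For the record, the mask-based lookup amounts to roughly this
(folio_head_mask() is just a placeholder name here, not something
from the series):

	static inline struct page *mask_compound_head(const struct page *page)
	{
		/* Mask recorded for the folio (hypothetical helper). */
		unsigned long mask = folio_head_mask(page);

		/*
		 * Masking the vmemmap virtual address only yields the head
		 * page if the folio's vmemmap is naturally aligned.
		 */
		return (struct page *)((unsigned long)page & mask);
	}
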
> Therefore adding your code in hugetlb_vmemmap.c is not appropriate:
> that file only turns HVO off, yet the calculation remains broken
> for all other large compound pages.
>
> From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
> at most 16 GB of physically contiguous memory. We must therefore
> guarantee that the vmemmap area starts on an address aligned to at
> least 256 MB.
>
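
(Spelling out the arithmetic, assuming 4K base pages and
sizeof(struct page) == 64:

	16 GB / 4 KB  = 4M struct pages for the largest folio
	4M * 64 bytes = 256 MB of vmemmap

so that vmemmap only occupies a naturally aligned 256 MB window if the
vmemmap base itself is aligned to 256 MB.)
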
> When KASLR is disabled the vmemmap base is normally fixed by a
> macro, so the check can be done at compile time; when KASLR is enabled
> we have to ensure that the randomly chosen offset is a multiple
> of 256 MB. These two spots are, in my view, the places that need
> to be changed.
>
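
A rough sketch of what the two checks could look like; the macro name
and placement are placeholders, and the real KASLR code is per-arch:

	/* Required vmemmap alignment for the largest folio. */
	#define VMEMMAP_FOLIO_ALIGN \
		((1UL << MAX_FOLIO_ORDER) * sizeof(struct page))

	/* Fixed layout: check at build time. */
	BUILD_BUG_ON(!IS_ALIGNED(VMEMMAP_START, VMEMMAP_FOLIO_ALIGN));

	/* KASLR: keep the randomized base a multiple of the alignment. */
	vmemmap_base = ALIGN_DOWN(vmemmap_base, VMEMMAP_FOLIO_ALIGN);
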
> Moreover, this approach requires the virtual addresses of struct
> page (possibly spanning sections) to be contiguous, so the method is
> valid *only* under CONFIG_SPARSEMEM_VMEMMAP.
>
> Also, when I skimmed through the overall patch yesterday, one detail
> caught my eye: the shared tail page is *not* "per hstate"; it is
> "per hstate, per zone, per node", because the zone and node
> information is encoded in the tail page’s flags field. We should make
> sure both page_to_nid() and page_zone() work properly.

Right. Or we can slap compound_head() inside them.

I hit the VM_BUG_ON_PAGE() in get_pfnblock_bitmap_bitidx() and worked
around it with compound_head() for now.
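
For page_to_nid() the "compound_head() inside" variant would look
roughly like this (a sketch, not the actual mm.h definition):

	static inline int page_to_nid(const struct page *page)
	{
		/* Shared tails resolve to a head page with valid flags. */
		page = compound_head(page);
		return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
	}

page_zone() could be handled the same way.
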
I am not sure if we want to allocate them per-zone. Seems excessive.
But per-node is reasonable.
--
Kiryl Shutsemau / Kirill A. Shutemov