Message-Id: <BAF36B4D-0047-48C4-9CB8-C8566722A79B@linux.dev>
Date: Thu, 11 Dec 2025 11:45:13 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Kiryl Shutsemau <kas@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...nel.org>,
Oscar Salvador <osalvador@...e.de>,
Mike Rapoport <rppt@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Matthew Wilcox <willy@...radead.org>,
Zi Yan <ziy@...dia.com>,
Baoquan He <bhe@...hat.com>,
Michal Hocko <mhocko@...e.com>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Usama Arif <usamaarif642@...il.com>,
kernel-team@...a.com,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap
optimization
> On Dec 10, 2025, at 11:39, Muchun Song <muchun.song@...ux.dev> wrote:
>
>> On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@...nel.org> wrote:
>>
>> On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
>>> The prerequisite is that the starting address of vmemmap must be aligned
>>> to a 16MB boundary (for 1GB huge pages), right? We should add a check
>>> somewhere to guarantee this (not at compile time but at runtime, e.g.
>>> because of KASLR).
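>>>
>>> For reference (assuming 4KB base pages and a 64-byte struct page):
>>> a 1GB huge page covers 262144 base pages, so its vmemmap spans
>>> 262144 * 64 bytes = 16MB, hence the 16MB alignment.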
>>
>> I have a hard time finding the right spot to put the check.
>>
>> I considered something like the patch below, but it is probably too late
>> if huge pages are preallocated at boot.
>>
>> I will dig more later, but if you have any suggestions, I would
>> appreciate them.
>
> If you opt to record the mask information, then compound_head() will
> compute the head-page address from the mask even when HVO is
> disabled. Consequently this constraint must hold for **every**
> compound page.
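>
> To illustrate (a hypothetical sketch, not the actual code from the
> series; "mask" stands for whatever per-order mask gets recorded):
>
>	/* derive the head page address purely from the mask */
>	head = (struct page *)((unsigned long)tail & mask);
>
> This only yields the correct head page when the head's vmemmap
> address is aligned to the folio's vmemmap span.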
>
> Therefore adding your check to hugetlb_vmemmap.c is not appropriate:
> that file can only turn HVO off, while the calculation would remain
> broken for all other large compound pages.
>
> From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
> at most 16 GB of physically contiguous memory. We must therefore
> guarantee that the vmemmap area starts on an address aligned to at
> least 256 MB.
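>
> Concretely (assuming 4KB base pages and a 64-byte struct page):
> 16GB / 4KB = 4194304 struct pages, and 4194304 * 64 bytes = 256MB of
> vmemmap, hence the 256MB alignment.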
>
> When KASLR is disabled the vmemmap base is normally fixed by a
> macro, so the check can be done at compile time; when KASLR is enabled
> we have to ensure that the randomly chosen offset is a multiple
> of 256 MB. These two spots are, in my view, the places that need
> to be changed.
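>
> For the KASLR side, something along these lines (a sketch only; the
> names here are illustrative, not actual arch code):
>
>	/* keep the randomized vmemmap base 256MB-aligned */
>	offset = get_random_long() % entropy_range;
>	vmemmap_base += ALIGN_DOWN(offset, SZ_256M);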
>
> Moreover, this approach requires the virtual addresses of the struct
> pages (which may span sections) to be contiguous, so the method is
> valid **only** under CONFIG_SPARSEMEM_VMEMMAP.
This is no longer an issue: with nth_page() removed (which I only
just found out), a folio can no longer span multiple sections, even
when !CONFIG_SPARSEMEM_VMEMMAP.
>
> Also, when I skimmed through the overall patch yesterday, one detail
> caught my eye: the shared tail page is **not** "per hstate"; it is
> "per hstate, per zone, per node", because the zone and node
> information is encoded in the tail page’s flags field. We should make
> sure both page_to_nid() and page_zone() work properly.
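>
> For reference, both helpers decode page->flags; page_to_nid() is
> roughly the following (simplified from include/linux/mm.h, with the
> poison check omitted):
>
>	static inline int page_to_nid(const struct page *page)
>	{
>		return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
>	}
>
> A tail page shared across zones or nodes would therefore report the
> wrong nid/zone unless one instance exists per zone and per node.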
>
> Muchun,
> Thanks.
>
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 04a211a146a0..971558184587 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -886,6 +886,14 @@ static int __init hugetlb_vmemmap_init(void)
>>  	BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
>>  
>>  	for_each_hstate(h) {
>> +		unsigned long size = pages_per_huge_page(h) * sizeof(struct page);
>> +
>> +		/* vmemmap must be naturally aligned to the hstate's vmemmap size */
>> +		if (WARN_ON_ONCE(!IS_ALIGNED((unsigned long)vmemmap, size))) {
>> +			vmemmap_optimize_enabled = false;
>> +			continue;
>> +		}
>> +
>>  		if (hugetlb_vmemmap_optimizable(h)) {
>>  			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
>>  			break;
>> --
>> Kiryl Shutsemau / Kirill A. Shutemov