Message-ID: <5de4d58d-649f-4eb3-81dd-8313d17a9725@kernel.org>
Date: Thu, 5 Feb 2026 12:22:15 +0100
From: "David Hildenbrand (arm)" <david@...nel.org>
To: Zi Yan <ziy@...dia.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Rik van Riel <riel@...riel.com>, Usama Arif <usamaarif642@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
hannes@...xchg.org, shakeel.butt@...ux.dev, kas@...nel.org,
baohua@...nel.org, dev.jain@....com, baolin.wang@...ux.alibaba.com,
npache@...hat.com, Liam.Howlett@...cle.com, ryan.roberts@....com,
vbabka@...e.cz, lance.yang@...ux.dev, linux-kernel@...r.kernel.org,
kernel-team@...a.com, Frank van der Linden <fvdl@...gle.com>
Subject: Re: [RFC 00/12] mm: PUD (1GB) THP implementation
On 2/2/26 16:50, Zi Yan wrote:
> On 2 Feb 2026, at 6:30, Lorenzo Stoakes wrote:
>
>> On Sun, Feb 01, 2026 at 09:44:12PM -0500, Rik van Riel wrote:
>>> To address the obvious objection "but how could we
>>> possibly allocate 1GB huge pages while the workload
>>> is running?", I am planning to pick up the CMA balancing
>>> patch series (thank you, Frank) and get that in an
>>> upstream ready shape soon.
>>>
>>> https://lkml.org/2025/9/15/1735
>>
>> That link doesn't work?
>>
>> Did a quick search for CMA balancing on lore, couldn't find anything, could you
>> provide a lore link?
>
> https://lwn.net/Articles/1038263/
>
>>
>>>
>>> That patch set looks like another case where no
>>> amount of internal testing will find every single
>>> corner case, and we'll probably just want to
>>> merge it upstream, deploy it experimentally, and
>>> aggressively deal with anything that might pop up.
>>
>> I'm not really in favour of this kind of approach. There's plenty of things that
>> were considered 'temporary' upstream that became rather permanent :)
>>
>> Maybe we can't cover all corner-cases, but we need to make sure whatever we do
>> send upstream is maintainable, conceptually sensible and doesn't paint us into
>> any corners, etc.
>>
>>>
>>> With CMA balancing, it would be possible to just
>>> have half (or even more) of system memory for
>>> movable allocations only, which would make it possible
>>> to allocate 1GB huge pages dynamically.
>>
>> Could you expand on that?
>
> I also would like to hear David’s opinion on using CMA for 1GB THP.
> He did not like it[1] when I posted my patch back in 2020, but it has
> been more than 5 years. :)
Hehe, not particularly excited about that.
We really have to avoid short-term hacks by any means. We have enough of
that in THP land already.
We talked about challenges in the past, like:
* Controlling who gets to allocate them
* Having a reasonable swap/migration mechanism
* Reliably allocating them without hacks, while being future-proof
* Long-term pinning them when they are actually on ZONE_MOVABLE or CMA
  (the latter could be made to work, but requires thought)
I agree with Lorenzo that this RFC is a bit surprising, because I assume
none of the real challenges were tackled.
That said, it will take me some time to come back to this RFC; other
stuff that has piled up is more urgent and more important.
But I'll note that we really have to clean up the THP mess before we add
more stuff on top of it.
For example, I still wonder whether we can just stop pre-allocating page
tables for THPs and instead let code fail+retry in case we cannot remap
the page. I wanted to look into the details a long time ago but never
got to it.
Avoiding the pre-allocation would make the remapping much easier, and we
should then remap PUD->PMD->PTEs.
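Roughly, the fail+retry idea could look like this (pure pseudocode; the
names are invented and do not match any actual kernel code):

    /* Split a PUD-mapped THP on demand, without pre-allocated page tables. */
    split_pud_mapping(vma, pud):
        pmd_table = alloc_page_table()   /* allocated here, not up front */
        if (!pmd_table)
            return -ENOMEM               /* caller fails and retries later */
        remap the PUD entry to point at pmd_table (PUD -> PMDs)
        /* PMD -> PTEs would follow the same pattern when needed */
        return 0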
Implementing 1 GiB support for shmem might be a reasonable first step,
before we start digging into the anonymous memory land with all these
nasty things involved.
--
Cheers,
David