Message-ID: <ac0393c7-9c0c-4b4d-8b35-5e6369e5431b@redhat.com>
Date: Thu, 9 Oct 2025 20:51:54 +0200
From: David Hildenbrand <david@...hat.com>
To: Gregory Price <gourry@...rry.net>, Michal Hocko <mhocko@...e.com>
Cc: linux-mm@...ck.org, corbet@....net, muchun.song@...ux.dev,
osalvador@...e.de, akpm@...ux-foundation.org, hannes@...xchg.org,
laoar.shao@...il.com, brauner@...nel.org, mclapinski@...gle.com,
joel.granados@...nel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, Mel Gorman <mgorman@...e.de>,
Alexandru Moise <00moses.alexander00@...il.com>,
Mike Kravetz <mike.kravetz@...cle.com>, David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH] Revert "mm, hugetlb: remove hugepages_treat_as_movable sysctl"

On 09.10.25 17:29, Gregory Price wrote:
> On Thu, Oct 09, 2025 at 08:14:22AM +0200, Michal Hocko wrote:
>> On Wed 08-10-25 12:31:22, Gregory Price wrote:
>>>> I'm not quite clear yet on the use case, though. If all the user allocations
>>>> end up fragmenting the memory, there is also not a lot of benefit to be had
>>>> from that zone long term.
>>>>
>>>
>>> The only real use case I've seen is exactly:
>>> - Don't want random GFP_KERNEL to land there
>>> - Might want it to be pinnable
>>>
>>> I think that covers what you've described above.
>>>
>>> But adding an entire zone felt a bit heavy handed. Allowing gigantic in
>>> movable seemed less - immediately - offensive.
>>
>> The question is whether we need a full zone for that or whether we can
>> control those allocation constraints on a per memory block basis to
>> override the default. So it wouldn't be MOVABLE but rather something
>> like a USER zone.
>
>
> Mild ignorance here - but I don't think the buddy allocator currently
> differentiates chunks of memory based on block membership, it just eats
> folios from certain zones/nodes.
>
> I'm scratching my head trying to think of the discrete mechanism to do
> this that doesn't inject significantly more complexity into the buddy
> allocator.
>
> Looking at the recent[1] THP support for ZONE_DEVICE, I wonder if we end
> up with something more along these lines? But this eschews the other
> requirement of wanting the memory to be otherwise general purpose.
>
> [1] https://lore.kernel.org/linux-mm/20251001065707.920170-1-balbirs@nvidia.com/
>
> ZONE_USER does feel like the most natural solution. Literally just
> (ZONE_NORMAL - GFP_KERNEL). This might need a new GFP flag for certain
> use cases like KVM (GFP_USER) to denote certain "This isn't technically
> kernel memory, but it needs to be pinnable". That would slot right
> between ZONE_NORMAL and ZONE_MOVABLE.
>
> Alternatively we could go the opposite way and introduce ZONE_KERNEL
> below ZONE_NORMAL and disallow GFP_KERNEL from ZONE_NORMAL - then have
> strict watermarks on ZONE_KERNEL to ensure the kernel is always able
> to get memory.

I'm afraid any new zone will be highly controversial and take a long
time to get accepted, if ever :)

The real question is: would we really need a system where we mix e.g.,
ZONE_USER with ZONE_MOVABLE?

Or would it be sufficient to selectively enable (explicit opt-in) some
user pages to end up on ZONE_MOVABLE? IOW, let an admin change the
semantics of the zone.

Like, allowing longterm pinning on ZONE_MOVABLE.

Sure, it would degrade memory hotunplug (until the relevant applications
are shut down) and probably some other things.

Further, I am not so sure about the value of having ZONE_MOVABLE
sprinkled with small unmovable allocations (same concern regarding any
such zone that allows for unmovable things). Kind of against the whole
concept.

But I mean, if the admin decides to do that (opt in), then he is to blame.

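To make the opt-in idea a bit more concrete, here is a minimal userspace
sketch of the decision being discussed. It is a toy model, not kernel code:
every name in it (toy_zone, toy_request, toy_policy,
allow_pinned_in_movable, toy_highest_zone) is hypothetical and made up for
illustration, and it assumes only two zones. The point it shows is that,
by default, a long-term pinned page is kept out of the movable zone, and
the admin opt-in is the only thing that changes that.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_request {
	bool movable;       /* page could be migrated (e.g. ordinary user memory) */
	bool longterm_pin;  /* page will be pinned long term, so it cannot migrate */
};

struct toy_policy {
	bool allow_pinned_in_movable;  /* hypothetical admin opt-in knob */
};

/* Return the highest zone this request may be served from. */
static enum toy_zone toy_highest_zone(const struct toy_policy *policy,
				      const struct toy_request *req)
{
	/* Unmovable (kernel-style) allocations never qualify. */
	if (!req->movable)
		return TOY_ZONE_NORMAL;

	/* Default: long-term pins stay out of the movable zone. */
	if (req->longterm_pin && !policy->allow_pinned_in_movable)
		return TOY_ZONE_NORMAL;

	return TOY_ZONE_MOVABLE;
}
```

With the knob off, a pinned user page resolves to TOY_ZONE_NORMAL; with it
on, the same request resolves to TOY_ZONE_MOVABLE, which is exactly the
case that would degrade hotunplug until the pinning application exits.
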
--
Cheers
David / dhildenb