Message-ID: <402170e6-c49f-4d28-a010-eb253fc2f923@redhat.com>
Date: Wed, 8 Oct 2025 10:58:23 +0200
From: David Hildenbrand <david@...hat.com>
To: Gregory Price <gourry@...rry.net>, linux-mm@...ck.org
Cc: corbet@....net, muchun.song@...ux.dev, osalvador@...e.de,
akpm@...ux-foundation.org, hannes@...xchg.org, laoar.shao@...il.com,
brauner@...nel.org, mclapinski@...gle.com, joel.granados@...nel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...e.com>,
Alexandru Moise <00moses.alexander00@...il.com>,
Mike Kravetz <mike.kravetz@...cle.com>, David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH] Revert "mm, hugetlb: remove hugepages_treat_as_movable
sysctl"
On 07.10.25 23:44, Gregory Price wrote:
> This reverts commit d6cb41cc44c63492702281b1d329955ca767d399.
>
> This sysctl provides some flexibility between multiple requirements which
> are difficult to square without adding significantly more complexity.
>
> 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
> 2) onlining memory in ZONE_MOVABLE to prevent GFP_KERNEL usage
> 3) passing NUMA structure through to a virtual machine (node0=vnode0,
> node1=vnode1) so a guest can make good placement decisions.
> 4) utilizing 1GB hugepages for VM host memory to reduce TLB pressure
> 5) Managing device memory after init-time to avoid incidental usage
> at boot (due to being placed in ZONE_NORMAL), or to provide users
> configuration flexibility.
>
> When device-hotplugged memory does not require hot-unplug assurances,
> there is no reason to avoid allowing otherwise non-migratable hugepages
> in this zone. This allows for allocation of 1GB gigantic pages for VMs
> with existing mechanisms.
>
> Boot-time CMA is not possible for driver-managed hotplug memory, as CMA
> requires the memory to be registered as SystemRAM at boot time.
>
> Updated the code to land in appropriate locations since it all moved.
> Updated the documentation to add more context when this is useful.
>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Michal Hocko <mhocko@...e.com>
> Cc: Alexandru Moise <00moses.alexander00@...il.com>
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> Suggested-by: David Rientjes <rientjes@...gle.com>
> Signed-off-by: Gregory Price <gourry@...rry.net>
> Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
> ---
> Documentation/admin-guide/sysctl/vm.rst | 31 +++++++++++++++++++++++++
> include/linux/hugetlb.h | 4 +++-
> mm/hugetlb.c | 9 +++++++
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 4d71211fdad8..c9f26cd447d7 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -40,6 +40,7 @@ Currently, these files are in /proc/sys/vm:
> - enable_soft_offline
> - extfrag_threshold
> - highmem_is_dirtyable
> +- hugepages_treat_as_movable
> - hugetlb_shm_group
> - laptop_mode
> - legacy_va_layout
> @@ -356,6 +357,36 @@ only use the low memory and they can fill it up with dirty data without
> any throttling.
>
>
> +hugepages_treat_as_movable
> +==========================
> +
> +This parameter controls whether otherwise immovable hugepages (e.g. 1GB
> +gigantic pages) may be allocated from ZONE_MOVABLE. If set to non-zero,
> +gigantic hugepages can be allocated from ZONE_MOVABLE. ZONE_MOVABLE memory
> +may be created via the kernel boot parameter `kernelcore` or via memory
> +hotplug as discussed in Documentation/admin-guide/mm/memory-hotplug.rst.
> +
> +Support may depend on specific architecture and/or the hugepage size. If
> +a hugepage supports migration, allocation from ZONE_MOVABLE is always
> +enabled (for example 2MB on x86) for the hugepage regardless of the value
> +of this parameter. IOW, this parameter affects only non-migratable hugepages.
> +
> +Assuming that hugepages are not migratable in your system, one usecase of
> +this parameter is that users can make hugepage pool more extensible by
> +enabling the allocation from ZONE_MOVABLE. This is because on ZONE_MOVABLE
> +page reclaim/migration/compaction work more and you can get contiguous
> +memory more likely. Note that using ZONE_MOVABLE for non-migratable
> +hugepages can do harm to other features like memory hotremove (because
> +memory hotremove expects that memory blocks on ZONE_MOVABLE are always
> +removable,) so it's a trade-off responsible for the users.
> +
> +One common use-case of this feature is allocate 1GB gigantic pages for
> +virtual machines from otherwise not-hotplugged memory which has been
> +isolated from kernel allocations by being onlined into ZONE_MOVABLE.
> +These pages tend to be allocated and released more explicitly, and so
> +hotplug can still be achieved with appropriate orchestration.
> +
> +
> hugetlb_shm_group
> =================
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 526d27e88b3b..bbaa1b4908b6 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -172,6 +172,7 @@ bool hugetlbfs_pagecache_present(struct hstate *h,
>
> struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio);
>
> +extern int hugepages_treat_as_movable;
> extern int sysctl_hugetlb_shm_group;
> extern struct list_head huge_boot_pages[MAX_NUMNODES];
>
> @@ -926,7 +927,8 @@ static inline gfp_t htlb_alloc_mask(struct hstate *h)
> {
> gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
>
> - gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
> + gfp |= (hugepage_movable_supported(h) || hugepages_treat_as_movable) ?
> + GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
I mean, this is as ugly as it gets.
Can't we just let that old approach RIP where it belongs? :)
If something is unmovable, it does not belong in ZONE_MOVABLE, as simple as that.
Something I could sympathize with is treating gigantic pages that are actually
migratable as movable.
Something like:
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 526d27e88b3b2..78da85b1308dd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -896,37 +896,12 @@ static inline bool hugepage_migration_supported(struct hstate *h)
 	return arch_hugetlb_migration_supported(h);
 }
-/*
- * Movability check is different as compared to migration check.
- * It determines whether or not a huge page should be placed on
- * movable zone or not. Movability of any huge page should be
- * required only if huge page size is supported for migration.
- * There won't be any reason for the huge page to be movable if
- * it is not migratable to start with. Also the size of the huge
- * page should be large enough to be placed under a movable zone
- * and still feasible enough to be migratable. Just the presence
- * in movable zone does not make the migration feasible.
- *
- * So even though large huge page sizes like the gigantic ones
- * are migratable they should not be movable because its not
- * feasible to migrate them from movable zone.
- */
-static inline bool hugepage_movable_supported(struct hstate *h)
-{
-	if (!hugepage_migration_supported(h))
-		return false;
-
-	if (hstate_is_gigantic(h))
-		return false;
-	return true;
-}
-
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
 	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
-	gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
+	gfp |= hugepage_migration_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
 	return gfp;
 }
Assume you want to offline part of ZONE_MOVABLE: there might still be sufficient
space elsewhere to allocate a 1 GiB area and actually move the gigantic page.
IIRC, we do the same for memory offlining already.
Now, maybe we want to make that configurable. But then, I would much rather tweak the
hstate_is_gigantic() check in hugepage_movable_supported(). And the parameter
would need a much better name than some "treat as movable".
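To make the "tweak the hstate_is_gigantic() check" idea concrete, here is a
rough userspace model of what such a policy could look like. It is only a
sketch, not kernel code: every model_* name, the simplified struct, and the
knob model_gigantic_movable are made up for illustration and are not
proposed names. The point is that a hugepage size the architecture cannot
migrate stays out of ZONE_MOVABLE no matter how the knob is set.

```c
/* Userspace model only; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's GFP masks; the values carry no meaning. */
enum model_gfp { MODEL_GFP_HIGHUSER, MODEL_GFP_HIGHUSER_MOVABLE };

/* Simplified hstate: only the two properties the policy looks at. */
struct model_hstate {
	bool arch_migratable;	/* models arch_hugetlb_migration_supported() */
	bool gigantic;		/* models hstate_is_gigantic() */
};

/* Hypothetical knob; the name is a placeholder, not a suggestion. */
static bool model_gigantic_movable;

static bool model_migration_supported(const struct model_hstate *h)
{
	return h->arch_migratable;
}

/*
 * Same shape as hugepage_movable_supported(), except that the gigantic
 * check can be relaxed by the knob. Non-migratable sizes are rejected
 * before the knob is ever consulted.
 */
static bool model_movable_supported(const struct model_hstate *h)
{
	if (!model_migration_supported(h))
		return false;
	if (h->gigantic && !model_gigantic_movable)
		return false;
	return true;
}

static enum model_gfp model_alloc_mask(const struct model_hstate *h)
{
	return model_movable_supported(h) ? MODEL_GFP_HIGHUSER_MOVABLE
					  : MODEL_GFP_HIGHUSER;
}

/* Walks through the cases discussed above. */
void model_self_check(void)
{
	struct model_hstate gigantic = { .arch_migratable = true, .gigantic = true };
	struct model_hstate unmovable = { .arch_migratable = false, .gigantic = true };

	model_gigantic_movable = false;
	assert(model_alloc_mask(&gigantic) == MODEL_GFP_HIGHUSER);

	model_gigantic_movable = true;
	assert(model_alloc_mask(&gigantic) == MODEL_GFP_HIGHUSER_MOVABLE);
	assert(model_alloc_mask(&unmovable) == MODEL_GFP_HIGHUSER);
}
```

With the knob off this matches the behaviour of the current code; with it on
it matches the diff above, which is why the check seems like the natural
place for any configurability.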
--
Cheers
David / dhildenb