Message-ID: <2cb55d76-4da7-4ebe-b23b-0abbc4d963f3@kernel.org>
Date: Mon, 9 Feb 2026 13:44:45 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: Mike Rapoport <rppt@...nel.org>
Cc: Tianyou Li <tianyou.li@...el.com>, Oscar Salvador <osalvador@...e.de>,
Wei Yang <richard.weiyang@...il.com>, Michal Hocko <mhocko@...e.com>,
linux-mm@...ck.org, Yong Hu <yong.hu@...el.com>,
Nanhai Zou <nanhai.zou@...el.com>, Yuan Liu <yuan1.liu@...el.com>,
Tim Chen <tim.c.chen@...ux.intel.com>, Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
Yu C Chen <yu.c.chen@...el.com>, Pan Deng <pan.deng@...el.com>,
Chen Zhang <zhangchen.kidd@...com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize
zone->contiguous update when changes pfn range
On 2/9/26 11:52, David Hildenbrand (Arm) wrote:
> On 2/8/26 20:39, Mike Rapoport wrote:
>> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
>>>
>>> Thanks for all your work on this, and sorry for being slower with
>>> reviews the last month.
>>>
>>> While I was in the shower I was thinking about how much I hate
>>> zone->contiguous + the pageblock walking, and how we could just get
>>> rid of it.
>>>
>>> You know, just what you do while having a relaxing shower.
>>>
>>>
>>> And I was wondering:
>>>
>>> (a) in which case would we have zone_spanned_pages == zone_present_pages
>>> and the zone *not* being contiguous? I assume this just cannot happen,
>>> otherwise BUG.
>>>
>>> (b) in which case would we have zone_spanned_pages != zone_present_pages
>>> and the zone *being* contiguous? I assume in some cases where we have
>>> small
>>> holes within a pageblock?
>>>
>>> Reading the doc of __pageblock_pfn_to_page(), there are some weird
>>> scenarios with holes in pageblocks.
>> It seems that "zone->contiguous" is a really bad name for what this thing
>> represents.
>>
>> tl;dr I don't think zone_spanned_pages == zone_present_pages is related
>> to zone->contiguous at all :)
>
> My point in (a) was that with "zone_spanned_pages == zone_present_pages"
> there are no holes so -> contiguous.
>
> (b), and what I said further below, is exactly about memory holes where
> we have a memmap, but it's not present memory.
>
>>
>> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
>> check for zone->contiguous should guarantee that the entire pageblock has
>> a valid memory map and that the entire pageblock fits a zone and does not
>> cross zone/node boundaries.
>
> Right. But that must hold for each and every pageblock in the spanned
> zone range for it to be contiguous.
>
> zone->contiguous tells you "pfn_to_page() is valid on the complete zone
> range".
>
> That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on each
> and every pageblock.
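>
> For reference, the probing loop is essentially the following (paraphrasing
> the current mm/mm_init.c, details trimmed):
>
> 	for (; block_start_pfn < zone_end_pfn(zone);
> 	     block_start_pfn = block_end_pfn,
> 	     block_end_pfn += pageblock_nr_pages) {
> 		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> 		/* A single pageblock failing the check -> not contiguous. */
> 		if (!__pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone))
> 			return;
> 		cond_resched();
> 	}
> 	/* All pageblocks passed -> no holes in the zone span. */
> 	zone->contiguous = true;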
>
>>
>> For coldplug memory the memory map is valid for every section that has
>> present memory, i.e. even if there is a hole in a section, its memory map
>> will be populated and will have struct pages.
>
> There is this sub-section thing, and holes larger than a section might
> not have a memmap (unless reserved I guess).
>
>>
>> When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
>> essentially checks if the first page in a pageblock is online and if the
>> first and last pages are in the zone being compacted.
>> AFAIU, in the hotplug case an entire pageblock is always onlined to the
>> same zone, so zone->contiguous won't change after the hotplug is complete.
>
> I think you are missing a point: hot(un)plug might create holes in the
> zone span. Then, pfn_to_page() is no longer valid to be called on
> arbitrary pageblocks within the zone.
>
>>
>> We might set it to false at the beginning of the hotplug to avoid
>> scanning offline pages, although I'm not sure if it's possible.
>>
>> But at the end of hotplug we can simply restore the old value and move
>> on.
>
> No, you might create holes.
>
>>
>> For the coldplug case I'm also not sure it's worth the hassle; we could
>> just let compaction scan a few more pfns for those rare weird pageblocks
>> and bail out on wrong page conditions.
>
> To recap:
>
> My idea is that "zone_spanned_pages == zone_present_pages" tells you
> that the zone is contiguous because there are no holes.
>
> To handle "non-memory with a struct page", you'd have to check
>
> "zone_spanned_pages == zone_present_pages +
> zone_non_present_memmap_pages"
>
> Or shorter
>
> "zone_spanned_pages == zone_pages_with_memmap"
>
> Then, pfn_to_page() is valid within the complete zone.
>
> The question is how to best calculate the "zone_pages_with_memmap"
> during boot.
>
> During hot(un)plug we only add/remove zone_present_pages. The
> zone_non_present_memmap_pages will not change due to hot(un)plug later.
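>
> To give a (made-up) example: a zone spanning 1 GiB (262144 4k pages) with
> a single 2 MiB hole that still has a memmap would have
> zone_spanned_pages == 262144 and zone_present_pages == 261632, but with
> zone_non_present_memmap_pages == 512 the equation holds and pfn_to_page()
> would still be valid on the complete span.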
>
The following hack does the trick. But:
(a) I wish we could get rid of the pageblock walking in calc_online_pages().
(b) "online_pages" has weird semantics due to the pageblock handling.
Maybe "online_pageblock_pages"? Not sure.
(c) Calculating "online_pages" when we know there is a hole does not make
sense; we could just keep it at 0 if there are holes and simply set
zone->online_pageblock_pages = zone->spanned_pages in case all pageblocks
are online (rough sketch below).
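Roughly what I have in mind for (c) -- completely untested, and
"online_pageblock_pages" is just the made-up name from (b):

static void calc_online_pageblock_pages(struct zone *zone)
{
	unsigned long block_start_pfn = zone->zone_start_pfn;
	unsigned long block_end_pfn = pageblock_end_pfn(block_start_pfn);

	zone->online_pageblock_pages = 0;
	for (; block_start_pfn < zone_end_pfn(zone);
	     block_start_pfn = block_end_pfn,
	     block_end_pfn += pageblock_nr_pages) {
		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
		/* Hit a hole: leave the counter at 0, the zone is not contiguous. */
		if (!__pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone))
			return;
		cond_resched();
	}
	/* No holes: pfn_to_page() is valid on the complete zone span. */
	zone->online_pageblock_pages = zone->spanned_pages;
}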
From d4cb825e91a6363afc68fb994c5d9b29c38c5f42 Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@...nel.org>
Date: Mon, 9 Feb 2026 13:40:24 +0100
Subject: [PATCH] tmp
Signed-off-by: David Hildenbrand (Arm) <david@...nel.org>
---
include/linux/mmzone.h | 25 +++++++++++++++++++++++--
mm/internal.h | 8 +-------
mm/memory_hotplug.c | 20 ++++++--------------
mm/mm_init.c | 12 ++++++------
4 files changed, 36 insertions(+), 29 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fc5d6c88d2f0..3f7d8d88c597 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -943,6 +943,11 @@ struct zone {
* cma pages is present pages that are assigned for CMA use
* (MIGRATE_CMA).
*
+ * online_pages is pages within the zone that have an online memmap.
+ * online_pages includes present pages and memory holes that have a
+ * memmap. When spanned_pages == online_pages, pfn_to_page() can be
+ * performed without further checks on any pfn within the zone span.
+ *
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
@@ -967,6 +972,7 @@ struct zone {
atomic_long_t managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
+ unsigned long online_pages;
#if defined(CONFIG_MEMORY_HOTPLUG)
unsigned long present_early_pages;
#endif
@@ -1051,8 +1057,6 @@ struct zone {
bool compact_blockskip_flush;
#endif
- bool contiguous;
-
CACHELINE_PADDING(_pad3_);
/* Zone statistics */
atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
@@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
}
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Returns: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+ return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages);
+}
+
static inline bool zone_is_initialized(const struct zone *zone)
{
return zone->initialized;
diff --git a/mm/internal.h b/mm/internal.h
index f35dbcf99a86..6062f9b8ee62 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
unsigned long end_pfn, struct zone *zone)
{
- if (zone->contiguous)
+ if (zone_is_contiguous(zone))
return pfn_to_page(start_pfn);
return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}
-void set_zone_contiguous(struct zone *zone);
bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
unsigned long nr_pages);
-static inline void clear_zone_contiguous(struct zone *zone)
-{
- zone->contiguous = false;
-}
-
extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a63ec679d861..76496c1039a9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_smallest_section_pfn(nid, zone, end_pfn,
zone_end_pfn(zone));
if (pfn) {
- zone->spanned_pages = zone_end_pfn(zone) - pfn;
+ WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
zone->zone_start_pfn = pfn;
} else {
zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ WRITE_ONCE(zone->spanned_pages, 0);
}
} else if (zone_end_pfn(zone) == end_pfn) {
/*
@@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
start_pfn);
if (pfn)
- zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+ WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1);
else {
zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ WRITE_ONCE(zone->spanned_pages, 0);
}
}
}
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
/*
* Zone shrinking code cannot properly deal with ZONE_DEVICE. So
- * we will not try to shrink the zones - which is okay as
- * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+ * we will not try to shrink the zones.
*/
if (zone_is_zone_device(zone))
return;
- clear_zone_contiguous(zone);
-
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
-
- set_zone_contiguous(zone);
}
/**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
- clear_zone_contiguous(zone);
-
if (zone_is_empty(zone))
init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype,
isolate_pageblock);
-
- set_zone_contiguous(zone);
}
struct auto_movable_stats {
@@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
if (early_section(__pfn_to_section(page_to_pfn(page))))
zone->present_early_pages += nr_pages;
zone->present_pages += nr_pages;
+ WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages);
zone->zone_pgdat->node_present_pages += nr_pages;
if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2a809cd8e7fa..e33caa6fb6fc 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page)
}
#endif
-void set_zone_contiguous(struct zone *zone)
+static void calc_online_pages(struct zone *zone)
{
unsigned long block_start_pfn = zone->zone_start_pfn;
+ unsigned long online_pages = 0;
unsigned long block_end_pfn;
block_end_pfn = pageblock_end_pfn(block_start_pfn);
@@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone)
if (!__pageblock_pfn_to_page(block_start_pfn,
block_end_pfn, zone))
- return;
+ continue;
cond_resched();
+ online_pages += block_end_pfn - block_start_pfn;
}
-
- /* We confirm that there is no hole */
- zone->contiguous = true;
+ zone->online_pages = online_pages;
}
/*
@@ -2348,7 +2348,7 @@ void __init page_alloc_init_late(void)
shuffle_free_memory(NODE_DATA(nid));
for_each_populated_zone(zone)
- set_zone_contiguous(zone);
+ calc_online_pages(zone);
/* Initialize page ext after all struct pages are initialized. */
if (deferred_struct_pages)
--
2.43.0
--
Cheers,
David