Message-ID: <6ea2dbce-c919-49d6-b2cb-255a565a94e0@kernel.org>
Date: Mon, 9 Feb 2026 11:52:28 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: Mike Rapoport <rppt@...nel.org>
Cc: Tianyou Li <tianyou.li@...el.com>, Oscar Salvador <osalvador@...e.de>,
 Wei Yang <richard.weiyang@...il.com>, Michal Hocko <mhocko@...e.com>,
 linux-mm@...ck.org, Yong Hu <yong.hu@...el.com>,
 Nanhai Zou <nanhai.zou@...el.com>, Yuan Liu <yuan1.liu@...el.com>,
 Tim Chen <tim.c.chen@...ux.intel.com>, Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
 Yu C Chen <yu.c.chen@...el.com>, Pan Deng <pan.deng@...el.com>,
 Chen Zhang <zhangchen.kidd@...com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize
 zone->contiguous update when changes pfn range

On 2/8/26 20:39, Mike Rapoport wrote:
> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
>> On 1/30/26 17:37, Tianyou Li wrote:
>>> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked, it
>>> updates zone->contiguous by checking the new zone's pfn range from the
>>> beginning to the end, regardless of the previous state of the old zone. When
>>> the zone's pfn range is large, the cost of traversing the pfn range to
>>> update zone->contiguous can be significant.
>>>
>>> Add fast paths to quickly detect cases where the zone is definitely not
>>> contiguous without scanning the new zone. The cases are: when the new range
>>> does not overlap with the previous range, contiguous should be false; if the
>>> new range is adjacent to the previous range, only the new range needs to be
>>> checked; if the newly added pages cannot fill the hole in the previous zone,
>>> contiguous should be false.
>>>
>>> The following test cases of memory hotplug for a VM [1], tested in the
>>> environment [2], show that this optimization can significantly reduce the
>>> memory hotplug time [3].
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Plug Memory    | 256G |      10s      |      2s      |       80%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      33s      |      6s      |       81%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      34s      |      6s      |       82%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>>>       object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>>>       device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>>>       qom-set vmem1 requested-size 256G/512G (Plug Memory)
>>>       qom-set vmem1 requested-size 0G (Unplug Memory)
>>>
>>> [2] Hardware     : Intel Icelake server
>>>       Guest Kernel : v6.18-rc2
>>>       Qemu         : v9.0.0
>>>
>>>       Launch VM    :
>>>       qemu-system-x86_64 -accel kvm -cpu host \
>>>       -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>>>       -drive file=./seed.img,format=raw,if=virtio \
>>>       -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>>>       -m 2G,slots=10,maxmem=2052472M \
>>>       -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>>>       -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>>>       -nographic -machine q35 \
>>>       -nic user,hostfwd=tcp::3000-:22
>>>
>>>       Guest kernel auto-onlines newly added memory blocks:
>>>       echo online > /sys/devices/system/memory/auto_online_blocks
>>>
>>> [3] The time from typing the QEMU commands in [1] to when the output of
>>>       'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>>>       memory is recognized.
>>>
>>> Reported-by: Nanhai Zou <nanhai.zou@...el.com>
>>> Reported-by: Chen Zhang <zhangchen.kidd@...com>
>>> Tested-by: Yuan Liu <yuan1.liu@...el.com>
>>> Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
>>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@...el.com>
>>> Reviewed-by: Yu C Chen <yu.c.chen@...el.com>
>>> Reviewed-by: Pan Deng <pan.deng@...el.com>
>>> Reviewed-by: Nanhai Zou <nanhai.zou@...el.com>
>>> Reviewed-by: Yuan Liu <yuan1.liu@...el.com>
>>> Signed-off-by: Tianyou Li <tianyou.li@...el.com>
>>> ---
>>
>> Thanks for all your work on this and sorry for being slower with
>> review the last month.
>>
>> While I was in the shower I was thinking about how much I hate
>> zone->contiguous + the pageblock walking, and how we could just get
>> rid of it.
>>
>> You know, just what you do while having a relaxing shower.
>>
>>
>> And I was wondering:
>>
>> (a) in which case would we have zone_spanned_pages == zone_present_pages
>> and the zone *not* being contiguous? I assume this just cannot happen,
>> otherwise BUG.
>>
>> (b) in which case would we have zone_spanned_pages != zone_present_pages
>> and the zone *being* contiguous? I assume in some cases where we have small
>> holes within a pageblock?
>>
>> Reading the doc of __pageblock_pfn_to_page(), there are some weird
>> scenarios with holes in pageblocks.
>   
> It seems that "zone->contiguous" is a really bad name for what this thing
> represents.
> 
> tl;dr I don't think zone_spanned_pages == zone_present_pages is related to
> zone->contiguous at all :)

My point in (a) was that with "zone_spanned_pages == zone_present_pages" 
there are no holes so -> contiguous.

(b), and what I said further below, is exactly about memory holes where 
we have a memmap, but it's not present memory.

> 
> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
> check for zone->contiguous should guarantee that the entire pageblock has a
> valid memory map and that the entire pageblock fits a zone and does not
> cross zone/node boundaries.

Right. But that must hold for each and every pageblock in the spanned 
zone range for it to be contiguous.

zone->contiguous tells you "pfn_to_page() is valid on the complete zone 
range".

That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on each 
and every pageblock.
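
A minimal sketch of that per-pageblock walk, written as a userspace toy model and not as kernel code. The struct toy_zone, its pageblock_ok array and toy_set_zone_contiguous() are hypothetical names made up for illustration; each flag stands in for "__pageblock_pfn_to_page() would succeed for this pageblock".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a zone span: one flag per pageblock (hypothetical layout). */
struct toy_zone {
	const bool *pageblock_ok;	/* block has a memmap and lies fully in this zone */
	size_t nr_pageblocks;		/* number of pageblocks in the spanned range */
	bool contiguous;		/* result of the walk below */
};

/*
 * Probe every pageblock in the span. A single failing block (a hole, or a
 * block belonging to another zone) is enough to leave the flag cleared.
 */
static void toy_set_zone_contiguous(struct toy_zone *zone)
{
	size_t i;

	zone->contiguous = false;
	for (i = 0; i < zone->nr_pageblocks; i++) {
		if (!zone->pageblock_ok[i])
			return;
	}
	zone->contiguous = true;
}
```

The cost of the walk grows with the size of the span, which is the overhead the quoted patch description is concerned with for large zones.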

> 
> For coldplug memory the memory map is valid for every section that has
> present memory, i.e. even if there is a hole in a section, its memory map
> will be populated and will have struct pages.

There is this sub-section thing, and holes larger than a section might 
not have a memmap (unless reserved I guess).

> 
> When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
> essentially checks if the first page in a pageblock is online and if first
> and last pages are in the zone being compacted.
>   
> AFAIU, in the hotplug case an entire pageblock is always onlined to the
> same zone, so zone->contiguous won't change after the hotplug is complete.

I think you are missing a point: hot(un)plug might create holes in the 
zone span. Then, pfn_to_page() is no longer valid to be called on 
arbitrary pageblocks within the zone.

> 
> We might set it to false in the beginning of the hotplug to avoid scanning
> offline pages, although I'm not sure if it's possible.
> 
> But in the end of hotplug we can simply restore the old value and move on.

No, you might create holes.

> 
> For the coldplug case I'm also not sure it's worth the hassle, we could
> just let compaction scan a few more pfns for those rare weird pageblocks
> and bail out on wrong page conditions.

To recap:

My idea is that "zone_spanned_pages == zone_present_pages" tells you 
that the zone is contiguous because there are no holes.

To handle "non-memory with a struct page", you'd have to check

	"zone_spanned_pages == zone_present_pages +
          zone_non_present_memmap_pages"

Or shorter

	"zone_spanned_pages == zone_pages_with_memmap"

Then, pfn_to_page() is valid within the complete zone.

The question is how to best calculate the "zone_pages_with_memmap" 
during boot.

During hot(un)plug we only add/remove zone_present_pages. The 
zone_non_present_memmap_pages will not change due to hot(un)plug later.
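
The counter-based check could be sketched roughly as follows. This is a userspace toy model under stated assumptions, not a proposed implementation: struct toy_zone_counts and all toy_* helpers are hypothetical, non_present_memmap_pages is a counter that does not exist in the kernel today, and the unplug helper only models removal from the middle of the span (so the span itself does not shrink).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-zone counters, in pages. */
struct toy_zone_counts {
	unsigned long spanned_pages;		/* zone end pfn - zone start pfn */
	unsigned long present_pages;		/* pages backed by present memory */
	unsigned long non_present_memmap_pages;	/* hypothetical: holes that still have a struct page */
};

/* "zone_pages_with_memmap" from the mail: present pages plus holes with a memmap. */
static unsigned long toy_pages_with_memmap(const struct toy_zone_counts *z)
{
	return z->present_pages + z->non_present_memmap_pages;
}

/* If every pfn in the span has a memmap, pfn_to_page() is valid zone-wide. */
static bool toy_zone_is_contiguous(const struct toy_zone_counts *z)
{
	return z->spanned_pages == toy_pages_with_memmap(z);
}

/* Hot-unplug inside the span: only present_pages changes, the span stays. */
static void toy_unplug_inside_span(struct toy_zone_counts *z, unsigned long nr_pages)
{
	assert(nr_pages <= z->present_pages);
	z->present_pages -= nr_pages;
}
```

For example, with spanned_pages = 1024, present_pages = 1000 and non_present_memmap_pages = 24 the check passes; after unplugging 256 pages from the middle of the span it fails without any pageblock walk.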

-- 
Cheers,

David
