Message-ID: <aYjmcZ4hg9bNbmiY@kernel.org>
Date: Sun, 8 Feb 2026 21:39:29 +0200
From: Mike Rapoport <rppt@...nel.org>
To: "David Hildenbrand (Arm)" <david@...nel.org>
Cc: Tianyou Li <tianyou.li@...el.com>, Oscar Salvador <osalvador@...e.de>,
	Wei Yang <richard.weiyang@...il.com>,
	Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org,
	Yong Hu <yong.hu@...el.com>, Nanhai Zou <nanhai.zou@...el.com>,
	Yuan Liu <yuan1.liu@...el.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@...el.com>, Yu C Chen <yu.c.chen@...el.com>,
	Pan Deng <pan.deng@...el.com>, Chen Zhang <zhangchen.kidd@...com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize
 zone->contiguous update when changing pfn range

On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
> On 1/30/26 17:37, Tianyou Li wrote:
> > When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
> > it updates zone->contiguous by checking the new zone's pfn range from
> > beginning to end, regardless of the previous state of the old zone. When
> > the zone's pfn range is large, the cost of traversing the pfn range to
> > update zone->contiguous can be significant.
> > 
> > Add fast paths to quickly detect cases where the zone is definitely not
> > contiguous, without scanning the new zone. The cases are: when the new
> > range does not overlap with the previous range, contiguous should be
> > false; if the new range is adjacent to the previous range, only the new
> > range needs to be checked; if the newly added pages cannot fill the holes
> > of the previous zone, contiguous should be false.
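
A rough sketch of the three fast paths described above might look like the
following. The names zone_contig_fast_check() and range_is_contiguous() are
illustrative placeholders only, not the actual patch code:

/* Placeholder: "does this pfn range have a fully valid, in-zone memmap?" */
static bool range_is_contiguous(unsigned long start_pfn, unsigned long end_pfn);

/*
 * Returns true if a fast path applied and *result holds the new
 * zone->contiguous value; returns false if the caller must fall back to
 * the full pfn-range scan. Ranges are half-open [start, end).
 */
static bool zone_contig_fast_check(bool old_contiguous,
				   unsigned long old_start, unsigned long old_end,
				   unsigned long old_present,
				   unsigned long new_start, unsigned long new_end,
				   bool *result)
{
	unsigned long old_holes = (old_end - old_start) - old_present;

	/* 1) New range neither overlaps nor touches the old span: a gap remains. */
	if (new_end < old_start || new_start > old_end) {
		*result = false;
		return true;
	}

	/* 2) New range directly adjacent: only the new range needs checking. */
	if (new_start == old_end || new_end == old_start) {
		*result = old_contiguous &&
			  range_is_contiguous(new_start, new_end);
		return true;
	}

	/* 3) Too few new pages to possibly fill the holes in the old span. */
	if (new_end - new_start < old_holes) {
		*result = false;
		return true;
	}

	return false;	/* no fast path; do the full scan */
}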
> > 
> > The following test cases of memory hotplug for a VM [1], tested in the
> > environment [2], show that this optimization can significantly reduce the
> > memory hotplug time [3].
> > 
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Plug Memory    | 256G |      10s      |      2s      |       80%      |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G |      33s      |      6s      |       81%      |
> > +----------------+------+---------------+--------------+----------------+
> > 
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G |      34s      |      6s      |       82%      |
> > +----------------+------+---------------+--------------+----------------+
> > 
> > [1] Qemu commands to hotplug 256G/512G memory for a VM:
> >      object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> >      device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> >      qom-set vmem1 requested-size 256G/512G (Plug Memory)
> >      qom-set vmem1 requested-size 0G (Unplug Memory)
> > 
> > [2] Hardware     : Intel Icelake server
> >      Guest Kernel : v6.18-rc2
> >      Qemu         : v9.0.0
> > 
> >      Launch VM    :
> >      qemu-system-x86_64 -accel kvm -cpu host \
> >      -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> >      -drive file=./seed.img,format=raw,if=virtio \
> >      -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> >      -m 2G,slots=10,maxmem=2052472M \
> >      -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> >      -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> >      -nographic -machine q35 \
> >      -nic user,hostfwd=tcp::3000-:22
> > 
> >      Guest kernel auto-onlines newly added memory blocks:
> >      echo online > /sys/devices/system/memory/auto_online_blocks
> > 
> > [3] The time from typing the QEMU commands in [1] to when the output of
> >      'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
> >      memory is recognized.
> > 
> > Reported-by: Nanhai Zou <nanhai.zou@...el.com>
> > Reported-by: Chen Zhang <zhangchen.kidd@...com>
> > Tested-by: Yuan Liu <yuan1.liu@...el.com>
> > Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@...el.com>
> > Reviewed-by: Yu C Chen <yu.c.chen@...el.com>
> > Reviewed-by: Pan Deng <pan.deng@...el.com>
> > Reviewed-by: Nanhai Zou <nanhai.zou@...el.com>
> > Reviewed-by: Yuan Liu <yuan1.liu@...el.com>
> > Signed-off-by: Tianyou Li <tianyou.li@...el.com>
> > ---
> 
> Thanks for all your work on this, and sorry for being slower with
> review over the last month.
> 
> While I was in the shower I was thinking about how much I hate
> zone->contiguous + the pageblock walking, and how we could just get
> rid of it.
> 
> You know, just what you do while having a relaxing shower.
> 
> 
> And I was wondering:
> 
> (a) In which case would we have zone_spanned_pages == zone_present_pages
> and the zone *not* be contiguous? I assume this just cannot happen,
> otherwise BUG.
> 
> (b) In which case would we have zone_spanned_pages != zone_present_pages
> and the zone still *be* contiguous? I assume in some cases where we have
> small holes within a pageblock?
>
> Reading the doc of __pageblock_pfn_to_page(), there are some weird
> scenarios with holes in pageblocks.
 
It seems that "zone->contiguous" is really a bad name for what this thing
represents.

tl;dr I don't think zone_spanned_pages == zone_present_pages is related to
zone->contiguous at all :)

If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
check for zone->contiguous should guarantee that the entire pageblock has a
valid memory map and that the entire pageblock fits within a zone and does
not cross zone/node boundaries.
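
To illustrate, the fast path in the wrapper is essentially the following
(paraphrased from the upstream helper; details may differ between kernel
versions):

static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
						 unsigned long end_pfn,
						 struct zone *zone)
{
	/*
	 * Contiguous zone: every pageblock in the span is known to have a
	 * valid memory map and to sit entirely within this zone.
	 */
	if (zone->contiguous)
		return pfn_to_page(start_pfn);

	/* Otherwise the pageblock has to be verified the slow way. */
	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}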

For coldplug memory the memory map is valid for every section that has
present memory, i.e. even if there is a hole in a section, its memory map
will be populated and will have struct pages.

When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
essentially checks whether the first page in a pageblock is online and
whether the first and last pages are in the zone being compacted.
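
Paraphrasing that slow path (again, not verbatim upstream code):

struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
				     unsigned long end_pfn, struct zone *zone)
{
	struct page *start_page, *end_page;

	/* end_pfn is one past the last pfn of the pageblock */
	end_pfn--;

	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
		return NULL;

	/* The first page must be online ... */
	start_page = pfn_to_online_page(start_pfn);
	if (!start_page)
		return NULL;

	/* ... and the first and last pages must be in the zone being compacted. */
	if (page_zone(start_page) != zone)
		return NULL;

	end_page = pfn_to_page(end_pfn);
	if (page_zone_id(start_page) != page_zone_id(end_page))
		return NULL;

	return start_page;
}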
 
AFAIU, in the hotplug case an entire pageblock is always onlined to the
same zone, so zone->contiguous won't change after the hotplug is complete.

We might set it to false at the beginning of the hotplug to avoid scanning
offline pages, although I'm not sure if that's possible.

But at the end of the hotplug we can simply restore the old value and move
on.
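
Something along these lines, reusing the existing clear_zone_contiguous()
bracket in move_pfn_range_to_zone() and replacing the set_zone_contiguous()
full scan with a restore of the saved value (sketch only, not tested):

void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
				  unsigned long nr_pages,
				  struct vmem_altmap *altmap, int migratetype)
{
	bool was_contiguous = zone->contiguous;

	/* Keep pfn walkers off the not-yet-onlined pages. */
	clear_zone_contiguous(zone);

	/* ... resize zone/node spans and initialize the memory map ... */

	/*
	 * Whole pageblocks always end up in this zone, so the answer from
	 * before the hotplug is still valid; skip the full rescan.
	 */
	zone->contiguous = was_contiguous;
}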

For the coldplug case I'm also not sure it's worth the hassle; we could
just let compaction scan a few more pfns for those rare weird pageblocks
and bail out on the wrong-page conditions.

> I.e., on my notebook I have
> 
> $ cat /proc/zoneinfo  | grep -E "Node|spanned|present"
> Node 0, zone      DMA
>         spanned  4095
>         present  3999
> Node 0, zone    DMA32
>         spanned  1044480
>         present  439600

I suspect this one is contiguous ;-)

> Node 0, zone   Normal
>         spanned  7798784
>         present  7798784
> Node 0, zone  Movable
>         spanned  0
>         present  0
> Node 0, zone   Device
>         spanned  0
>         present  0
> 
> 
> For the most important zone regarding compaction, ZONE_NORMAL, it would be
> good enough.
> 
> We certainly don't care about detecting contiguous for the DMA zone. For
> DMA32, I would suspect that it is not detected as contiguous either way,
> because the holes are just way too large?
> 
> -- 
> Cheers,
> 
> David

-- 
Sincerely yours,
Mike.
