Message-ID: <0d9da08d-4293-4dbd-bf59-999488d73763@intel.com>
Date: Tue, 2 Dec 2025 07:40:11 +0800
From: "Li, Tianyou" <tianyou.li@...el.com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>, Oscar Salvador
	<osalvador@...e.de>, Mike Rapoport <rppt@...nel.org>, Wei Yang
	<richard.weiyang@...il.com>
CC: <linux-mm@...ck.org>, Yong Hu <yong.hu@...el.com>, Nanhai Zou
	<nanhai.zou@...el.com>, Yuan Liu <yuan1.liu@...el.com>, Tim Chen
	<tim.c.chen@...ux.intel.com>, Qiuxu Zhuo <qiuxu.zhuo@...el.com>, Yu C Chen
	<yu.c.chen@...el.com>, Pan Deng <pan.deng@...el.com>, Chen Zhang
	<zhangchen.kidd@...com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] mm/memory hotplug/unplug: Optimize zone->contiguous
 update when changing the pfn range

Thanks, David, for the detailed review comments. I will update the code
accordingly in a day or two, in case any other review changes need to be
accommodated. Much appreciated.


On 12/2/2025 2:54 AM, David Hildenbrand (Red Hat) wrote:
> On 12/1/25 14:22, Tianyou Li wrote:
>> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
>> it updates zone->contiguous by checking the new zone's pfn range from
>> beginning to end, regardless of the previous state of the old zone. When
>> the zone's pfn range is large, the cost of traversing the pfn range to
>> update zone->contiguous can be significant.
>>
>> Add fast paths that can decide zone->contiguous without scanning the new
>> zone: if the new range does not overlap the previous range, contiguous
>> must be false; if the new range is adjacent to the previous range, only
>> the new range needs to be checked; and if the newly added pages cannot
>> fill the holes of the previous zone, contiguous must be false.
>>
>> The following memory hotplug test cases for a VM [1], run in the
>> environment [2], show that this optimization significantly reduces the
>> memory hotplug time [3].
>>
>> +---------------+------+---------------+--------------+----------------+
>> |               | Size | Time (before) | Time (after) | Time Reduction |
>> |               +------+---------------+--------------+----------------+
>> | Plug Memory   | 256G |      10s      |      2s      |      80%       |
>> |               +------+---------------+--------------+----------------+
>> |               | 512G |      33s      |      6s      |      81%       |
>> +---------------+------+---------------+--------------+----------------+
>>
>> +---------------+------+---------------+--------------+----------------+
>> |               | Size | Time (before) | Time (after) | Time Reduction |
>> |               +------+---------------+--------------+----------------+
>> | Unplug Memory | 256G |      10s      |      2s      |      80%       |
>> |               +------+---------------+--------------+----------------+
>> |               | 512G |      34s      |      6s      |      82%       |
>> +---------------+------+---------------+--------------+----------------+
>>
>> [1] QEMU commands to hotplug 256G/512G of memory into a VM:
>>      object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>>      device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>>      qom-set vmem1 requested-size 256G/512G (Plug Memory)
>>      qom-set vmem1 requested-size 0G (Unplug Memory)
>>
>> [2] Hardware     : Intel Ice Lake server
>>      Guest Kernel : v6.18-rc2
>>      QEMU         : v9.0.0
>>
>>      Launch VM    :
>>      qemu-system-x86_64 -accel kvm -cpu host \
>>      -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>>      -drive file=./seed.img,format=raw,if=virtio \
>>      -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>>      -m 2G,slots=10,maxmem=2052472M \
>>      -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>>      -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>>      -nographic -machine q35 \
>>      -nic user,hostfwd=tcp::3000-:22
>>
>>      Guest kernel auto-onlines newly added memory blocks:
>>      echo online > /sys/devices/system/memory/auto_online_blocks
>>
>> [3] The time measured from typing the QEMU commands in [1] until the
>>      output of 'grep MemTotal /proc/meminfo' in the guest shows that all
>>      hotplugged memory is recognized.
>>
>> Reported-by: Nanhai Zou <nanhai.zou@...el.com>
>> Reported-by: Chen Zhang <zhangchen.kidd@...com>
>> Tested-by: Yuan Liu <yuan1.liu@...el.com>
>> Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@...el.com>
>> Reviewed-by: Yu C Chen <yu.c.chen@...el.com>
>> Reviewed-by: Pan Deng <pan.deng@...el.com>
>> Reviewed-by: Nanhai Zou <nanhai.zou@...el.com>
>> Reviewed-by: Yuan Liu <yuan1.liu@...el.com>
>> Signed-off-by: Tianyou Li <tianyou.li@...el.com>
>> ---
>>   mm/internal.h       |  8 ++++-
>>   mm/memory_hotplug.c | 79 ++++++++++++++++++++++++++++++++++++++++++---
>>   mm/mm_init.c        | 36 +++++++++++++--------
>>   3 files changed, 103 insertions(+), 20 deletions(-)
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 1561fc2ff5b8..a94928520a55 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>>       return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>>   }
>>
>> -void set_zone_contiguous(struct zone *zone);
>> +enum zone_contiguous_state {
>> +    CONTIGUOUS_DEFINITELY_NOT = 0,
>> +    CONTIGUOUS_DEFINITELY = 1,
>> +    CONTIGUOUS_UNDETERMINED = 2,
>
> No need for the values.
>

Got it.


>> +};
>
> I don't like that the defines don't match the enum name (zone_c... vs. 
> CONT... ).
>
> Essentially you want a "yes / no / maybe" tristate. I don't think we 
> have an existing type for that, unfortunately.
>
> enum zone_contig_state {
>     ZONE_CONTIG_YES,
>     ZONE_CONTIG_NO,
>     ZONE_CONTIG_MAYBE,
> };
>
> Maybe someone reading along has a better idea.
>

I agree it's better. Will wait for a day or two to make the change.
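
To confirm, I would then use exactly what you wrote (with the explicit
values dropped, as you noted earlier):

enum zone_contig_state {
    ZONE_CONTIG_YES,
    ZONE_CONTIG_NO,
    ZONE_CONTIG_MAYBE,
};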


>> +
>> +void set_zone_contiguous(struct zone *zone, enum zone_contiguous_state state);
>>   bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
>>                  unsigned long nr_pages);
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0be83039c3b5..b74e558ce822 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -544,6 +544,32 @@ static void update_pgdat_span(struct pglist_data *pgdat)
>>       pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
>>   }
>>
>> +static enum zone_contiguous_state __meminit clear_zone_contiguous_for_shrinking(
>> +        struct zone *zone, unsigned long start_pfn, unsigned long nr_pages)
>> +{
>> +    const unsigned long end_pfn = start_pfn + nr_pages;
>> +    enum zone_contiguous_state result = CONTIGUOUS_UNDETERMINED;
>> +
>> +    /*
>> +     * If the removed pfn range is strictly inside the original zone
>> +     * span, the contiguous property is surely false.
>> +     */
>> +    if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone))
>> +        result = CONTIGUOUS_DEFINITELY_NOT;
>> +
>> +    /*
>> +     * If the removed pfn range is at the beginning or end of the
>> +     * original zone span, the contiguous property is preserved when
>> +     * the original zone is contiguous.
>> +     */
>> +    else if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone))
>> +        result = zone->contiguous ?
>> +            CONTIGUOUS_DEFINITELY : CONTIGUOUS_UNDETERMINED;
>> +
>
> See my comment below on how to make this readable.
>
>> +    clear_zone_contiguous(zone);
>> +    return result;
>> +}
>> +
>>   void remove_pfn_range_from_zone(struct zone *zone,
>>                         unsigned long start_pfn,
>>                         unsigned long nr_pages)
>> @@ -551,6 +577,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
>>       const unsigned long end_pfn = start_pfn + nr_pages;
>>       struct pglist_data *pgdat = zone->zone_pgdat;
>>       unsigned long pfn, cur_nr_pages;
>> +    enum zone_contiguous_state contiguous_state = CONTIGUOUS_UNDETERMINED;
>>
>>       /* Poison struct pages because they are now uninitialized again. */
>>       for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
>> @@ -571,12 +598,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
>>       if (zone_is_zone_device(zone))
>>           return;
>>
>> -    clear_zone_contiguous(zone);
>> +    contiguous_state = clear_zone_contiguous_for_shrinking(
>> +                zone, start_pfn, nr_pages);
>
> Reading this again, I wonder whether it would be nicer to have 
> something like:
>
> new_contig_state = zone_contig_state_after_shrinking();
> clear_zone_contiguous(zone);
>
> or sth like that. Similar for the growing case.
>

In both the shrinking and growing cases, you mean separating
clear_zone_contiguous() from the zone-state-check logic, right?
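
For example (a rough sketch on my side; the helper name follows your
suggestion, and the enum values assume the rename discussed above):

static enum zone_contig_state zone_contig_state_after_shrinking(
        struct zone *zone, unsigned long start_pfn, unsigned long nr_pages)
{
    const unsigned long end_pfn = start_pfn + nr_pages;

    /* A range removed strictly inside the span leaves a hole behind. */
    if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone))
        return ZONE_CONTIG_NO;

    /*
     * A range removed at either edge of the span keeps the zone
     * contiguous if it was contiguous before.
     */
    if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone))
        return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE;

    return ZONE_CONTIG_MAYBE;
}

with the call site in remove_pfn_range_from_zone() becoming:

    contig_state = zone_contig_state_after_shrinking(zone, start_pfn, nr_pages);
    clear_zone_contiguous(zone);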


>>       shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>>       update_pgdat_span(pgdat);
>>
>> -    set_zone_contiguous(zone);
>> +    set_zone_contiguous(zone, contiguous_state);
>>   }
>>
>>   /**
>> @@ -736,6 +764,47 @@ static inline void section_taint_zone_device(unsigned long pfn)
>>   }
>>   #endif
>>
>> +static enum zone_contiguous_state __meminit clear_zone_contiguous_for_growing(
>> +        struct zone *zone, unsigned long start_pfn, unsigned long nr_pages)
>> +{
>> +    const unsigned long end_pfn = start_pfn + nr_pages;
>> +    enum zone_contiguous_state result = CONTIGUOUS_UNDETERMINED;
>> +
>> +    /*
>> +     * Since the moved pfn range is always contiguous by itself, the
>> +     * zone's contiguous property is true when the zone was empty.
>> +     */
>
> I don't think that comment is required.
>

Got it.


>> +    if (zone_is_empty(zone))
>> +        result = CONTIGUOUS_DEFINITELY;
>> +
>> +    /*
>> +     * If the moved pfn range does not intersect the original zone span,
>> +     * the contiguous property is surely false.
>> +     */
>> +    else if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone))
>> +        result = CONTIGUOUS_DEFINITELY_NOT;
>> +
>> +    /*
>> +     * If the moved pfn range is adjacent to the original zone span, the
>> +     * zone's contiguous property is inherited from its original value,
>> +     * since the moved pfn range is always contiguous by itself.
>> +     */
>> +    else if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone))
>> +        result = zone->contiguous ?
>> +            CONTIGUOUS_DEFINITELY : CONTIGUOUS_DEFINITELY_NOT;
>> +
>> +    /*
>> +     * If the holes in the original zone are larger than the moved
>> +     * range, the contiguous property is surely false.
>> +     */
>> +    else if (nr_pages < (zone->spanned_pages - zone->present_pages))
>> +        result = CONTIGUOUS_DEFINITELY_NOT;
>> +
>
> This is a bit unreadable :)
>
> if (zone_is_empty(zone)) {
>     result = CONTIGUOUS_DEFINITELY;
> } else if (...) {
>     /* ... */
>     ...
> } else if (...) {
>     ...
> }
>

Yes, I considered that while coding, but kept it this way because it
looked 'nicer'. I agree that is not good for maintainability. Thanks.
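
With braces, the growing-side checks would then read roughly like this
(a sketch, assuming the ZONE_CONTIG_* names from the rename above):

if (zone_is_empty(zone)) {
    /* The moved range is contiguous by itself. */
    result = ZONE_CONTIG_YES;
} else if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone)) {
    /* No overlap and no adjacency: a hole remains between the ranges. */
    result = ZONE_CONTIG_NO;
} else if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone)) {
    /* Adjacent range: contiguity is inherited from the original zone. */
    result = zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO;
} else if (nr_pages < zone->spanned_pages - zone->present_pages) {
    /* Too few new pages to fill the existing holes. */
    result = ZONE_CONTIG_NO;
}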


>> +    clear_zone_contiguous(zone);
>> +    return result;
>> +}
>> +
>>   /*
>>    * Associate the pfn range with the given zone, initializing the memmaps
>>    * and resizing the pgdat/zone data to span the added pages. After this
>> @@ -752,8 +821,8 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>>   {
>>       struct pglist_data *pgdat = zone->zone_pgdat;
>>       int nid = pgdat->node_id;
>> -
>> -    clear_zone_contiguous(zone);
>> +    const enum zone_contiguous_state contiguous_state =
>> +        clear_zone_contiguous_for_growing(zone, start_pfn, nr_pages);
>>
>>       if (zone_is_empty(zone))
>>           init_currently_empty_zone(zone, start_pfn, nr_pages);
>> @@ -783,7 +852,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>>                MEMINIT_HOTPLUG, altmap, migratetype,
>>                isolate_pageblock);
>>
>> -    set_zone_contiguous(zone);
>> +    set_zone_contiguous(zone, contiguous_state);
>>   }
>>
>>   struct auto_movable_stats {
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index 7712d887b696..06db3fcf7f95 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -2263,26 +2263,34 @@ void __init init_cma_pageblock(struct page *page)
>>   }
>>   #endif
>>
>> -void set_zone_contiguous(struct zone *zone)
>> +void set_zone_contiguous(struct zone *zone, enum zone_contiguous_state state)
>>   {
>>       unsigned long block_start_pfn = zone->zone_start_pfn;
>>       unsigned long block_end_pfn;
>>
>> -    block_end_pfn = pageblock_end_pfn(block_start_pfn);
>> -    for (; block_start_pfn < zone_end_pfn(zone);
>> -            block_start_pfn = block_end_pfn,
>> -             block_end_pfn += pageblock_nr_pages) {
>> +    if (state == CONTIGUOUS_DEFINITELY) {
>> +        zone->contiguous = true;
>> +        return;
>> +    } else if (state == CONTIGUOUS_DEFINITELY_NOT) {
>> +        /* zone->contiguous was already cleared to false; just return. */
>> +        return;
>> +    } else if (state == CONTIGUOUS_UNDETERMINED) {
>> +        block_end_pfn = pageblock_end_pfn(block_start_pfn);
>> +        for (; block_start_pfn < zone_end_pfn(zone);
>> +                block_start_pfn = block_end_pfn,
>> +                block_end_pfn += pageblock_nr_pages) {
>>
>> -        block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>> +            block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>>
>> -        if (!__pageblock_pfn_to_page(block_start_pfn,
>> -                         block_end_pfn, zone))
>> -            return;
>> -        cond_resched();
>> -    }
>> +            if (!__pageblock_pfn_to_page(block_start_pfn,
>> +                        block_end_pfn, zone))
>> +                return;
>> +            cond_resched();
>> +        }
>>
>> -    /* We confirm that there is no hole */
>> -    zone->contiguous = true;
>> +        /* We confirm that there is no hole */
>> +        zone->contiguous = true;
>> +    }
>>   }
>
>
> switch (state) {
> case CONTIGUOUS_DEFINITELY:
>     zone->contiguous = true;
>     return;
> case CONTIGUOUS_DEFINITELY_NOT:
>     return;
> default:
>     break;
> }
>
> ... unchanged logic.
>

Will do. Thanks.
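
So set_zone_contiguous() would keep the scanning loop unchanged and start
with something like:

switch (state) {
case ZONE_CONTIG_YES:
    zone->contiguous = true;
    return;
case ZONE_CONTIG_NO:
    /* The caller has already cleared zone->contiguous. */
    return;
default:
    break;
}

/* ... existing pageblock scan, unchanged ... */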


>>   /*
>> @@ -2348,7 +2356,7 @@ void __init page_alloc_init_late(void)
>>           shuffle_free_memory(NODE_DATA(nid));
>>         for_each_populated_zone(zone)
>> -        set_zone_contiguous(zone);
>> +        set_zone_contiguous(zone, CONTIGUOUS_UNDETERMINED);
>>
>>       /* Initialize page ext after all struct pages are initialized. */
>>       if (deferred_struct_pages)
>
>

Regards,

Tianyou

