Message-ID: <add15859-31e2-1688-3d8c-26e2579e9a57@intel.com>
Date: Fri, 21 May 2021 15:13:35 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
Linux-MM <linux-mm@...ck.org>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
Nicholas Piggin <npiggin@...il.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug
events
On 5/21/21 3:28 AM, Mel Gorman wrote:
> The PCP high watermark is based on the number of online CPUs so the
> watermarks must be adjusted during CPU hotplug. At the time of
> hot-remove, the number of online CPUs is already adjusted but during
> hot-add, a delta needs to be applied to update PCP to the correct
> value. After this patch is applied, the high watermarks are adjusted
> correctly.
>
> # grep high: /proc/zoneinfo | tail -1
> high: 649
> # echo 0 > /sys/devices/system/cpu/cpu4/online
> # grep high: /proc/zoneinfo | tail -1
> high: 664
> # echo 1 > /sys/devices/system/cpu/cpu4/online
> # grep high: /proc/zoneinfo | tail -1
> high: 649
This is actually a comment more about the previous patch, but it doesn't
really become apparent until the example above.
In your example, you mentioned increased exit() performance by using
"vm.percpu_pagelist_fraction to increase the pcp->high value". That's
presumably because of the increased batching effects and fewer lock
acquisitions.
But, logically, doesn't that mean that the more CPUs you have in a
node, the *higher* you want pcp->high to be? If we took this to the
extreme and had an absurd number of CPUs in a node, we could end up with
a too-small pcp->high value.
Also, do you worry at all about a zone with a low min_free_kbytes seeing
increased zone lock contention?
...
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf5cdc466e6c..2761b03b3a44 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6628,7 +6628,7 @@ static int zone_batchsize(struct zone *zone)
> #endif
> }
>
> -static int zone_highsize(struct zone *zone)
> +static int zone_highsize(struct zone *zone, int cpu_online)
> {
> #ifdef CONFIG_MMU
> int high;
> @@ -6640,7 +6640,7 @@ static int zone_highsize(struct zone *zone)
> * CPUs local to a zone. Note that early in boot that CPUs may
> * not be online yet.
> */
> - nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> + nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
> high = low_wmark_pages(zone) / nr_local_cpus;
Is this "+ cpu_online" bias because the CPU isn't in cpumask_of_node()
when the CPU hotplug callback occurs? If so, it might be nice to mention.