Message-ID: <add15859-31e2-1688-3d8c-26e2579e9a57@intel.com>
Date: Fri, 21 May 2021 15:13:35 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
Linux-MM <linux-mm@...ck.org>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
Nicholas Piggin <npiggin@...il.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug
events
On 5/21/21 3:28 AM, Mel Gorman wrote:
> The PCP high watermark is based on the number of online CPUs so the
> watermarks must be adjusted during CPU hotplug. At the time of
> hot-remove, the number of online CPUs is already adjusted but during
> hot-add, a delta needs to be applied to update PCP to the correct
> value. After this patch is applied, the high watermarks are adjusted
> correctly.
>
> # grep high: /proc/zoneinfo | tail -1
> high: 649
> # echo 0 > /sys/devices/system/cpu/cpu4/online
> # grep high: /proc/zoneinfo | tail -1
> high: 664
> # echo 1 > /sys/devices/system/cpu/cpu4/online
> # grep high: /proc/zoneinfo | tail -1
> high: 649
This is actually a comment more about the previous patch, but it doesn't
really become apparent until the example above.
In your example, you mentioned increased exit() performance by using
"vm.percpu_pagelist_fraction to increase the pcp->high value". That's
presumably because of the increased batching effects and fewer lock
acquisitions.
But, logically, doesn't that mean that the more CPUs you have in a
node, the *higher* you want pcp->high to be? If we took this to the
extreme and had an absurd number of CPUs in a node, we could end up with
a too-small pcp->high value.
Also, do you worry at all about a zone with a low min_free_kbytes seeing
increased zone lock contention?
...
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf5cdc466e6c..2761b03b3a44 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6628,7 +6628,7 @@ static int zone_batchsize(struct zone *zone)
> #endif
> }
>
> -static int zone_highsize(struct zone *zone)
> +static int zone_highsize(struct zone *zone, int cpu_online)
> {
> #ifdef CONFIG_MMU
> int high;
> @@ -6640,7 +6640,7 @@ static int zone_highsize(struct zone *zone)
> * CPUs local to a zone. Note that early in boot that CPUs may
> * not be online yet.
> */
> - nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> + nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
> high = low_wmark_pages(zone) / nr_local_cpus;
Is this "+ cpu_online" bias because the CPU isn't in cpumask_of_node()
when the CPU hotplug callback occurs? If so, it might be nice to mention.