Message-ID: <b1dc2aa1-cd38-4f1f-89e9-6d009a619541@arm.com>
Date: Fri, 20 Sep 2024 12:28:44 +0530
From: Anshuman Khandual <anshuman.khandual@....com>
To: Saurabh Sengar <ssengar@...ux.microsoft.com>, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc: ssengar@...rosoft.com, wei.liu@...nel.org, srivatsa@...il.mit.edu
Subject: Re: [PATCH v2] mm/vmstat: Defer the refresh_zone_stat_thresholds
after all CPUs bringup
On 8/12/24 11:43, Saurabh Sengar wrote:
> The refresh_zone_stat_thresholds() function has two loops, which are
> expensive for a higher number of CPUs and NUMA nodes.
>
> Below is a rough estimate of the total iterations done by these loops,
> based on the number of NUMA nodes and CPUs.
>
> Total number of iterations: nCPU * 2 * Numa * mCPU
> Where:
> nCPU = total number of CPUs
> Numa = total number of NUMA nodes
> mCPU = mean number of online CPUs seen per call while the CPUs come up
> (e.g., 512 for 1024 total CPUs)
>
> For the system under test with 16 NUMA nodes and 1024 CPUs, this
> results in a substantial increase in the number of loop iterations
> during boot-up when NUMA is enabled:
>
> No NUMA = 1024*2*1*512 = 1,048,576 : Here refresh_zone_stat_thresholds
> takes around 224 ms total for all the CPUs in the system under test.
> 16 NUMA = 1024*2*16*512 = 16,777,216 : Here refresh_zone_stat_thresholds
> takes around 4.5 seconds total for all the CPUs in the system under test.
>
> Calling this for each CPU is expensive when there is a large number
> of CPUs along with multiple NUMA nodes. Fix this by deferring
> refresh_zone_stat_thresholds() so that it is called once, after all
> the secondary CPUs are up. Also, register the DYN hooks to keep the
> existing hotplug functionality intact.
>
> Signed-off-by: Saurabh Sengar <ssengar@...ux.microsoft.com>
> ---
> [V2]
> - Move vmstat_late_init_done under CONFIG_SMP to fix the
> 'defined but not used' variable warning.
>
> mm/vmstat.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e2dc067a654..fa235c65c756 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1908,6 +1908,7 @@ static const struct seq_operations vmstat_op = {
> #ifdef CONFIG_SMP
> static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
> int sysctl_stat_interval __read_mostly = HZ;
> +static int vmstat_late_init_done;
>
> #ifdef CONFIG_PROC_FS
> static void refresh_vm_stats(struct work_struct *work)
> @@ -2110,7 +2111,8 @@ static void __init init_cpu_node_state(void)
>
> static int vmstat_cpu_online(unsigned int cpu)
> {
> - refresh_zone_stat_thresholds();
> + if (vmstat_late_init_done)
> + refresh_zone_stat_thresholds();
>
> if (!node_state(cpu_to_node(cpu), N_CPU)) {
> node_set_state(cpu_to_node(cpu), N_CPU);
> @@ -2142,6 +2144,14 @@ static int vmstat_cpu_dead(unsigned int cpu)
> return 0;
> }
>
> +static int __init vmstat_late_init(void)
> +{
> + refresh_zone_stat_thresholds();
> + vmstat_late_init_done = 1;
> +
> + return 0;
> +}
> +late_initcall(vmstat_late_init);
> #endif
>
> struct workqueue_struct *mm_percpu_wq;

Is the late_initcall()-triggered vmstat_late_init() guaranteed to be called
before the last call into vmstat_cpu_online() during a normal boot?
Otherwise refresh_zone_stat_thresholds() will never be called unless
there is a CPU online event later.