Message-ID: <5s55bgrmpomlpefmvt4bz7t2myvjnbw6lnvtsnbkdphwfb7zdo@tnm7flx5jidu>
Date: Wed, 21 Aug 2024 11:04:14 +0100
From: Mel Gorman <mgorman@...hsingularity.net>
To: Yujie Liu <yujie.liu@...el.com>
Cc: Raghavendra K T <raghavendra.kt@....com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Chen Yu <yu.chen.surf@...il.com>, Tim Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org,
Xiaoping Zhou <xiaoping.zhou@...el.com>, Chen Yu <yu.c.chen@...el.com>
Subject: Re: [PATCH v2] sched/numa: Fix the vma scan starving issue

On Mon, Aug 05, 2024 at 04:22:28PM +0800, Yujie Liu wrote:
> Problem statement:
> Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic"), the
> NUMA vma scan overhead has been reduced a lot. However, reducing the vma
> scan may also produce less NUMA page fault information, and insufficient
> information makes it harder for the NUMA balancer to make decisions.
> Later, commit b7a5b537c55c08 ("sched/numa: Complete scanning of
> partial VMAs regardless of PID activity") and commit 84db47ca7146d7
> ("sched/numa: Fix mm numa_scan_seq based unconditional scan") were found
> to bring back part of the performance.
>
> Recently, when running SPECcpu omnetpp_r on a 320 CPUs/2 Sockets system,
> a long period of remote NUMA node reads was observed via PMU events:
> a few cores had ~500MB/s of remote memory access for ~20 seconds.
> This causes high core-to-core variance and a performance penalty.
> Investigation found that many vmas are skipped due to the active
> PID check. According to the trace events, in most cases vma_is_accessed()
> returns false because the access history stored in the pids_active
> array has been cleared.
>
> Proposal:
> The main idea is to adjust vma_is_accessed() so that it returns true more
> readily: compare the difference between mm->numa_scan_seq and
> vma->numab_state->prev_scan_seq, and if the difference exceeds the
> threshold, scan the vma.
>
> This patch especially helps cases with a small number of
> threads, like the process-based SPECcpu. Without this patch, if the
> SPECcpu process accesses the vma at the beginning and then sleeps for a
> long time, the pids_active array will be cleared. As a result, when this
> process is woken up again, it never gets a chance to set prot_none again,
> because the vma scan is only granted for the first 2 scan sequences:
> (current->mm->numa_scan_seq - vma->numab_state->start_scan_seq) < 2
> To make matters worse, no other threads within the task can help set
> prot_none. This causes information loss.
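
The decision logic described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not the
kernel code or the patch itself: struct numab_state_model,
vma_is_accessed_model() and SCAN_STARVE_THRESHOLD are hypothetical names,
the threshold value is arbitrary because the description does not state
one, and pids_active is reduced to a single bitmask for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-VMA NUMA balancing state (hypothetical). */
struct numab_state_model {
	unsigned long pids_active;	/* bitmask of recently faulting PIDs */
	int start_scan_seq;		/* mm scan sequence when the VMA was first seen */
	int prev_scan_seq;		/* mm scan sequence when the VMA was last scanned */
};

/* Assumed value for illustration only; the description gives no number. */
#define SCAN_STARVE_THRESHOLD 4

/* Returns true if the VMA should be scanned by the calling task. */
static bool vma_is_accessed_model(int mm_scan_seq, unsigned long pid_bit,
				  const struct numab_state_model *st)
{
	assert(st != NULL);

	/* Existing rule: scan for the first 2 scan sequences. */
	if (mm_scan_seq - st->start_scan_seq < 2)
		return true;

	/* Existing rule: the task recently faulted in this VMA. */
	if (st->pids_active & pid_bit)
		return true;

	/* Proposed rule: the VMA has gone unscanned for too long. */
	if (mm_scan_seq - st->prev_scan_seq > SCAN_STARVE_THRESHOLD)
		return true;

	return false;
}
```

In this model, without the last check a task whose bit has been cleared
from pids_active gets false on every call once the initial two-sequence
window has passed, which is the starvation case described above.
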
>
> Raghavendra helped test the current patch and got positive results
> on the AMD platform:
>
> autonumabench NUMA01
>                                base                patched
> Amean     syst-NUMA01        194.05 (   0.00%)      165.11 *  14.92%*
> Amean     elsp-NUMA01        324.86 (   0.00%)      315.58 *   2.86%*
>
> Duration User       380345.36    368252.04
> Duration System       1358.89      1156.23
> Duration Elapsed      2277.45      2213.25
>
> autonumabench NUMA02
>
> Amean     syst-NUMA02          1.12 (   0.00%)        1.09 *   2.93%*
> Amean     elsp-NUMA02          3.50 (   0.00%)        3.56 *  -1.84%*
>
> Duration User         1513.23      1575.48
> Duration System          8.33         8.13
> Duration Elapsed        28.59        29.71
>
> kernbench
>
> Amean     user-256         22935.42 (   0.00%)    22535.19 *   1.75%*
> Amean     syst-256          7284.16 (   0.00%)     7608.72 *  -4.46%*
> Amean     elsp-256           159.01 (   0.00%)      158.17 *   0.53%*
>
> Duration User        68816.41     67615.74
> Duration System      21873.94     22848.08
> Duration Elapsed       506.66       504.55
>
> Intel 256 CPUs/2 Sockets:
> autonuma benchmark also shows improvements:
>
>                                      v6.10-rc5              v6.10-rc5
>                                                                +patch
> Amean     syst-NUMA01                  245.85 (   0.00%)      230.84 *   6.11%*
> Amean     syst-NUMA01_THREADLOCAL      205.27 (   0.00%)      191.86 *   6.53%*
> Amean     syst-NUMA02                   18.57 (   0.00%)       18.09 *   2.58%*
> Amean     syst-NUMA02_SMT                2.63 (   0.00%)        2.54 *   3.47%*
> Amean     elsp-NUMA01                  517.17 (   0.00%)      526.34 *  -1.77%*
> Amean     elsp-NUMA01_THREADLOCAL       99.92 (   0.00%)      100.59 *  -0.67%*
> Amean     elsp-NUMA02                   15.81 (   0.00%)       15.72 *   0.59%*
> Amean     elsp-NUMA02_SMT               13.23 (   0.00%)       12.89 *   2.53%*
>
>                      v6.10-rc5    v6.10-rc5
>                                      +patch
> Duration User      1064010.16   1075416.23
> Duration System       3307.64      3104.66
> Duration Elapsed      4537.54      4604.73
>
> The SPECcpu remote node access issue disappears with the patch applied.
>
> Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
> Reported-by: Xiaoping Zhou <xiaoping.zhou@...el.com>
> Reviewed-and-tested-by: Raghavendra K T <raghavendra.kt@....com>
> Co-developed-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Yujie Liu <yujie.liu@...el.com>

Ok, I didn't exactly replicate the autonuma test results but then again,
I'd be a little surprised it was affected by this issue. The rescan
decision is a bit arbitrary but I see no obviously better alternative
and the patch is fixing an important corner case so

Acked-by: Mel Gorman <mgorman@...hsingularity.net>

Sorry for the long delay in reviewing, my backlog for upstream work is
insane :(

--
Mel Gorman
SUSE Labs