Message-ID: <ZsaWEDUHjIqktSz2@chenyu5-mobl2>
Date: Thu, 22 Aug 2024 09:36:16 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>
CC: Yujie Liu <yujie.liu@...el.com>, Raghavendra K T <raghavendra.kt@....com>,
	Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
	Chen Yu <yu.chen.surf@...il.com>, Tim Chen <tim.c.chen@...el.com>,
	<linux-kernel@...r.kernel.org>, Xiaoping Zhou <xiaoping.zhou@...el.com>
Subject: Re: [PATCH v2] sched/numa: Fix the vma scan starving issue

On 2024-08-21 at 11:04:14 +0100, Mel Gorman wrote:
> On Mon, Aug 05, 2024 at 04:22:28PM +0800, Yujie Liu wrote:
> > Problem statement:
> > Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic"), the
> > NUMA vma scan overhead has been reduced a lot. However, the reduced vma
> > scanning may generate less NUMA page fault information, and insufficient
> > information makes it harder for the NUMA balancer to make decisions.
> > Later, commit b7a5b537c55c08 ("sched/numa: Complete scanning of
> > partial VMAs regardless of PID activity") and commit 84db47ca7146d7
> > ("sched/numa: Fix mm numa_scan_seq based unconditional scan") were found
> > to bring back part of the performance.
> > 
> > Recently, when running SPECcpu omnetpp_r on a 320 CPUs/2 Sockets system,
> > a long period of remote NUMA node reads was observed via PMU events:
> > a few cores showed ~500MB/s of remote memory access for ~20 seconds.
> > This causes high core-to-core variance and a performance penalty.
> > Investigation showed that many vmas are skipped due to the active
> > PID check. According to the trace events, in most cases vma_is_accessed()
> > returns false because the access history stored in the pids_active
> > array has been cleared.
> > 
> > Proposal:
> > The main idea is to adjust vma_is_accessed() so that it returns true more
> > easily: compare the difference between mm->numa_scan_seq and
> > vma->numab_state->prev_scan_seq, and if the difference exceeds the
> > threshold, scan the vma.
> > 
> > This patch especially helps cases with a small number of threads, like
> > the process-based SPECcpu. Without this patch, if the SPECcpu process
> > accesses the vma at the beginning and then sleeps for a long time, the
> > pids_active array will be cleared. As a result, when this process is
> > woken up again, it never has a chance to set prot_none anymore, because
> > the vma scan is granted unconditionally only for the first 2 scan
> > sequences:
> > (current->mm->numa_scan_seq - vma->numab_state->start_scan_seq) < 2
> > To make matters worse, no other threads within the task can help set
> > prot_none. This causes information loss.
> > 
> > Raghavendra helped test the current patch and got positive results
> > on the AMD platform:
> > 
> > autonumabench NUMA01
> >                             base                  patched
> > Amean     syst-NUMA01      194.05 (   0.00%)      165.11 *  14.92%*
> > Amean     elsp-NUMA01      324.86 (   0.00%)      315.58 *   2.86%*
> > 
> > Duration User      380345.36   368252.04
> > Duration System      1358.89     1156.23
> > Duration Elapsed     2277.45     2213.25
> > 
> > autonumabench NUMA02
> > 
> > Amean     syst-NUMA02        1.12 (   0.00%)        1.09 *   2.93%*
> > Amean     elsp-NUMA02        3.50 (   0.00%)        3.56 *  -1.84%*
> > 
> > Duration User        1513.23     1575.48
> > Duration System         8.33        8.13
> > Duration Elapsed       28.59       29.71
> > 
> > kernbench
> > 
> > Amean     user-256    22935.42 (   0.00%)    22535.19 *   1.75%*
> > Amean     syst-256     7284.16 (   0.00%)     7608.72 *  -4.46%*
> > Amean     elsp-256      159.01 (   0.00%)      158.17 *   0.53%*
> > 
> > Duration User       68816.41    67615.74
> > Duration System     21873.94    22848.08
> > Duration Elapsed      506.66      504.55
> > 
> > Intel 256 CPUs/2 Sockets:
> > autonuma benchmark also shows improvements:
> > 
> >                                                v6.10-rc5              v6.10-rc5
> >                                                                          +patch
> > Amean     syst-NUMA01                  245.85 (   0.00%)      230.84 *   6.11%*
> > Amean     syst-NUMA01_THREADLOCAL      205.27 (   0.00%)      191.86 *   6.53%*
> > Amean     syst-NUMA02                   18.57 (   0.00%)       18.09 *   2.58%*
> > Amean     syst-NUMA02_SMT                2.63 (   0.00%)        2.54 *   3.47%*
> > Amean     elsp-NUMA01                  517.17 (   0.00%)      526.34 *  -1.77%*
> > Amean     elsp-NUMA01_THREADLOCAL       99.92 (   0.00%)      100.59 *  -0.67%*
> > Amean     elsp-NUMA02                   15.81 (   0.00%)       15.72 *   0.59%*
> > Amean     elsp-NUMA02_SMT               13.23 (   0.00%)       12.89 *   2.53%*
> > 
> >                    v6.10-rc5   v6.10-rc5
> >                                   +patch
> > Duration User     1064010.16  1075416.23
> > Duration System      3307.64     3104.66
> > Duration Elapsed     4537.54     4604.73
> > 
> > The SPECcpu remote node access issue disappears with the patch applied.
> > 
> > Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
> > Reported-by: Xiaoping Zhou <xiaoping.zhou@...el.com>
> > Reviewed-and-tested-by: Raghavendra K T <raghavendra.kt@....com>
> > Co-developed-by: Chen Yu <yu.c.chen@...el.com>
> > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> > Signed-off-by: Yujie Liu <yujie.liu@...el.com>
> 
> Ok, I didn't exactly replicate the autonuma test results but then again,
> I'd be a little surprised it was affected by this issue. The rescan
> decision is a bit arbitrary but I see no obviously better alternative
> and the patch is fixing an important corner case so
> 
> Acked-by: Mel Gorman <mgorman@...hsingularity.net>
> 
> Sorry for the long delay in reviewing, my backlog for upstream work is
> insane :(
>

Thank you, Mel, for taking the time to review this patch.

thanks,
Chenyu
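
For reference, the decision order described in the quoted changelog can be
sketched in user-space C as below. This is only an illustration under stated
assumptions, not the kernel code: the struct, the function name
vma_should_scan_sketch(), the boolean pid_active (standing in for the
pids_active lookup) and the threshold parameter are all hypothetical, and
the changelog quoted above does not state the threshold value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vma NUMA balancing state. */
struct vma_numab_state_sketch {
	int start_scan_seq;	/* mm scan sequence when tracking of the vma began */
	int prev_scan_seq;	/* mm scan sequence when the vma was last scanned */
	bool pid_active;	/* simplified: access history still records this task */
};

/*
 * Return true if the vma should be scanned.
 * 'threshold' is an assumed tunable, passed in for illustration only.
 */
static bool vma_should_scan_sketch(int mm_scan_seq,
				   const struct vma_numab_state_sketch *state,
				   int threshold)
{
	assert(state != NULL);

	/* The first 2 scan sequences are granted unconditionally. */
	if ((mm_scan_seq - state->start_scan_seq) < 2)
		return true;

	/* The task recently accessed the vma: scan it. */
	if (state->pid_active)
		return true;

	/* Proposed idea: the vma has been starved of scans for too long. */
	if ((mm_scan_seq - state->prev_scan_seq) > threshold)
		return true;

	return false;
}
```

With the last check removed, a vma whose access history has been cleared is
skipped indefinitely once the first two scan sequences have passed, which is
the starvation case the changelog describes.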