Message-ID: <235394aee595eaefd4bd442d00201ab44cd47de1.camel@intel.com>
Date: Thu, 22 Aug 2024 01:48:41 +0000
From: "Liu, Yujie" <yujie.liu@...el.com>
To: "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>
CC: "raghavendra.kt@....com" <raghavendra.kt@....com>, "Chen, Yu C"
<yu.c.chen@...el.com>, "peterz@...radead.org" <peterz@...radead.org>, "Chen,
Tim C" <tim.c.chen@...el.com>, "mingo@...hat.com" <mingo@...hat.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"yu.chen.surf@...il.com" <yu.chen.surf@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Zhou,
Xiaoping" <xiaoping.zhou@...el.com>, "juri.lelli@...hat.com"
<juri.lelli@...hat.com>
Subject: Re: [PATCH v2] sched/numa: Fix the vma scan starving issue
On Wed, 2024-08-21 at 11:04 +0100, Mel Gorman wrote:
> On Mon, Aug 05, 2024 at 04:22:28PM +0800, Yujie Liu wrote:
> > Problem statement:
> > Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic"), the
> > NUMA vma scan overhead has been reduced a lot. However, the reduced
> > scanning may also generate less NUMA page fault information, and the
> > insufficient information makes it harder for the NUMA balancer to make
> > decisions. Later, commit b7a5b537c55c08 ("sched/numa: Complete scanning of
> > partial VMAs regardless of PID activity") and commit 84db47ca7146d7
> > ("sched/numa: Fix mm numa_scan_seq based unconditional scan") were found
> > to bring back part of the performance.
> >
> > Recently, when running SPECcpu omnetpp_r on a 320 CPUs/2 Sockets system,
> > a long period of remote NUMA node reads was observed via PMU events:
> > a few cores showed ~500MB/s of remote memory access for ~20 seconds.
> > This causes high core-to-core variance and a performance penalty.
> > Investigation found that many vmas are skipped due to the active PID
> > check. According to the trace events, in most cases vma_is_accessed()
> > returns false because the access history stored in the pids_active
> > array has been cleared.
> >
> > Proposal:
> > The main idea is to adjust vma_is_accessed() so that it returns true more
> > easily: compare the diff between mm->numa_scan_seq and
> > vma->numab_state->prev_scan_seq, and if the diff exceeds the threshold,
> > scan the vma.
> >
> > This patch especially helps the cases where there is a small number of
> > threads, like the process-based SPECcpu. Without this patch, if the
> > SPECcpu process accesses the vma at the beginning and then sleeps for a
> > long time, the pids_active array will be cleared. As a result, if this
> > process is woken up again, it never has a chance to set prot_none anymore,
> > because the vma scan is only granted unconditionally for the first 2 scans:
> > (current->mm->numa_scan_seq - vma->numab_state->start_scan_seq) < 2
> > To make matters worse, no other threads within the task can help set the
> > prot_none. This causes information loss.
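
The decision described above can be modelled in user space with a short
sketch. This is not the patch itself: struct vma_model, vma_should_scan(),
the pid_active flag and the scan_threshold parameter are hypothetical names
for illustration, and the quoted text does not say how the threshold is
chosen. Only the "< 2" warm-up rule and the comparison of the mm scan
sequence against prev_scan_seq are taken from the description.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-vma NUMA scan state. */
struct vma_model {
	unsigned long start_scan_seq;	/* mm scan sequence when tracking began */
	unsigned long prev_scan_seq;	/* mm scan sequence of the last scan */
	bool pid_active;		/* stand-in for the task's bit in pids_active */
};

/*
 * Return true if the vma should be scanned in this pass.
 * scan_threshold is an assumed tunable, not a value from the patch.
 */
static bool vma_should_scan(const struct vma_model *vma,
			    unsigned long numa_scan_seq,
			    unsigned long scan_threshold)
{
	/* The first two scan passes are always allowed. */
	if (numa_scan_seq - vma->start_scan_seq < 2)
		return true;

	/* The task has recently accessed this vma. */
	if (vma->pid_active)
		return true;

	/* Proposed rule: the vma has gone unscanned for too long. */
	if (numa_scan_seq - vma->prev_scan_seq > scan_threshold)
		return true;

	return false;
}
```

In this model, a vma whose access history has been cleared is skipped only
until the sequence gap exceeds scan_threshold, after which it becomes
eligible for scanning again instead of being starved indefinitely.
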
> >
> > Raghavendra helped test the current patch and got positive results
> > on the AMD platform:
> >
> > autonumabench NUMA01
> >                          base              patched
> > Amean syst-NUMA01      194.05 (  0.00%)     165.11 *  14.92%*
> > Amean elsp-NUMA01      324.86 (  0.00%)     315.58 *   2.86%*
> >
> > Duration User       380345.36            368252.04
> > Duration System       1358.89              1156.23
> > Duration Elapsed      2277.45              2213.25
> >
> > autonumabench NUMA02
> >
> > Amean syst-NUMA02        1.12 (  0.00%)       1.09 *   2.93%*
> > Amean elsp-NUMA02        3.50 (  0.00%)       3.56 *  -1.84%*
> >
> > Duration User         1513.23              1575.48
> > Duration System          8.33                 8.13
> > Duration Elapsed        28.59                29.71
> >
> > kernbench
> >
> > Amean user-256       22935.42 (  0.00%)   22535.19 *   1.75%*
> > Amean syst-256        7284.16 (  0.00%)    7608.72 *  -4.46%*
> > Amean elsp-256         159.01 (  0.00%)     158.17 *   0.53%*
> >
> > Duration User        68816.41             67615.74
> > Duration System      21873.94             22848.08
> > Duration Elapsed       506.66               504.55
> >
> > Intel 256 CPUs/2 Sockets:
> > autonuma benchmark also shows improvements:
> >
> >                                 v6.10-rc5            v6.10-rc5
> >                                                         +patch
> > Amean syst-NUMA01                  245.85 (  0.00%)     230.84 *   6.11%*
> > Amean syst-NUMA01_THREADLOCAL      205.27 (  0.00%)     191.86 *   6.53%*
> > Amean syst-NUMA02                   18.57 (  0.00%)      18.09 *   2.58%*
> > Amean syst-NUMA02_SMT                2.63 (  0.00%)       2.54 *   3.47%*
> > Amean elsp-NUMA01                  517.17 (  0.00%)     526.34 *  -1.77%*
> > Amean elsp-NUMA01_THREADLOCAL       99.92 (  0.00%)     100.59 *  -0.67%*
> > Amean elsp-NUMA02                   15.81 (  0.00%)      15.72 *   0.59%*
> > Amean elsp-NUMA02_SMT               13.23 (  0.00%)      12.89 *   2.53%*
> >
> >                                 v6.10-rc5            v6.10-rc5
> >                                                         +patch
> > Duration User                  1064010.16           1075416.23
> > Duration System                   3307.64              3104.66
> > Duration Elapsed                  4537.54              4604.73
> >
> > The SPECcpu remote node access issue disappears with the patch applied.
> >
> > Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
> > Reported-by: Xiaoping Zhou <xiaoping.zhou@...el.com>
> > Reviewed-and-tested-by: Raghavendra K T <raghavendra.kt@....com>
> > Co-developed-by: Chen Yu <yu.c.chen@...el.com>
> > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> > Signed-off-by: Yujie Liu <yujie.liu@...el.com>
>
> Ok, I didn't exactly replicate the autonuma test results but then again,
> I'd be a little surprised it was affected by this issue. The rescan
> decision is a bit arbitrary but I see no obviously better alternative
> and the patch is fixing an important corner case so
>
> Acked-by: Mel Gorman <mgorman@...hsingularity.net>
>
> Sorry for the long delay in reviewing, my backlog for upstream work is
> insane :(

Thanks a lot for taking the time to review this patch.

Best Regards,
Yujie