Message-ID: <ZnrSIGKBpyeTmSJt@chenyu5-mobl2>
Date: Tue, 25 Jun 2024 22:20:16 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Raghavendra K T <raghavendra.kt@....com>
CC: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Mel Gorman
<mgorman@...e.de>, Andrew Morton <akpm@...ux-foundation.org>, "David
Hildenbrand" <david@...hat.com>, <rppt@...nel.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Bharata B Rao <bharata@....com>, Johannes Weiner <jweiner@...com>, "kernel
test robot" <oliver.sang@...el.com>, Yujie Liu <yujie.liu@...el.com>
Subject: Re: [RFC PATCH 1 1/1] sched/numa: Hot VMA and shared VMA optimization
Hi Raghavendra,
On 2024-03-22 at 19:11:12 +0530, Raghavendra K T wrote:
> Optimizations are based on history of PIDs accessing VMA.
>
> - Increase tasks' access history windows (PeterZ) from 2 to 4.
> ( This patch is from Peter Zijlstra <peterz@...radead.org>)
>
> Idea: A task is allowed to scan a VMA if:
> - VMA was very recently accessed as indicated by the latest
> access PIDs information (hot VMA).
> - VMA is shared by more than 2 tasks. Here whole history of VMA's
> access PIDs is considered using bitmap_weight().
>
> Signed-off-by: Raghavendra K T <raghavendra.kt@....com>
> ---
> I will split the patchset and post if we find this patchset useful
> going further. First patch is from PeterZ.
>
I think this is a good direction. We ran an initial test with autonumabench
THREADLOCAL on a 240-CPU, 2-node system. The patch does not show an obvious
performance difference, but the results are more stable (less run-to-run
variance: the system-time Stddev drops from 74.11 to 60.90). We'll enable
Sub-NUMA Clustering to see whether that makes any difference.
My understanding is that if NR_ACCESS_PID_HIST can be extended further,
THREADLOCAL could see more benefit, since each thread has its own VMA.
Alternatively, making the length of the VMA access history adaptive (rather
than a fixed 4) could be more flexible.
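To make the history-depth question concrete, here is a small user-space model of
the rule quoted above (hot VMA or shared VMA). It is only a sketch of one reading
of that description, not the kernel code: VmaHistory, pid_bit(), the modulo
"hash" and the 64-bit mask width are all assumptions made for illustration. Only
NR_ACCESS_PID_HIST and bitmap_weight() are names taken from the thread.

```python
# User-space model only; not the kernel implementation.
NR_ACCESS_PID_HIST = 4   # history depth with the patch (previously 2)
HASH_BITS = 64           # assumed: one bit per hashed PID in a 64-bit mask


def pid_bit(pid: int) -> int:
    """Stand-in hash: map a PID to a single bit of the mask."""
    return 1 << (pid % HASH_BITS)


class VmaHistory:
    """Hypothetical per-VMA record of which PIDs touched it in each window."""

    def __init__(self, depth: int = NR_ACCESS_PID_HIST):
        self.windows = [0] * depth  # windows[0] is the latest window

    def record_access(self, pid: int) -> None:
        self.windows[0] |= pid_bit(pid)

    def rotate(self) -> None:
        """Open a new window and drop the oldest one."""
        self.windows = [0] + self.windows[:-1]

    def may_scan(self, pid: int) -> bool:
        # Hot VMA: this task appears in the latest window.
        if self.windows[0] & pid_bit(pid):
            return True
        # Shared VMA: more than 2 bits set over the whole history,
        # modelling bitmap_weight() on the OR of all windows.
        combined = 0
        for window in self.windows:
            combined |= window
        return bin(combined).count("1") > 2
```

Because depth is a constructor argument, the same access pattern can be replayed
with 2, 4, or more windows to see when a task stops being allowed to scan; an
adaptive length would amount to changing depth at run time.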
                                      numa_scan_orig  numa_scan_4_history
Min       syst-NUMA01_THREADLOCAL   388.47 (  0.00%)     397.43 ( -2.31%)
Min       elsp-NUMA01_THREADLOCAL    40.27 (  0.00%)      38.94 (  3.30%)
Amean     syst-NUMA01_THREADLOCAL   467.62 (  0.00%)     459.10 (  1.82%)
Amean     elsp-NUMA01_THREADLOCAL    42.20 (  0.00%)      44.84 ( -6.26%)
Stddev    syst-NUMA01_THREADLOCAL    74.11 (  0.00%)      60.90 ( 17.81%)
CoeffVar  syst-NUMA01_THREADLOCAL    15.85 (  0.00%)      13.27 ( 16.29%)
Max       syst-NUMA01_THREADLOCAL   535.36 (  0.00%)     519.21 (  3.02%)
Max       elsp-NUMA01_THREADLOCAL    43.96 (  0.00%)      56.33 (-28.14%)
BAmean-50 syst-NUMA01_THREADLOCAL   388.47 (  0.00%)     397.43 ( -2.31%)
BAmean-50 elsp-NUMA01_THREADLOCAL    40.27 (  0.00%)      38.94 (  3.30%)
BAmean-95 syst-NUMA01_THREADLOCAL   433.75 (  0.00%)     429.05 (  1.08%)
BAmean-95 elsp-NUMA01_THREADLOCAL    41.31 (  0.00%)      39.09 (  5.39%)
BAmean-99 syst-NUMA01_THREADLOCAL   433.75 (  0.00%)     429.05 (  1.08%)
BAmean-99 elsp-NUMA01_THREADLOCAL    41.31 (  0.00%)      39.09 (  5.39%)
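A note on reading the percentages: they appear to be relative to numa_scan_orig
with lower treated as better, and CoeffVar appears to be Stddev / Amean * 100.
Both are inferred from the numbers in the table, not stated in this mail. The
helper names gain_pct() and coeff_var() are made up for this check.

```python
def gain_pct(baseline: float, patched: float) -> float:
    """Change relative to baseline; positive means the patched value is lower."""
    return (baseline - patched) / baseline * 100.0


def coeff_var(stddev: float, mean: float) -> float:
    """Coefficient of variation, expressed as a percentage of the mean."""
    return stddev / mean * 100.0


if __name__ == "__main__":
    # Values copied from the syst/elsp rows of the table.
    print("Min syst      %.2f" % gain_pct(388.47, 397.43))
    print("Max elsp      %.2f" % gain_pct(43.96, 56.33))
    print("Stddev syst   %.2f" % gain_pct(74.11, 60.90))
    print("CoeffVar orig %.2f" % coeff_var(74.11, 467.62))
    print("CoeffVar new  %.2f" % coeff_var(60.90, 459.10))
```

Under these assumptions the results agree with the table to within the rounding
of its two-decimal inputs, which supports reading a negative percentage as a
regression against the baseline.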
thanks,
Chenyu