Message-ID: <9ab0a2c2-7dee-40b0-edd0-56a5b1915745@amd.com>
Date: Wed, 9 Feb 2022 10:40:15 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Valentin Schneider <valentin.schneider@....com>,
Aubrey Li <aubrey.li@...ux.intel.com>,
Barry Song <song.bao.hua@...ilicon.com>,
Mike Galbraith <efault@....de>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Gautham Shenoy <gautham.shenoy@....com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when
SD_NUMA spans multiple LLCs
Hello Mel,
On 2/8/2022 3:13 PM, Mel Gorman wrote:
[..snip..]
> On a Zen3 machine running STREAM parallelised with OMP to have on instance
> per LLC the results and without binding, the results are
>
>                             5.17.0-rc0             5.17.0-rc0
>                                vanilla       sched-numaimb-v6
> MB/sec copy-16    162596.94 (   0.00%)   580559.74 ( 257.05%)
> MB/sec scale-16   136901.28 (   0.00%)   374450.52 ( 173.52%)
> MB/sec add-16     157300.70 (   0.00%)   564113.76 ( 258.62%)
> MB/sec triad-16   151446.88 (   0.00%)   564304.24 ( 272.61%)
I was able to test STREAM without binding on different
NPS configurations of a two-socket Zen3 machine.
The results look good:
sched-tip - 5.17.0-rc1 tip sched/core
mel-v6 - 5.17.0-rc1 tip sched/core + this patch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
STREAM with 16 threads,
built with -DSTREAM_ARRAY_SIZE=128000000 -DNTIMES=10
Zen3, 64C128T per socket, 2 sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NPS1
Test:     sched-tip               mel-v6
Copy:     114470.18 (0.00 pct)    152806.94 (33.49 pct)
Scale:    111575.12 (0.00 pct)    189784.57 (70.09 pct)
Add:      125436.15 (0.00 pct)    213371.05 (70.10 pct)
Triad:    123068.86 (0.00 pct)    209809.11 (70.48 pct)
NPS2
Test:     sched-tip               mel-v6
Copy:      57936.28 (0.00 pct)    155038.70 (167.60 pct)
Scale:     55599.30 (0.00 pct)    192601.59 (246.41 pct)
Add:       63096.96 (0.00 pct)    211462.58 (235.13 pct)
Triad:     61983.39 (0.00 pct)    208909.34 (237.04 pct)
NPS4
Test:     sched-tip               mel-v6
Copy:      43946.42 (0.00 pct)    119583.69 (172.11 pct)
Scale:     33750.96 (0.00 pct)    180130.83 (433.70 pct)
Add:       39109.72 (0.00 pct)    170296.68 (335.43 pct)
Triad:     36598.88 (0.00 pct)    169953.47 (364.36 pct)
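
For anyone re-deriving the pct columns: they appear to be the relative
change in MB/sec over sched-tip. A minimal sketch follows, assuming that
definition (it matches the NPS1 figures above). The helper name
pct_change is illustrative only and is not part of any benchmark harness.

```python
def pct_change(baseline: float, patched: float) -> float:
    """Relative change of `patched` over `baseline`, in percent.

    Assumed definition of the "pct" column; not taken from any tool.
    """
    return (patched - baseline) / baseline * 100.0


# NPS1, NTIMES=10 figures copied from the table above (MB/sec):
# (sched-tip, mel-v6)
nps1 = {
    "Copy": (114470.18, 152806.94),
    "Scale": (111575.12, 189784.57),
    "Add": (125436.15, 213371.05),
    "Triad": (123068.86, 209809.11),
}

for test, (tip, v6) in nps1.items():
    print(f"{test + ':':<8}{pct_change(tip, v6):.2f} pct")
```

Some reported values may differ from a recomputed one by 0.01 in the
last digit, depending on whether the result is rounded or truncated.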
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
STREAM with 16 threads,
built with -DSTREAM_ARRAY_SIZE=128000000 -DNTIMES=100
Zen3, 64C128T per socket, 2 sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NPS1
Test:     sched-tip               mel-v6
Copy:     132402.79 (0.00 pct)    225587.85 (70.37 pct)
Scale:    126923.02 (0.00 pct)    214363.58 (68.89 pct)
Add:      145596.55 (0.00 pct)    260901.92 (79.19 pct)
Triad:    143092.91 (0.00 pct)    249081.79 (74.06 pct)
NPS2
Test:     sched-tip               mel-v6
Copy:     107386.27 (0.00 pct)    227623.31 (111.96 pct)
Scale:    100941.44 (0.00 pct)    218116.63 (116.08 pct)
Add:      115854.52 (0.00 pct)    272756.95 (135.43 pct)
Triad:    113369.96 (0.00 pct)    260235.32 (129.54 pct)
NPS4
Test:     sched-tip               mel-v6
Copy:      91083.07 (0.00 pct)    247163.90 (171.36 pct)
Scale:     90352.54 (0.00 pct)    223914.31 (147.82 pct)
Add:      101973.98 (0.00 pct)    272842.42 (167.56 pct)
Triad:     99773.65 (0.00 pct)    258904.54 (159.49 pct)
There is a significant improvement across the board,
with v6 outperforming tip sched/core in every case!
Tested-by: K Prateek Nayak <kprateek.nayak@....com>
--
Thanks and Regards
Prateek