[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200730172240.GE14603@linux.vnet.ibm.com>
Date: Thu, 30 Jul 2020 22:52:40 +0530
From: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>
Cc: linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
LKML <linux-kernel@...r.kernel.org>,
Michael Ellerman <michaele@....ibm.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <valentin.schneider@....com>,
Nick Piggin <npiggin@....ibm.com>,
Oliver OHalloran <oliveroh@....ibm.com>,
Nathan Lynch <nathanl@...ux.ibm.com>,
Michael Neuling <mikey@...ux.ibm.com>,
Anton Blanchard <anton@....ibm.com>,
Gautham R Shenoy <ego@...ux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.ibm.com>,
Jordan Niethe <jniethe5@...il.com>
Subject: Re: [PATCH v4 00/10] Coregroup support on Powerpc
* Srikar Dronamraju <srikar@...ux.vnet.ibm.com> [2020-07-27 11:02:20]:
> Changelog v3 ->v4:
> v3: https://lore.kernel.org/lkml/20200723085116.4731-1-srikar@linux.vnet.ibm.com/t/#u
>
Here is a summary of some of the testing done with coregroup v4 patchsets.
It includes ebizzy, schbench, perf bench sched pipe and topology verification.
One the left side are results from powerpc/next tree and on the right are the
results with the patchset applied. Topological verification clearly shows that
there is no change in topology with and without the patches on all the 3 class
of systems that were tested.
On PowerPc/Next On Powerpc/next + Coregroup Support v4 patchset
Power 9 PowerNV (2 Node/ 160 Cpu System)
---------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
N Min Max Median Avg Stddev N Min Max Median Avg Stddev
100 993884 1276090 1173476 1165914 54867.201 100 910470 1279820 1171095 1162091 67363.28
schbench (latency hence lower is better)
Latency percentiles (usec) Latency percentiles (usec)
50.0th: 455 50.0th: 454
75.0th: 533 75.0th: 543
90.0th: 683 90.0th: 701
95.0th: 743 95.0th: 737
*99.0th: 815 *99.0th: 805
99.5th: 839 99.5th: 835
99.9th: 913 99.9th: 893
min=0, max=1011 min=0, max=2833
perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark: # Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes # Executed 1000000 pipe operations between two processes
Total time: 6.083 [sec] Total time: 6.303 [sec]
6.083576 usecs/op 6.303318 usecs/op
164377 ops/sec 158646 ops/sec
Power 9 LPAR (2 Node/ 128 Cpu System)
---------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
N Min Max Median Avg Stddev N Min Max Median Avg Stddev
100 1058029 1295393 1200414 1188306.7 56786.538 100 943264 1287619 1180522 1168473.2 64469.955
schbench (latency hence lower is better)
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 34 50.0000th: 39
75.0000th: 46 75.0000th: 52
90.0000th: 53 90.0000th: 68
95.0000th: 56 95.0000th: 77
*99.0000th: 61 *99.0000th: 89
99.5000th: 63 99.5000th: 94
99.9000th: 81 99.9000th: 169
min=0, max=8405 min=0, max=23674
perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark: # Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes # Executed 1000000 pipe operations between two processes
Total time: 8.768 [sec] Total time: 5.217 [sec]
8.768400 usecs/op 5.217625 usecs/op
114045 ops/sec 191658 ops/sec
Power 8 LPAR (8 Node/ 256 Cpu System)
---------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
N Min Max Median Avg Stddev N Min Max Median Avg Stddev
100 1267615 1965234 1707423 1689137.6 144363.29 100 1175357 1924262 1691104 1664792.1 145876.4
schbench (latency hence lower is better)
Latency percentiles (usec) Latency percentiles (usec)
50.0th: 37 50.0th: 36
75.0th: 51 75.0th: 48
90.0th: 59 90.0th: 55
95.0th: 63 95.0th: 59
*99.0th: 71 *99.0th: 67
99.5th: 75 99.5th: 72
99.9th: 105 99.9th: 170
min=0, max=18560 min=0, max=27031
perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark: # Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes # Executed 1000000 pipe operations between two processes
Total time: 6.013 [sec] Total time: 5.930 [sec]
6.013963 usecs/op 5.930724 usecs/op
166279 ops/sec 168613 ops/sec
Topology verification on Power9
Power9/ PowerNV / SMT4
tail -f /proc/cpuinfo
---------------------
cpu : POWER9, altivec supported
clock : 3600.000000MHz
revision : 2.2 (pvr 004e 1202)
timebase : 512000000
platform : PowerNV
model : 9006-22P
machine : PowerNV 9006-22P
firmware : OPAL
MMU : Radix
On PowerPc/Next On Powerpc/next + Coregroup Support v4 patchset
lscpu lscpu
------ ------
Architecture: ppc64le Architecture: ppc64le
Byte Order: Little Endian Byte Order: Little Endian
CPU(s): 160 CPU(s): 160
On-line CPU(s) list: 0-159 On-line CPU(s) list: 0-159
Thread(s) per core: 4 Thread(s) per core: 4
Core(s) per socket: 20 Core(s) per socket: 20
Socket(s): 2 Socket(s): 2
NUMA node(s): 2 NUMA node(s): 2
Model: 2.2 (pvr 004e 1202) Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported Model name: POWER9, altivec supported
CPU max MHz: 3800.0000 CPU max MHz: 3800.0000
CPU min MHz: 2166.0000 CPU min MHz: 2166.0000
L1d cache: 32K L1d cache: 32K
L1i cache: 32K L1i cache: 32K
L2 cache: 512K L2 cache: 512K
L3 cache: 10240K L3 cache: 10240K
NUMA node0 CPU(s): 0-79 NUMA node0 CPU(s): 0-79
NUMA node8 CPU(s): 80-159 NUMA node8 CPU(s): 80-159
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
----------------------------------------------------- -----------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT
/proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE
/proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE
/proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
------------------------------------------------------ ------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391 /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391
/proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327 /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327
/proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071 /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071
/proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801 /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801
On PowerPc/Next
head /proc/schedstat
--------------------
version 15
timestamp 4295043536
cpu0 0 0 0 0 0 0 9597119314 2408913694 11897
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 4941435230 11106132 1583
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
On Powerpc/next + Coregroup Support v4 patchset
head /proc/schedstat
--------------------
version 15
timestamp 4296311826
cpu0 0 0 0 0 0 0 3353674045024 3781680865826 297483
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 3337873293332 4231590033856 229090
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Post sudo ppc64_cpu --smt=1 Post sudo ppc64_cpu --smt=1
--------------------- ---------------------
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
----------------------------------------------------- -----------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE /proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE
/proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE /proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE
/proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA /proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
------------------------------------------------------ ------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327 /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327
/proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071 /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071
/proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801 /proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801
On Powerpc/next
head /proc/schedstat
--------------------
version 15
timestamp 4295046242
cpu0 0 0 0 0 0 0 10978610020 2658997390 13068
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu4 0 0 0 0 0 0 5408663896 95701034 7697
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
On Powerpc/next + Coregroup Support v4 patchset
head /proc/schedstat
--------------------
version 15
timestamp 4296314905
cpu0 0 0 0 0 0 0 3355392013536 3781975150576 298723
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu4 0 0 0 0 0 0 3351637920996 4427329763050 256776
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Similar verification was done on Power 8 (8 Node 256 CPU LPAR) and Power 9 (2
node 128 Cpu LPAR) and they showed the topology before and after the patch to be
identical. If Interested, I could provide the same.
Powered by blists - more mailing lists