Message-ID: <d686347e-f76c-7f9c-3f1a-f4326f5167ca@amd.com>
Date: Wed, 5 Jul 2023 12:34:48 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Tejun Heo <tj@...nel.org>
Cc: Sandeep Dhavale <dhavale@...gle.com>, jiangshanlai@...il.com,
torvalds@...ux-foundation.org, peterz@...radead.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com,
joshdon@...gle.com, brho@...gle.com, briannorris@...omium.org,
nhuck@...gle.com, agk@...hat.com, snitzer@...nel.org,
void@...ifault.com, kernel-team@...roid.com
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Hello Tejun,
On 6/9/2023 4:20 AM, Tejun Heo wrote:
> [..snip..]
>
> Can you please test the following branch? It should have
> both bugs fixed properly.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
>
> If that doesn't crash, I'd love to hear how it affects the perf regressions
> reported over that past few months.
Sorry about the delay. Detailed results of the testing are below; they are
from a dual socket 3rd Generation EPYC system (2 x 64C/128T).
tl;dr
- Apart from tbench and netperf, the rest of the benchmarks show no
difference out of the box.
- SPECjbb2015 Multi-JVM sees a small uplift in max-jOPS with certain
affinity scopes.
- tbench and netperf seem to be unhappy throughout. None of the affinity
scopes seems to bring the performance back. I'll dig more into this.
Following are the results from running standard benchmarks on a
dual socket Zen3 (2 x 64C/128T) machine configured in different
NPS modes.
NPS modes are used to logically divide a single socket into
multiple NUMA regions.
Following is the NUMA configuration for each NPS mode on the system:
NPS1: Each socket is a NUMA node.
Total 2 NUMA nodes in the dual socket machine.
Node 0: 0-63, 128-191
Node 1: 64-127, 192-255
NPS2: Each socket is further logically divided into 2 NUMA regions.
Total 4 NUMA nodes exist across 2 sockets.
Node 0: 0-31, 128-159
Node 1: 32-63, 160-191
Node 2: 64-95, 192-223
Node 3: 96-127, 224-255
NPS4: Each socket is logically divided into 4 NUMA regions.
Total 8 NUMA nodes exist across 2 sockets.
Node 0: 0-15, 128-143
Node 1: 16-31, 144-159
Node 2: 32-47, 160-175
Node 3: 48-63, 176-191
Node 4: 64-79, 192-207
Node 5: 80-95, 208-223
Node 6: 96-111, 224-239
Node 7: 112-127, 240-255
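The node-to-CPU mapping can be derived mechanically. The following is a minimal sketch, assuming each node gets an equal, contiguous block of cores and that SMT siblings are numbered 128 above their first thread, as the NPS1 listing suggests. The function name `nps_layout` and its parameters are hypothetical and are not part of any tool used for this report.

```python
def nps_layout(nps, sockets=2, cores_per_socket=64):
    """Return {node: [(first, last), (first, last)]} CPU ranges per NUMA node.

    Assumes equal contiguous core blocks per node, with SMT siblings
    numbered total_cores above the first hardware thread.
    """
    total_cores = sockets * cores_per_socket
    nodes = sockets * nps
    cores_per_node = total_cores // nodes
    layout = {}
    for node in range(nodes):
        first = node * cores_per_node
        last = first + cores_per_node - 1
        layout[node] = [(first, last),
                        (first + total_cores, last + total_cores)]
    return layout


for nps in (1, 2, 4):
    print(f"NPS{nps}:")
    for node, ranges in nps_layout(nps).items():
        print(f"  Node {node}: " + ", ".join(f"{a}-{b}" for a, b in ranges))
```

On a live system, `numactl --hardware` or `lscpu` remains the authoritative source for the actual mapping.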
Benchmark Results:
Kernel versions:
- base: affinity-scopes-v2 branch at
commit 18c8ae813156 ("workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0")
- affinity_scopes: affinity-scopes-v2 branch at
commit a4da9f618d3e ("workqueue: Add "Affinity Scopes and Performance" section to documentation")
running with the default affinity scope.
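For reading the "pct" columns in the tables that follow: the figures look consistent with a percentage change relative to base, signed so that a positive value is an improvement, and truncated (not rounded) to two decimals. This is an inference from the numbers, not a description of the script that produced them. The sketch below assumes hackbench and schbench are lower-is-better and tbench is higher-is-better; the helper names are hypothetical.

```python
import math


def pct_change(base, new, lower_is_better=False):
    """Percent change vs. base, signed so positive means improvement."""
    delta = (base - new) if lower_is_better else (new - base)
    return delta / base * 100.0


def fmt_pct(value):
    """Truncate toward zero to two decimals (assumed from the tables)."""
    return math.trunc(value * 100) / 100


# schbench, NPS1, 4 workers: latency 31.00 -> 28.00 (lower is better)
print(fmt_pct(pct_change(31.00, 28.00, lower_is_better=True)))  # 9.67

# tbench, NPS1, 16 clients: throughput 6113.51 -> 5449.58 (higher is better)
print(fmt_pct(pct_change(6113.51, 5449.58)))  # -10.86
```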
~~~~~~~~~~~~~
~ hackbench ~
~~~~~~~~~~~~~
o NPS1
Test: base affinity_scopes
1-groups: 0.00 (0.00 pct) 3.68 (0.00 pct)
2-groups: 4.41 (0.00 pct) 4.40 (0.22 pct)
4-groups: 4.91 (0.00 pct) 4.87 (0.81 pct)
8-groups: 5.64 (0.00 pct) 5.74 (-1.77 pct)
16-groups: 7.72 (0.00 pct) 7.54 (2.33 pct)
o NPS2
Test: base affinity_scopes
1-groups: 3.74 (0.00 pct) 3.85 (-2.94 pct)
2-groups: 4.38 (0.00 pct) 4.34 (0.91 pct)
4-groups: 4.87 (0.00 pct) 4.80 (1.43 pct)
8-groups: 5.42 (0.00 pct) 5.40 (0.36 pct)
16-groups: 6.75 (0.00 pct) 7.02 (-4.00 pct)
o NPS4
Test: base affinity_scopes
1-groups: 3.90 (0.00 pct) 3.84 (1.53 pct)
2-groups: 4.40 (0.00 pct) 4.39 (0.22 pct)
4-groups: 4.86 (0.00 pct) 4.85 (0.20 pct)
8-groups: 5.44 (0.00 pct) 5.44 (0.00 pct)
16-groups: 7.20 (0.00 pct) 7.08 (1.66 pct)
~~~~~~~~~~~~
~ schbench ~
~~~~~~~~~~~~
o NPS1
#workers: base affinity_scopes
1: 26.00 (0.00 pct) 26.00 (0.00 pct)
2: 26.00 (0.00 pct) 28.00 (-7.69 pct)
4: 31.00 (0.00 pct) 28.00 (9.67 pct)
8: 37.00 (0.00 pct) 37.00 (0.00 pct)
16: 49.00 (0.00 pct) 47.00 (4.08 pct)
32: 78.00 (0.00 pct) 81.00 (-3.84 pct)
64: 170.00 (0.00 pct) 173.00 (-1.76 pct)
128: 369.00 (0.00 pct) 344.00 (6.77 pct)
256: 49600.00 (0.00 pct) 48704.00 (1.80 pct)
512: 93568.00 (0.00 pct) 93824.00 (-0.27 pct)
o NPS2
#workers: base affinity_scopes
1: 24.00 (0.00 pct) 23.00 (4.16 pct)
2: 29.00 (0.00 pct) 25.00 (13.79 pct)
4: 31.00 (0.00 pct) 32.00 (-3.22 pct)
8: 43.00 (0.00 pct) 39.00 (9.30 pct)
16: 52.00 (0.00 pct) 52.00 (0.00 pct)
32: 82.00 (0.00 pct) 89.00 (-8.53 pct)
64: 179.00 (0.00 pct) 154.00 (13.96 pct)
128: 400.00 (0.00 pct) 360.00 (10.00 pct)
256: 49856.00 (0.00 pct) 48576.00 (2.56 pct)
512: 93056.00 (0.00 pct) 91520.00 (1.65 pct)
o NPS4
#workers: base affinity_scopes
1: 25.00 (0.00 pct) 22.00 (12.00 pct)
2: 26.00 (0.00 pct) 27.00 (-3.84 pct)
4: 29.00 (0.00 pct) 28.00 (3.44 pct)
8: 48.00 (0.00 pct) 44.00 (8.33 pct)
16: 55.00 (0.00 pct) 59.00 (-7.27 pct)
32: 88.00 (0.00 pct) 84.00 (4.54 pct)
64: 166.00 (0.00 pct) 173.00 (-4.21 pct)
128: 374.00 (0.00 pct) 368.00 (1.60 pct)
256: 49600.00 (0.00 pct) 49856.00 (-0.51 pct)
512: 93824.00 (0.00 pct) 93568.00 (0.27 pct)
~~~~~~~~~~
~ tbench ~
~~~~~~~~~~
o NPS1
Clients: base affinity_scopes
1 450.40 (0.00 pct) 456.71 (1.40 pct)
2 872.50 (0.00 pct) 882.38 (1.13 pct)
4 1630.13 (0.00 pct) 1605.48 (-1.51 pct)
8 3139.90 (0.00 pct) 3041.39 (-3.13 pct)
16 6113.51 (0.00 pct) 5449.58 (-10.86 pct)
32 11024.64 (0.00 pct) 9147.71 (-17.02 pct)
64 19081.96 (0.00 pct) 14843.46 (-22.21 pct)
128 30956.07 (0.00 pct) 27493.35 (-11.18 pct)
256 42829.46 (0.00 pct) 36913.54 (-13.81 pct)
512 42395.69 (0.00 pct) 36165.41 (-14.69 pct)
1024 41973.51 (0.00 pct) 38530.57 (-8.20 pct)
o NPS2
Clients: base affinity_scopes
1 451.37 (0.00 pct) 450.97 (-0.08 pct)
2 875.07 (0.00 pct) 874.08 (-0.11 pct)
4 1636.31 (0.00 pct) 1639.60 (0.20 pct)
8 3162.48 (0.00 pct) 3074.73 (-2.77 pct)
16 5794.93 (0.00 pct) 5502.22 (-5.05 pct)
32 11205.26 (0.00 pct) 8979.27 (-19.86 pct)
64 20770.79 (0.00 pct) 17151.10 (-17.42 pct)
128 30485.82 (0.00 pct) 26953.16 (-11.58 pct)
256 40161.93 (0.00 pct) 35892.11 (-10.63 pct)
512 44513.43 (0.00 pct) 38876.31 (-12.66 pct)
1024 42781.13 (0.00 pct) 38313.23 (-10.44 pct)
o NPS4
Clients: base affinity_scopes
1 451.25 (0.00 pct) 447.95 (-0.73 pct)
2 877.94 (0.00 pct) 877.93 (0.00 pct)
4 1641.74 (0.00 pct) 1653.17 (0.69 pct)
8 3140.87 (0.00 pct) 3050.94 (-2.86 pct)
16 5878.87 (0.00 pct) 5291.66 (-9.98 pct)
32 10910.11 (0.00 pct) 9745.45 (-10.67 pct)
64 18814.62 (0.00 pct) 16708.46 (-11.19 pct)
128 29238.49 (0.00 pct) 27598.00 (-5.61 pct)
256 42119.54 (0.00 pct) 38464.91 (-8.67 pct)
512 41645.81 (0.00 pct) 40330.03 (-3.15 pct)
1024 41977.06 (0.00 pct) 39540.55 (-5.80 pct)
~~~~~~~~~~
~ stream ~
~~~~~~~~~~
o NPS1
- 10 Runs:
Test: base affinity_scopes
Copy: 245676.59 (0.00 pct) 333646.71 (35.80 pct)
Scale: 206545.41 (0.00 pct) 205706.04 (-0.40 pct)
Add: 213506.82 (0.00 pct) 236739.07 (10.88 pct)
Triad: 217679.43 (0.00 pct) 249263.46 (14.50 pct)
- 100 Runs:
Test: base affinity_scopes
Copy: 318060.91 (0.00 pct) 326025.89 (2.50 pct)
Scale: 213943.40 (0.00 pct) 207647.37 (-2.94 pct)
Add: 237892.53 (0.00 pct) 232164.59 (-2.40 pct)
Triad: 245672.84 (0.00 pct) 246333.21 (0.26 pct)
o NPS2
- 10 Runs:
Test: base affinity_scopes
Copy: 296632.20 (0.00 pct) 291153.63 (-1.84 pct)
Scale: 206193.90 (0.00 pct) 216368.42 (4.93 pct)
Add: 240363.50 (0.00 pct) 245954.23 (2.32 pct)
Triad: 242748.60 (0.00 pct) 238606.20 (-1.70 pct)
- 100 Runs:
Test: base affinity_scopes
Copy: 322535.79 (0.00 pct) 315020.03 (-2.33 pct)
Scale: 217723.56 (0.00 pct) 220172.32 (1.12 pct)
Add: 248117.72 (0.00 pct) 250557.17 (0.98 pct)
Triad: 257768.66 (0.00 pct) 248264.00 (-3.68 pct)
o NPS4
- 10 Runs:
Test: base affinity_scopes
Copy: 274067.54 (0.00 pct) 302804.77 (10.48 pct)
Scale: 224944.53 (0.00 pct) 230112.39 (2.29 pct)
Add: 229318.09 (0.00 pct) 241939.54 (5.50 pct)
Triad: 230175.89 (0.00 pct) 253613.85 (10.18 pct)
- 100 Runs:
Test: base affinity_scopes
Copy: 338922.96 (0.00 pct) 348183.65 (2.73 pct)
Scale: 240262.45 (0.00 pct) 245939.67 (2.36 pct)
Add: 256968.24 (0.00 pct) 260657.01 (1.43 pct)
Triad: 262644.16 (0.00 pct) 262286.46 (-0.13 pct)
~~~~~~~~~~~
~ netperf ~
~~~~~~~~~~~
o NPS1
Test: base affinity_scopes
1-clients: 100910.82 (0.00 pct) 102553.83 (1.62 pct)
2-clients: 99777.76 (0.00 pct) 99390.14 (-0.38 pct)
4-clients: 97676.17 (0.00 pct) 95856.63 (-1.86 pct)
8-clients: 95413.11 (0.00 pct) 88801.05 (-6.92 pct)
16-clients: 88961.66 (0.00 pct) 78807.71 (-11.41 pct)
32-clients: 82199.83 (0.00 pct) 73372.46 (-10.73 pct)
64-clients: 66094.87 (0.00 pct) 58487.61 (-11.50 pct)
128-clients: 43833.63 (0.00 pct) 42005.47 (-4.17 pct)
256-clients: 38917.58 (0.00 pct) 22653.73 (-41.79 pct)
o NPS2
Test: base affinity_scopes
1-clients: 101745.99 (0.00 pct) 102703.66 (0.94 pct)
2-clients: 100013.62 (0.00 pct) 99536.20 (-0.47 pct)
4-clients: 97124.42 (0.00 pct) 95261.28 (-1.91 pct)
8-clients: 92110.60 (0.00 pct) 87714.72 (-4.77 pct)
16-clients: 84578.86 (0.00 pct) 77329.65 (-8.57 pct)
32-clients: 78272.91 (0.00 pct) 72114.77 (-7.86 pct)
64-clients: 61595.20 (0.00 pct) 58001.87 (-5.83 pct)
128-clients: 44119.18 (0.00 pct) 40057.91 (-9.20 pct)
256-clients: 36221.03 (0.00 pct) 21468.40 (-40.72 pct)
o NPS4
Test: base affinity_scopes
1-clients: 102711.93 (0.00 pct) 103244.49 (0.51 pct)
2-clients: 101655.11 (0.00 pct) 98764.88 (-2.84 pct)
4-clients: 98519.58 (0.00 pct) 94439.88 (-4.14 pct)
8-clients: 94247.56 (0.00 pct) 88618.17 (-5.97 pct)
16-clients: 87515.03 (0.00 pct) 82392.50 (-5.85 pct)
32-clients: 81486.07 (0.00 pct) 74022.13 (-9.15 pct)
64-clients: 68436.64 (0.00 pct) 60303.48 (-11.88 pct)
128-clients: 49393.57 (0.00 pct) 43924.74 (-11.07 pct)
256-clients: 41111.27 (0.00 pct) 27694.64 (-32.63 pct)
~~~~~~~~~~~~~
~ unixbench ~
~~~~~~~~~~~~~
o NPS1
base affinity_scopes
Hmean unixbench-dhry2reg-1 41194259.44 ( 0.00%) 41044431.89 ( -0.36%)
Hmean unixbench-dhry2reg-512 6252840065.42 ( 0.00%) 6244309194.01 ( -0.14%)
Amean unixbench-syscall-1 2534936.20 ( 0.00%) 2517701.13 * 0.68%*
Amean unixbench-syscall-512 8037812.87 ( 0.00%) 7379945.23 * 8.18%*
Hmean unixbench-pipe-1 2391449.08 ( 0.00%) 2392275.16 ( 0.03%)
Hmean unixbench-pipe-512 340010431.31 ( 0.00%) 339389300.96 ( -0.18%)
Hmean unixbench-spawn-1 4471.68 ( 0.00%) 4568.80 ( 2.17%)
Hmean unixbench-spawn-512 66246.39 ( 0.00%) 62380.27 * -5.84%*
Hmean unixbench-execl-1 3695.11 ( 0.00%) 3663.75 * -0.85%*
Hmean unixbench-execl-512 12526.29 ( 0.00%) 11833.41 ( -5.53%)
o NPS2
base affinity_scopes
Hmean unixbench-dhry2reg-1 40812348.19 ( 0.00%) 41044955.13 ( 0.57%)
Hmean unixbench-dhry2reg-512 6248963826.97 ( 0.00%) 6244114150.91 ( -0.08%)
Amean unixbench-syscall-1 2479433.67 ( 0.00%) 2498544.70 ( -0.77%)
Amean unixbench-syscall-512 8064530.47 ( 0.00%) 8064139.93 ( 0.00%)
Hmean unixbench-pipe-1 2393194.62 ( 0.00%) 2365328.39 ( -1.16%)
Hmean unixbench-pipe-512 339553534.72 ( 0.00%) 340930432.76 ( 0.41%)
Hmean unixbench-spawn-1 4777.52 ( 0.00%) 4975.71 ( 4.15%)
Hmean unixbench-spawn-512 67467.26 ( 0.00%) 63427.50 * -5.99%*
Hmean unixbench-execl-1 3640.89 ( 0.00%) 3636.52 ( -0.12%)
Hmean unixbench-execl-512 14182.44 ( 0.00%) 13584.16 ( -4.22%)
o NPS4
base affinity_scopes
Hmean unixbench-dhry2reg-1 41075499.61 ( 0.00%) 41222189.50 ( 0.36%)
Hmean unixbench-dhry2reg-512 6250307266.90 ( 0.00%) 6251044709.08 ( 0.01%)
Amean unixbench-syscall-1 2538714.30 ( 0.00%) 2521520.87 * 0.68%*
Amean unixbench-syscall-512 7514126.30 ( 0.00%) 7534175.47 ( -0.27%)
Hmean unixbench-pipe-1 2393641.60 ( 0.00%) 2379400.79 ( -0.59%)
Hmean unixbench-pipe-512 339424173.78 ( 0.00%) 341229694.29 * 0.53%*
Hmean unixbench-spawn-1 5421.34 ( 0.00%) 5556.23 ( 2.49%)
Hmean unixbench-spawn-512 64071.52 ( 0.00%) 65783.47 * 2.67%*
Hmean unixbench-execl-1 3629.56 ( 0.00%) 3670.13 * 1.12%*
Hmean unixbench-execl-512 13641.24 ( 0.00%) 13848.81 ( 1.52%)
~~~~~~~~~~~~~~~~
~ ycsb-mongodb ~
~~~~~~~~~~~~~~~~
o NPS1:
base:            298681.00 (var: 2.31%)
affinity_scopes: 295106.33 (var: 2.22%) (-1.19%)
o NPS2:
base:            296570.00 (var: 1.01%)
affinity_scopes: 298637.67 (var: 1.50%) (0.70%)
o NPS4:
base:            297181.67 (var: 0.46%)
affinity_scopes: 294253.33 (var: 0.80%) (-0.99%)
~~~~~~~~~~~~~~~~~~
~ DeathStarBench ~
~~~~~~~~~~~~~~~~~~
o NPS1:
- 1 CCD
base: 1.00 (var: 0.14%)
affinity_scopes: 1.01 (var: 0.51%) (+1.19%)
- 2 CCD
base: 1.00 (var: 0.74%)
affinity_scopes: 0.99 (var: 0.47%) (-1.02%)
- 4 CCD
base: 1.00 (var: 0.33%)
affinity_scopes: 0.99 (var: 0.47%) (-0.95%)
- 8 CCD
base: 1.00 (var: 0.62%)
affinity_scopes: 0.99 (var: 2.30%) (-1.42%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Benchmarks run with multiple affinity scopes ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
o NPS1
- tbench
Clients:  base                    cpu                     cache                   numa                    system
1         450.40 (0.00 pct)       459.44 (2.00 pct)       457.12 (1.49 pct)       456.36 (1.32 pct)       456.75 (1.40 pct)
2         872.50 (0.00 pct)       869.68 (-0.32 pct)      890.59 (2.07 pct)       878.87 (0.73 pct)       890.14 (2.02 pct)
4         1630.13 (0.00 pct)      1621.24 (-0.54 pct)     1634.74 (0.28 pct)      1628.62 (-0.09 pct)     1646.57 (1.00 pct)
8         3139.90 (0.00 pct)      3044.58 (-3.03 pct)     3099.49 (-1.28 pct)     3081.43 (-1.86 pct)     3151.16 (0.35 pct)
16        6113.51 (0.00 pct)      5555.17 (-9.13 pct)     5465.09 (-10.60 pct)    5661.31 (-7.39 pct)     5742.58 (-6.06 pct)
32        11024.64 (0.00 pct)     9574.62 (-13.15 pct)    9282.62 (-15.80 pct)    9542.00 (-13.44 pct)    9916.66 (-10.05 pct)
64        19081.96 (0.00 pct)     15656.53 (-17.95 pct)   15176.12 (-20.46 pct)   16527.77 (-13.38 pct)   15097.97 (-20.87 pct)
128       30956.07 (0.00 pct)     28277.80 (-8.65 pct)    27662.76 (-10.63 pct)   27817.94 (-10.13 pct)   28925.78 (-6.55 pct)
256       42829.46 (0.00 pct)     38646.48 (-9.76 pct)    38355.27 (-10.44 pct)   37073.24 (-13.43 pct)   34391.01 (-19.70 pct)
512       42395.69 (0.00 pct)     36931.34 (-12.88 pct)   39259.49 (-7.39 pct)    36571.62 (-13.73 pct)   36245.55 (-14.50 pct)
1024      41973.51 (0.00 pct)     38817.07 (-7.52 pct)    38733.15 (-7.72 pct)    38864.45 (-7.40 pct)    35728.70 (-14.87 pct)
- netperf
Test:         base                    cpu                     cache                   numa                    system
1-clients:    100910.82 (0.00 pct)    103440.72 (2.50 pct)    102592.36 (1.66 pct)    103199.49 (2.26 pct)    103561.90 (2.62 pct)
2-clients:    99777.76 (0.00 pct)     100414.00 (0.63 pct)    100305.89 (0.52 pct)    99890.90 (0.11 pct)     101512.46 (1.73 pct)
4-clients:    97676.17 (0.00 pct)     96624.28 (-1.07 pct)    95966.77 (-1.75 pct)    97105.22 (-0.58 pct)    97972.11 (0.30 pct)
8-clients:    95413.11 (0.00 pct)     89926.72 (-5.75 pct)    89977.14 (-5.69 pct)    91020.10 (-4.60 pct)    92458.94 (-3.09 pct)
16-clients:   88961.66 (0.00 pct)     81295.02 (-8.61 pct)    79144.83 (-11.03 pct)   80216.42 (-9.83 pct)    85439.68 (-3.95 pct)
32-clients:   82199.83 (0.00 pct)     77914.00 (-5.21 pct)    75055.66 (-8.69 pct)    76813.94 (-6.55 pct)    80768.87 (-1.74 pct)
64-clients:   66094.87 (0.00 pct)     64419.91 (-2.53 pct)    63718.37 (-3.59 pct)    60370.40 (-8.66 pct)    66179.58 (0.12 pct)
128-clients:  43833.63 (0.00 pct)     42936.08 (-2.04 pct)    44554.69 (1.64 pct)     42666.82 (-2.66 pct)    45543.69 (3.90 pct)
256-clients:  38917.58 (0.00 pct)     24807.28 (-36.25 pct)   20517.01 (-47.28 pct)   21651.40 (-44.36 pct)   23778.87 (-38.89 pct)
- SPECjbb2015 Multi-JVM
Scope:   max-jOPS  critical-jOPS
base:    0.00%     0.00%
smt:     -1.11%    -1.84%
cpu:     2.86%     -1.35%
cache:   2.86%     -1.66%
numa:    1.43%     -1.49%
system:  0.08%     -0.41%
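The tbench and netperf rows in the multi-scope tables above share one layout: a label followed by five "value (pct)" cells in the order base, cpu, cache, numa, system. For anyone who wants to compare scopes programmatically, here is a minimal parsing sketch. The names `parse_row` and `closest_to_base` are hypothetical, and the code assumes higher throughput is better for both benchmarks.

```python
import re

SCOPES = ["base", "cpu", "cache", "numa", "system"]
# One "value (pct)" cell, e.g. "15656.53 (-17.95 pct)"
CELL = re.compile(r"(-?\d+(?:\.\d+)?)\s*\(\s*(-?\d+(?:\.\d+)?)\s*pct\)")


def parse_row(line):
    """Split a table row into (label, {scope: (value, pct)})."""
    label = line.split()[0].rstrip(":")
    cells = CELL.findall(line)
    if len(cells) != len(SCOPES):
        raise ValueError(f"expected {len(SCOPES)} cells, got {len(cells)}")
    return label, {s: (float(v), float(p)) for s, (v, p) in zip(SCOPES, cells)}


def closest_to_base(row):
    """Non-base scope with the highest pct figure (least regression)."""
    return max((s for s in row if s != "base"), key=lambda s: row[s][1])


# tbench, NPS1, 64 clients (row copied from the table above)
line = ("64 19081.96 (0.00 pct) 15656.53 (-17.95 pct) 15176.12 (-20.46 pct) "
        "16527.77 (-13.38 pct) 15097.97 (-20.87 pct)")
label, row = parse_row(line)
print(label, closest_to_base(row))  # 64 numa
```

Even the least-regressed scope in that row is still well below base, which matches the observation that no scope restores tbench performance.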
I'll dig deeper into the tbench and netperf regressions. I'm not sure
why the regression is observed with every affinity scope. I'll look
at the IBS profile and see if something obvious pops up. Meanwhile, if
there is any specific data you would like me to collect or any benchmark
you would like me to run, let me know.
--
Thanks and Regards,
Prateek