Message-ID: <YSwDTWsihFxn6f1E@krava>
Date: Sun, 29 Aug 2021 23:59:41 +0200
From: Jiri Olsa <jolsa@...hat.com>
To: Riccardo Mancini <rickyman7@...il.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Alexey Bayduraev <alexey.v.bayduraev@...ux.intel.com>
Subject: Re: [RFC PATCH v3 00/15] perf: add workqueue library and use it in synthetic-events
On Fri, Aug 20, 2021 at 12:53:46PM +0200, Riccardo Mancini wrote:
> Changes in v3:
> - improved separation of threadpool and threadpool_entry method
> - replaced shared workqueue with per-thread workqueue. This should
>   improve the performance on big machines (Jiri noticed in his
>   experiments a significant performance degradation after 15 threads
>   with the shared queue).
> - improved error reporting in both threadpool and workqueue
> - added lazy spinup of threads in workqueue [9/15]
> - added global workqueue [10/15]
> - setup global workqueue in perf record, top and synthesize bench
>   [12-14/15] and used it in synthetic events
hi,

I ran the test again and the slowdown is still there,
adding the stats below.

I'm still doing the review and I noticed a few strange things,
but so far nothing that would explain the slowdown.
For example, for 40 threads I can see only 35 threads spawned;
I need to check on that more.
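
One way to cross-check the number of spawned threads on Linux is to count the entries under /proc/<pid>/task, which holds one directory per thread (the main thread included). A minimal Python sketch, not part of the series; `count_threads` and its `proc_root` parameter are hypothetical names used only for illustration:

```python
import os

def count_threads(pid="self", proc_root="/proc"):
    """Count the threads of `pid` by listing <proc_root>/<pid>/task.

    Each thread shows up as a numeric directory named after its TID.
    `proc_root` is only there so the helper can be pointed at a fake
    tree for testing; on a real system leave it at "/proc".
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sum(1 for name in os.listdir(task_dir) if name.isdigit())
```

Calling `count_threads(pid)` with the pid of the running perf process while the benchmark is in flight gives the live thread count, which can then be compared with the requested number of synthesis threads plus the main thread.
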
Also, I'll try to run some tests for parallel_for > 1 to cut
down some of the workqueue code. Any tests on that?

jirka
---
new (left) vs. old (right), each from:

[root@...l-r440-01 perf]# ./perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:

Columns: threads = "Number of synthesis threads", synthesis = "Average
synthesis took" (usec), events = "Average num. events", usec/event =
"Average time per event"; each +- is the reported +- value for the
column to its left.

         new:                                              |  old:
threads  synthesis        +-    events     +-  usec/event  |  synthesis        +-    events     +-  usec/event
      1  13970.400   339.216  2349.000  0.000       5.947  |  13563.700   348.354  2317.000  0.000       5.854
      2  15651.800  1612.798  2353.000  0.000       6.652  |   8433.600    83.725  2321.600  0.306       3.633
      3  12114.100  1208.208  2355.000  0.000       5.144  |   6716.200    16.889  2325.000  0.000       2.889
      4   9812.500   951.284  2357.000  0.000       4.163  |   5981.400    11.102  2323.000  0.000       2.575
      5   7338.300   661.620  2359.000  0.000       3.111  |   5538.800    12.990  2329.000  0.000       2.378
      6   7256.800   680.312  2361.000  0.000       3.074  |   5255.700     7.454  2331.000  0.000       2.255
      7   6119.600   479.409  2363.000  0.000       2.590  |   4836.200     8.132  2323.000  0.000       2.082
      8   5899.600   506.285  2365.000  0.000       2.495  |   4643.000     4.913  2335.000  0.000       1.988
      9   5459.100   431.725  2367.000  0.000       2.306  |   4526.600     5.207  2337.000  0.000       1.937
     10   4977.100   251.378  2369.000  0.000       2.101  |   4128.700     5.911  2327.800  0.533       1.774
     11   5428.700   513.409  2371.000  0.000       2.290  |   3890.800    15.051  2323.000  0.000       1.675
     12   5517.800   508.171  2373.000  0.000       2.325  |   3367.800    14.261  2343.000  0.000       1.437
     13   5279.500   432.819  2375.000  0.000       2.223  |   3974.300    12.437  2328.200  1.405       1.707
     14   4993.100   392.485  2377.000  0.000       2.101  |   4157.100   163.268  2319.800  0.533       1.792
     15   5584.700   379.862  2379.000  0.000       2.347  |   4065.700    25.656  2322.800  0.467       1.750
     16   5009.800   381.018  2381.000  0.000       2.104  |   4580.600   129.218  2324.800  0.200       1.970
     17   5543.300   376.064  2383.000  0.000       2.326  |   4089.700    54.096  2320.200  0.611       1.763
     18   5191.800   342.317  2385.000  0.000       2.177  |   4219.000    61.395  2323.000  0.516       1.816
     19   4647.000   273.303  2387.000  0.000       1.947  |   3998.800    49.221  2325.200  0.200       1.720
     20   4710.600   179.874  2389.000  0.000       1.972  |   3930.300    67.725  2319.000  0.000       1.695
     21   4959.100   318.519  2390.800  0.200       2.074  |   3696.400    30.953  2319.800  0.533       1.593
     22   4422.300   236.998  2392.800  0.200       1.848  |   3394.000    63.254  2319.000  0.000       1.464
     23   4640.800   245.604  2394.400  0.600       1.938  |   4091.100   134.320  2323.400  0.267       1.761
     24   4554.900   201.121  2395.800  0.854       1.901  |   3346.600    78.846  2321.000  0.667       1.442
     25   4668.300   248.254  2398.000  0.803       1.947  |   3794.300   191.158  2317.900  6.248       1.637
     26   4683.300   226.836  2399.000  1.265       1.952  |   3285.700    18.785  2317.100  6.198       1.418
     27   4590.300   158.000  2400.200  1.497       1.912  |   3604.600    35.487  2319.800  0.533       1.554
     28   4683.500   233.543  2402.400  1.688       1.950  |   3594.700    21.267  2319.200  0.200       1.550
     29   4830.700   235.730  2405.000  2.530       2.009  |   3531.700    15.935  2322.200  0.800       1.521
     30   4684.500   210.137  2407.600  2.495       1.946  |   3505.700    58.332  2315.100  5.900       1.514
     31   4823.300   213.480  2407.400  2.647       2.004  |   3431.100    42.022  2319.000  0.000       1.480
     32   4400.800   224.134  2407.400  2.544       1.828  |   3684.900   253.077  2319.200  0.200       1.589
     33   4452.600   231.034  2409.300  3.190       1.848  |   3233.000    24.035  2316.500  6.069       1.396
     34   4770.900   182.325  2411.200  3.032       1.979  |   3016.300    13.343  2322.800  0.200       1.299
     35   4442.800   248.017  2412.000  3.296       1.842  |   3246.700    71.765  2321.800  0.611       1.398
     36   5005.200   235.823  2410.400  2.750       2.077  |   3329.000   122.028  2310.800  8.133       1.441
     37   4654.000   208.838  2409.400  2.473       1.932  |   3011.600    46.026  2322.200  0.533       1.297
     38   4763.700   197.409  2406.200  2.462       1.980  |   3163.500    36.589  2319.000  0.000       1.364
     39   4333.100   194.456  2408.600  3.124       1.799  |   3170.900    30.538  2319.000  0.000       1.367
     40   4520.200   188.901  2409.600  3.184       1.876  |   3111.900    24.287  2307.600  7.600       1.349
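
To put a number on the slowdown, the new/old ratio of "Average synthesis took" can be computed per thread count from two saved benchmark outputs. A minimal Python sketch, not part of the series: `parse_synthesize` and `slowdown` are hypothetical helper names, the regular expressions assume the single-column output format of `perf bench internals synthesize -t` as quoted in the stats above, and the two sample strings are copied from those stats (2 and 40 threads).

```python
import re

THREADS_RE = re.compile(r"Number of synthesis threads:\s*(\d+)")
TOOK_RE = re.compile(r"Average synthesis took:\s*([\d.]+) usec")

def parse_synthesize(text):
    """Map thread count -> average synthesis time (usec) for one run."""
    result = {}
    threads = None
    for line in text.splitlines():
        m = THREADS_RE.search(line)
        if m:
            threads = int(m.group(1))
            continue
        m = TOOK_RE.search(line)
        if m and threads is not None:
            result[threads] = float(m.group(1))
    return result

def slowdown(new, old):
    """Return new/old time ratio for thread counts present in both runs.

    A ratio above 1.0 means the new run took longer than the old one.
    """
    return {n: new[n] / old[n] for n in sorted(new.keys() & old.keys())}

# Sample lines copied from the stats above.
NEW = """Number of synthesis threads: 2
Average synthesis took: 15651.800 usec (+- 1612.798 usec)
Number of synthesis threads: 40
Average synthesis took: 4520.200 usec (+- 188.901 usec)
"""
OLD = """Number of synthesis threads: 2
Average synthesis took: 8433.600 usec (+- 83.725 usec)
Number of synthesis threads: 40
Average synthesis took: 3111.900 usec (+- 24.287 usec)
"""

for n, ratio in slowdown(parse_synthesize(NEW), parse_synthesize(OLD)).items():
    print(f"{n:2d} threads: new/old = {ratio:.2f}")
```

Feeding the full output of each run into `parse_synthesize` instead of the two sample strings gives the ratio for every thread count in one pass.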