lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YREZ4G1xzncpdsVk@krava>
Date:   Mon, 9 Aug 2021 14:04:48 +0200
From:   Jiri Olsa <jolsa@...hat.com>
To:     Riccardo Mancini <rickyman7@...il.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        Alexey Bayduraev <alexey.v.bayduraev@...ux.intel.com>
Subject: Re: [RFC PATCH v2 10/10] perf synthetic-events: use workqueue
 parallel_for

On Fri, Jul 30, 2021 at 05:34:17PM +0200, Riccardo Mancini wrote:
> To generate synthetic events, perf has the option to use multiple
> threads. These threads are created manually using pthread_created.
> 
> This patch replaces the manual pthread_create with a workqueue,
> using the parallel_for utility.

hi,
I really like this new interface

> 
> Experimental results show that workqueue has a slightly higher overhead,
> but this is repayed by the improved work balancing among threads.

how did you measure that balancing improvement?
is there less kernel cycles spent?

I ran the benchmark and if I'm reading the results correctly I see
performance drop for high cpu numbers (full list attached below).


old perf:                                                                 new perf:

[jolsa@...l-r440-01 perf]$ ./perf.old bench internals synthesize -t       [jolsa@...l-r440-01 perf]$ ./perf bench internals synthesize -t
...
  Number of synthesis threads: 40                                           Number of synthesis threads: 40
    Average synthesis took: 2489.400 usec (+- 49.832 usec)                    Average synthesis took: 4576.500 usec (+- 75.278 usec)
    Average num. events: 956.800 (+- 6.721)                                   Average num. events: 1020.000 (+- 0.000)
    Average time per event 2.602 usec                                         Average time per event 4.487 usec

maybe profiling will show what's going on?

thanks,
jirka


---
[jolsa@...l-r440-01 perf]$ ./perf.old bench internals synthesize -t       [jolsa@...l-r440-01 perf]$ ./perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:                               # Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by           Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:                                             synthesizing events on CPU 0:
  Number of synthesis threads: 1                                            Number of synthesis threads: 1 
    Average synthesis took: 7907.100 usec (+- 197.363 usec)                   Average synthesis took: 7972.900 usec (+- 198.158 usec)
    Average num. events: 956.000 (+- 0.000)                                   Average num. events: 936.000 (+- 0.000)
    Average time per event 8.271 usec                                         Average time per event 8.518 usec
  Number of synthesis threads: 2                                            Number of synthesis threads: 2 
    Average synthesis took: 5616.800 usec (+- 61.253 usec)                    Average synthesis took: 5844.700 usec (+- 87.219 usec)
    Average num. events: 958.800 (+- 0.327)                                   Average num. events: 940.000 (+- 0.000)
    Average time per event 5.858 usec                                         Average time per event 6.218 usec
  Number of synthesis threads: 3                                            Number of synthesis threads: 3 
    Average synthesis took: 4274.000 usec (+- 93.293 usec)                    Average synthesis took: 4019.700 usec (+- 67.354 usec)
    Average num. events: 962.000 (+- 0.000)                                   Average num. events: 942.000 (+- 0.000)
    Average time per event 4.443 usec                                         Average time per event 4.267 usec
  Number of synthesis threads: 4                                            Number of synthesis threads: 4 
    Average synthesis took: 3425.700 usec (+- 43.044 usec)                    Average synthesis took: 3382.200 usec (+- 74.652 usec)
    Average num. events: 959.600 (+- 0.933)                                   Average num. events: 944.000 (+- 0.000)
    Average time per event 3.570 usec                                         Average time per event 3.583 usec
  Number of synthesis threads: 5                                            Number of synthesis threads: 5 
    Average synthesis took: 2958.000 usec (+- 82.951 usec)                    Average synthesis took: 3086.500 usec (+- 48.213 usec)
    Average num. events: 966.000 (+- 0.000)                                   Average num. events: 946.000 (+- 0.000)
    Average time per event 3.062 usec                                         Average time per event 3.263 usec
  Number of synthesis threads: 6                                            Number of synthesis threads: 6 
    Average synthesis took: 2808.400 usec (+- 66.868 usec)                    Average synthesis took: 2752.200 usec (+- 56.411 usec)
    Average num. events: 956.800 (+- 0.327)                                   Average num. events: 948.000 (+- 0.000)
    Average time per event 2.935 usec                                         Average time per event 2.903 usec
  Number of synthesis threads: 7                                            Number of synthesis threads: 7 
    Average synthesis took: 2622.900 usec (+- 83.524 usec)                    Average synthesis took: 2548.200 usec (+- 48.042 usec)
    Average num. events: 958.400 (+- 0.267)                                   Average num. events: 950.000 (+- 0.000)
    Average time per event 2.737 usec                                         Average time per event 2.682 usec
  Number of synthesis threads: 8                                            Number of synthesis threads: 8 
    Average synthesis took: 2271.600 usec (+- 29.181 usec)                    Average synthesis took: 2486.600 usec (+- 47.862 usec)
    Average num. events: 972.000 (+- 0.000)                                   Average num. events: 952.000 (+- 0.000)
    Average time per event 2.337 usec                                         Average time per event 2.612 usec
  Number of synthesis threads: 9                                            Number of synthesis threads: 9 
    Average synthesis took: 2372.000 usec (+- 95.495 usec)                    Average synthesis took: 2347.300 usec (+- 23.959 usec)
    Average num. events: 959.200 (+- 0.952)                                   Average num. events: 954.000 (+- 0.000)
    Average time per event 2.473 usec                                         Average time per event 2.460 usec
  Number of synthesis threads: 10                                           Number of synthesis threads: 10
    Average synthesis took: 2544.600 usec (+- 107.569 usec)                   Average synthesis took: 2328.800 usec (+- 14.234 usec)
    Average num. events: 968.400 (+- 3.124)                                   Average num. events: 957.400 (+- 0.306)
    Average time per event 2.628 usec                                         Average time per event 2.432 usec
  Number of synthesis threads: 11                                           Number of synthesis threads: 11
    Average synthesis took: 2299.300 usec (+- 57.597 usec)                    Average synthesis took: 2340.300 usec (+- 34.638 usec)
    Average num. events: 956.000 (+- 0.000)                                   Average num. events: 960.000 (+- 0.000)
    Average time per event 2.405 usec                                         Average time per event 2.438 usec
  Number of synthesis threads: 12                                           Number of synthesis threads: 12
    Average synthesis took: 2545.500 usec (+- 69.557 usec)                    Average synthesis took: 2318.700 usec (+- 15.803 usec)
    Average num. events: 974.800 (+- 0.611)                                   Average num. events: 963.800 (+- 0.200)
    Average time per event 2.611 usec                                         Average time per event 2.406 usec
  Number of synthesis threads: 13                                           Number of synthesis threads: 13
    Average synthesis took: 2386.400 usec (+- 79.244 usec)                    Average synthesis took: 2408.700 usec (+- 27.071 usec)
    Average num. events: 950.500 (+- 5.726)                                   Average num. events: 966.000 (+- 0.000)
    Average time per event 2.511 usec                                         Average time per event 2.493 usec
  Number of synthesis threads: 14                                           Number of synthesis threads: 14 
    Average synthesis took: 2466.600 usec (+- 57.893 usec)                    Average synthesis took: 2547.200 usec (+- 53.445 usec)
    Average num. events: 957.600 (+- 0.718)                                   Average num. events: 968.000 (+- 0.000)
    Average time per event 2.576 usec                                         Average time per event 2.631 usec
  Number of synthesis threads: 15                                           Number of synthesis threads: 15 
    Average synthesis took: 2249.700 usec (+- 64.026 usec)                    Average synthesis took: 2647.900 usec (+- 79.014 usec)
    Average num. events: 956.000 (+- 0.000)                                   Average num. events: 970.000 (+- 0.000)
    Average time per event 2.353 usec                                         Average time per event 2.730 usec
  Number of synthesis threads: 16                                           Number of synthesis threads: 16 
    Average synthesis took: 2311.700 usec (+- 64.304 usec)                    Average synthesis took: 2676.200 usec (+- 34.824 usec)
    Average num. events: 955.000 (+- 0.907)                                   Average num. events: 972.000 (+- 0.000)
    Average time per event 2.421 usec                                         Average time per event 2.753 usec
  Number of synthesis threads: 17                                           Number of synthesis threads: 17 
    Average synthesis took: 2174.100 usec (+- 36.673 usec)                    Average synthesis took: 2580.100 usec (+- 45.414 usec)
    Average num. events: 971.600 (+- 3.124)                                   Average num. events: 974.000 (+- 0.000)
    Average time per event 2.238 usec                                         Average time per event 2.649 usec
  Number of synthesis threads: 18                                           Number of synthesis threads: 18 
    Average synthesis took: 2294.200 usec (+- 63.657 usec)                    Average synthesis took: 2810.200 usec (+- 49.113 usec)
    Average num. events: 953.200 (+- 0.611)                                   Average num. events: 976.000 (+- 0.000)
    Average time per event 2.407 usec                                         Average time per event 2.879 usec
  Number of synthesis threads: 19                                           Number of synthesis threads: 19 
    Average synthesis took: 2410.700 usec (+- 120.169 usec)                   Average synthesis took: 2862.400 usec (+- 36.982 usec)
    Average num. events: 953.400 (+- 0.306)                                   Average num. events: 978.000 (+- 0.000)
    Average time per event 2.529 usec                                         Average time per event 2.927 usec
  Number of synthesis threads: 20                                           Number of synthesis threads: 20 
    Average synthesis took: 2387.000 usec (+- 91.051 usec)                    Average synthesis took: 2908.800 usec (+- 36.404 usec)
    Average num. events: 952.800 (+- 0.800)                                   Average num. events: 978.600 (+- 0.306)
    Average time per event 2.505 usec                                         Average time per event 2.972 usec
  Number of synthesis threads: 21                                           Number of synthesis threads: 21 
    Average synthesis took: 2275.700 usec (+- 39.815 usec)                    Average synthesis took: 3141.100 usec (+- 30.896 usec)
    Average num. events: 954.600 (+- 0.306)                                   Average num. events: 980.000 (+- 0.000)
    Average time per event 2.384 usec                                         Average time per event 3.205 usec
  Number of synthesis threads: 22                                           Number of synthesis threads: 22 
    Average synthesis took: 2373.200 usec (+- 89.528 usec)                    Average synthesis took: 3342.400 usec (+- 112.115 usec)
    Average num. events: 949.100 (+- 5.843)                                   Average num. events: 982.000 (+- 0.000)
    Average time per event 2.500 usec                                         Average time per event 3.404 usec
  Number of synthesis threads: 23                                           Number of synthesis threads: 23 
    Average synthesis took: 2318.300 usec (+- 39.395 usec)                    Average synthesis took: 3269.700 usec (+- 55.215 usec)
    Average num. events: 954.600 (+- 0.427)                                   Average num. events: 984.000 (+- 0.000)
    Average time per event 2.429 usec                                         Average time per event 3.323 usec
  Number of synthesis threads: 24                                           Number of synthesis threads: 24
    Average synthesis took: 2241.900 usec (+- 52.577 usec)                    Average synthesis took: 3379.500 usec (+- 56.380 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 986.000 (+- 0.000)
    Average time per event 2.350 usec                                         Average time per event 3.427 usec
  Number of synthesis threads: 25                                           Number of synthesis threads: 25
    Average synthesis took: 2343.400 usec (+- 101.611 usec)                   Average synthesis took: 3382.500 usec (+- 51.535 usec)
    Average num. events: 956.200 (+- 1.009)                                   Average num. events: 988.000 (+- 0.000)
    Average time per event 2.451 usec                                         Average time per event 3.424 usec
  Number of synthesis threads: 26                                           Number of synthesis threads: 26
    Average synthesis took: 2260.700 usec (+- 18.863 usec)                    Average synthesis took: 3391.600 usec (+- 44.053 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 990.000 (+- 0.000)
    Average time per event 2.370 usec                                         Average time per event 3.426 usec
  Number of synthesis threads: 27                                           Number of synthesis threads: 27
    Average synthesis took: 2373.800 usec (+- 74.213 usec)                    Average synthesis took: 3659.200 usec (+- 113.176 usec)
    Average num. events: 955.000 (+- 0.803)                                   Average num. events: 992.000 (+- 0.000)
    Average time per event 2.486 usec                                         Average time per event 3.689 usec
  Number of synthesis threads: 28                                           Number of synthesis threads: 28
    Average synthesis took: 2335.500 usec (+- 49.480 usec)                    Average synthesis took: 3625.000 usec (+- 90.131 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 994.000 (+- 0.000)
    Average time per event 2.448 usec                                         Average time per event 3.647 usec
  Number of synthesis threads: 29                                           Number of synthesis threads: 29
    Average synthesis took: 2182.100 usec (+- 41.649 usec)                    Average synthesis took: 3708.400 usec (+- 103.717 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 996.000 (+- 0.000)
    Average time per event 2.287 usec                                         Average time per event 3.723 usec
  Number of synthesis threads: 30                                           Number of synthesis threads: 30
    Average synthesis took: 2246.100 usec (+- 58.252 usec)                    Average synthesis took: 3820.500 usec (+- 95.282 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 998.000 (+- 0.000)
    Average time per event 2.354 usec                                         Average time per event 3.828 usec
  Number of synthesis threads: 31                                           Number of synthesis threads: 31
    Average synthesis took: 2156.900 usec (+- 26.141 usec)                    Average synthesis took: 3881.400 usec (+- 36.277 usec)
    Average num. events: 948.300 (+- 5.700)                                   Average num. events: 1000.000 (+- 0.000)
    Average time per event 2.274 usec                                         Average time per event 3.881 usec
  Number of synthesis threads: 32                                           Number of synthesis threads: 32
    Average synthesis took: 2295.300 usec (+- 41.538 usec)                    Average synthesis took: 4191.700 usec (+- 149.780 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 1002.000 (+- 0.000)
    Average time per event 2.406 usec                                         Average time per event 4.183 usec
  Number of synthesis threads: 33                                           Number of synthesis threads: 33
    Average synthesis took: 2249.100 usec (+- 59.135 usec)                    Average synthesis took: 3988.200 usec (+- 25.015 usec)
    Average num. events: 948.500 (+- 5.726)                                   Average num. events: 1004.000 (+- 0.000)
    Average time per event 2.371 usec                                         Average time per event 3.972 usec
  Number of synthesis threads: 34                                           Number of synthesis threads: 34
    Average synthesis took: 2270.400 usec (+- 65.011 usec)                    Average synthesis took: 4064.600 usec (+- 44.158 usec)
    Average num. events: 954.200 (+- 0.200)                                   Average num. events: 1006.000 (+- 0.000)
    Average time per event 2.379 usec                                         Average time per event 4.040 usec
  Number of synthesis threads: 35                                           Number of synthesis threads: 35
    Average synthesis took: 2259.200 usec (+- 44.287 usec)                    Average synthesis took: 4145.700 usec (+- 37.297 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 1008.000 (+- 0.000)
    Average time per event 2.368 usec                                         Average time per event 4.113 usec
  Number of synthesis threads: 36                                           Number of synthesis threads: 36
    Average synthesis took: 2294.100 usec (+- 38.693 usec)                    Average synthesis took: 4234.900 usec (+- 81.904 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 1010.400 (+- 0.267)
    Average time per event 2.405 usec                                         Average time per event 4.191 usec
  Number of synthesis threads: 37                                           Number of synthesis threads: 37
    Average synthesis took: 2338.900 usec (+- 80.346 usec)                    Average synthesis took: 4337.900 usec (+- 30.071 usec)
    Average num. events: 954.400 (+- 0.267)                                   Average num. events: 1014.000 (+- 0.000)
    Average time per event 2.451 usec                                         Average time per event 4.278 usec
  Number of synthesis threads: 38                                           Number of synthesis threads: 38
    Average synthesis took: 2406.300 usec (+- 57.140 usec)                    Average synthesis took: 4426.600 usec (+- 27.035 usec)
    Average num. events: 938.400 (+- 7.730)                                   Average num. events: 1016.000 (+- 0.000)
    Average time per event 2.564 usec                                         Average time per event 4.357 usec
  Number of synthesis threads: 39                                           Number of synthesis threads: 39
    Average synthesis took: 2371.000 usec (+- 35.676 usec)                    Average synthesis took: 5979.000 usec (+- 1518.855 usec)
    Average num. events: 963.000 (+- 0.000)                                   Average num. events: 1018.000 (+- 0.000)
    Average time per event 2.462 usec                                         Average time per event 5.873 usec
  Number of synthesis threads: 40                                           Number of synthesis threads: 40
    Average synthesis took: 2489.400 usec (+- 49.832 usec)                    Average synthesis took: 4576.500 usec (+- 75.278 usec)
    Average num. events: 956.800 (+- 6.721)                                   Average num. events: 1020.000 (+- 0.000)
    Average time per event 2.602 usec                                         Average time per event 4.487 usec

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ