lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 29 Nov 2017 20:05:47 +0800
From:   Jin Yao <yao.jin@...ux.intel.com>
To:     acme@...nel.org, jolsa@...nel.org, peterz@...radead.org,
        mingo@...hat.com, alexander.shishkin@...ux.intel.com
Cc:     Linux-kernel@...r.kernel.org, ak@...ux.intel.com,
        kan.liang@...el.com, yao.jin@...el.com,
        Jin Yao <yao.jin@...ux.intel.com>
Subject: [PATCH v3 00/11] perf stat: Enable '--per-thread' on all thread

v3:
---
Update according to Jiri's comments. The major modifications are:

1. Fix the crash issue when performing the git bisect.
   Move the removing of runtime_saved_values to the switching point
   (perf util: Remove a set of shadow stats static variables).

2. Still add struct perf_stat_config::*stats|stats_num in earlier
   patch because 'stats' will be used in
   'perf util: Update and print per-thread shadow stats'.
   If it's moved to 'perf stat: Allocate shadow stats buffer for threads',
   the compilation would be failed.

3. v = container_of(rb_node,
		    struct saved_value,
		    rb_node);

   Let this code to be one line.

Impacted patch:
---------------
  perf util: Add rbtree node_delete ops
  perf util: Create the runtime_stat init/free function
  perf util: Update and print per-thread shadow stats
  perf util: Remove a set of shadow stats static variables

v2:
---
Update according to Jiri's comments. The major modifications are:

1. Remove unnecessary memset for runtime_stat and
   walltime_nsecs_stats.

2. Remove init_saved_rblist/free_saved_rblist and move the codes
   to runtime_stat__init/runtime_stat__free.

3. Change function name from perf_stat__init_runtime_stat/
   perf_stat__free_runtime_stat to runtime_stat__init/
   runtime_stat__free

4. Create a new patch 'perf util: Extend rbtree to support shadow stats'
   to add new ctx/type/stat into rbtree keys for supporting per-thread
   shadow stats. It also comment in here on the cases where evsel is
   defined and when not.

5. Pass &rt_stat directly to perf_stat__update_shadow_stats, replace
   original parameter NULL.

6. Move the code of shadow stat updating for per-thread case to config->stats
   allocation/usage patch.

7. Add BUG_ON(!rb_node) in saved_value_delete.

8. s/stat_num/stats_num/ or s/stats/stat/ in struct perf_stat_config.

9. Rebase to latest perf/core branch.

Impacted patch:
---------------
  perf util: Define a structure for runtime shadow metrics stats
  perf util: Extend rbtree to support shadow stats
  perf util: Add rbtree node_delete ops
  perf util: Update and print per-thread shadow stats
  perf stat: Allocate shadow stats buffer for threads
  perf stat: Remove --per-thread pid/tid limitation
  perf stat: Resort '--per-thread' result

Initial post:
-------------

perf stat --per-thread is useful to break down data per thread.
But it currently requires specifying --pid/--tid to limit it to a process.

For analysis it would be useful to do it globally for the whole system.

1. Currently, if we perform 'perf stat --per-thread' without pid/tid,
perf returns error:

root@skl:/tmp# perf stat --per-thread
The --per-thread option is only available when monitoring via -p -t options.
    -p, --pid <pid>       stat events on existing process id
    -t, --tid <tid>       stat events on existing thread id

2. With this patch series, it returns data per thread with shadow metrics.
   (run "vmstat 1" in following example)

    root@skl:/tmp# perf stat --per-thread
    ^C
     Performance counter stats for 'system wide':

                perf-24165              4.302433      cpu-clock (msec)          #    0.001 CPUs utilized
              vmstat-23127              1.562215      cpu-clock (msec)          #    0.000 CPUs utilized
          irqbalance-2780               0.827851      cpu-clock (msec)          #    0.000 CPUs utilized
                sshd-23111              0.278308      cpu-clock (msec)          #    0.000 CPUs utilized
            thermald-2841               0.230880      cpu-clock (msec)          #    0.000 CPUs utilized
                sshd-23058              0.207306      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/0:2-19991              0.133983      cpu-clock (msec)          #    0.000 CPUs utilized
       kworker/u16:1-18249              0.125636      cpu-clock (msec)          #    0.000 CPUs utilized
           rcu_sched-8                  0.085533      cpu-clock (msec)          #    0.000 CPUs utilized
       kworker/u16:2-23146              0.077139      cpu-clock (msec)          #    0.000 CPUs utilized
               gmain-2700               0.041789      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/4:1-15354              0.028370      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/6:0-17528              0.023895      cpu-clock (msec)          #    0.000 CPUs utilized
        kworker/4:1H-1887               0.013209      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/5:2-31362              0.011627      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/0-11                 0.010892      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/3:2-12870              0.010220      cpu-clock (msec)          #    0.000 CPUs utilized
         ksoftirqd/0-7                  0.008869      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/1-14                 0.008476      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/7-50                 0.002944      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/3-26                 0.002893      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/4-32                 0.002759      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/2-20                 0.002429      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/6-44                 0.001491      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/5-38                 0.001477      cpu-clock (msec)          #    0.000 CPUs utilized
           rcu_sched-8                        10      context-switches          #    0.117 M/sec
       kworker/u16:1-18249                     7      context-switches          #    0.056 M/sec
                sshd-23111                     4      context-switches          #    0.014 M/sec
              vmstat-23127                     4      context-switches          #    0.003 M/sec
                perf-24165                     4      context-switches          #    0.930 K/sec
         kworker/0:2-19991                     3      context-switches          #    0.022 M/sec
       kworker/u16:2-23146                     3      context-switches          #    0.039 M/sec
         kworker/4:1-15354                     2      context-switches          #    0.070 M/sec
         kworker/6:0-17528                     2      context-switches          #    0.084 M/sec
                sshd-23058                     2      context-switches          #    0.010 M/sec
         ksoftirqd/0-7                         1      context-switches          #    0.113 M/sec
          watchdog/0-11                        1      context-switches          #    0.092 M/sec
          watchdog/1-14                        1      context-switches          #    0.118 M/sec
          watchdog/2-20                        1      context-switches          #    0.412 M/sec
          watchdog/3-26                        1      context-switches          #    0.346 M/sec
          watchdog/4-32                        1      context-switches          #    0.362 M/sec
          watchdog/5-38                        1      context-switches          #    0.677 M/sec
          watchdog/6-44                        1      context-switches          #    0.671 M/sec
          watchdog/7-50                        1      context-switches          #    0.340 M/sec
        kworker/4:1H-1887                      1      context-switches          #    0.076 M/sec
            thermald-2841                      1      context-switches          #    0.004 M/sec
               gmain-2700                      1      context-switches          #    0.024 M/sec
          irqbalance-2780                      1      context-switches          #    0.001 M/sec
         kworker/3:2-12870                     1      context-switches          #    0.098 M/sec
         kworker/5:2-31362                     1      context-switches          #    0.086 M/sec
       kworker/u16:1-18249                     2      cpu-migrations            #    0.016 M/sec
       kworker/u16:2-23146                     2      cpu-migrations            #    0.026 M/sec
           rcu_sched-8                         1      cpu-migrations            #    0.012 M/sec
                sshd-23058                     1      cpu-migrations            #    0.005 M/sec
                perf-24165             8,833,385      cycles                    #    2.053 GHz
              vmstat-23127             1,702,699      cycles                    #    1.090 GHz
          irqbalance-2780                739,847      cycles                    #    0.894 GHz
                sshd-23111               269,506      cycles                    #    0.968 GHz
            thermald-2841                204,556      cycles                    #    0.886 GHz
                sshd-23058               158,780      cycles                    #    0.766 GHz
         kworker/0:2-19991               112,981      cycles                    #    0.843 GHz
       kworker/u16:1-18249               100,926      cycles                    #    0.803 GHz
           rcu_sched-8                    74,024      cycles                    #    0.865 GHz
       kworker/u16:2-23146                55,984      cycles                    #    0.726 GHz
               gmain-2700                 34,278      cycles                    #    0.820 GHz
         kworker/4:1-15354                20,665      cycles                    #    0.728 GHz
         kworker/6:0-17528                16,445      cycles                    #    0.688 GHz
         kworker/5:2-31362                 9,492      cycles                    #    0.816 GHz
          watchdog/3-26                    8,695      cycles                    #    3.006 GHz
        kworker/4:1H-1887                  8,238      cycles                    #    0.624 GHz
          watchdog/4-32                    7,580      cycles                    #    2.747 GHz
         kworker/3:2-12870                 7,306      cycles                    #    0.715 GHz
          watchdog/2-20                    7,274      cycles                    #    2.995 GHz
          watchdog/0-11                    6,988      cycles                    #    0.642 GHz
         ksoftirqd/0-7                     6,376      cycles                    #    0.719 GHz
          watchdog/1-14                    5,340      cycles                    #    0.630 GHz
          watchdog/5-38                    4,061      cycles                    #    2.749 GHz
          watchdog/6-44                    3,976      cycles                    #    2.667 GHz
          watchdog/7-50                    3,418      cycles                    #    1.161 GHz
              vmstat-23127             2,511,699      instructions              #    1.48  insn per cycle
                perf-24165             1,829,908      instructions              #    0.21  insn per cycle
          irqbalance-2780              1,190,204      instructions              #    1.61  insn per cycle
            thermald-2841                143,544      instructions              #    0.70  insn per cycle
                sshd-23111               128,138      instructions              #    0.48  insn per cycle
                sshd-23058                57,654      instructions              #    0.36  insn per cycle
           rcu_sched-8                    44,063      instructions              #    0.60  insn per cycle
       kworker/u16:1-18249                42,551      instructions              #    0.42  insn per cycle
         kworker/0:2-19991                25,873      instructions              #    0.23  insn per cycle
       kworker/u16:2-23146                21,407      instructions              #    0.38  insn per cycle
               gmain-2700                 13,691      instructions              #    0.40  insn per cycle
         kworker/4:1-15354                12,964      instructions              #    0.63  insn per cycle
         kworker/6:0-17528                10,034      instructions              #    0.61  insn per cycle
         kworker/5:2-31362                 5,203      instructions              #    0.55  insn per cycle
         kworker/3:2-12870                 4,866      instructions              #    0.67  insn per cycle
        kworker/4:1H-1887                  3,586      instructions              #    0.44  insn per cycle
         ksoftirqd/0-7                     3,463      instructions              #    0.54  insn per cycle
          watchdog/0-11                    3,135      instructions              #    0.45  insn per cycle
          watchdog/1-14                    3,135      instructions              #    0.59  insn per cycle
          watchdog/2-20                    3,135      instructions              #    0.43  insn per cycle
          watchdog/3-26                    3,135      instructions              #    0.36  insn per cycle
          watchdog/4-32                    3,135      instructions              #    0.41  insn per cycle
          watchdog/5-38                    3,135      instructions              #    0.77  insn per cycle
          watchdog/6-44                    3,135      instructions              #    0.79  insn per cycle
          watchdog/7-50                    3,135      instructions              #    0.92  insn per cycle
              vmstat-23127               539,181      branches                  #  345.139 M/sec
                perf-24165               375,364      branches                  #   87.245 M/sec
          irqbalance-2780                262,092      branches                  #  316.593 M/sec
            thermald-2841                 31,611      branches                  #  136.915 M/sec
                sshd-23111                21,874      branches                  #   78.596 M/sec
                sshd-23058                10,682      branches                  #   51.528 M/sec
           rcu_sched-8                     8,693      branches                  #  101.633 M/sec
       kworker/u16:1-18249                 7,891      branches                  #   62.808 M/sec
         kworker/0:2-19991                 5,761      branches                  #   42.998 M/sec
       kworker/u16:2-23146                 4,099      branches                  #   53.138 M/sec
         kworker/4:1-15354                 2,755      branches                  #   97.110 M/sec
               gmain-2700                  2,638      branches                  #   63.127 M/sec
         kworker/6:0-17528                 2,216      branches                  #   92.739 M/sec
         kworker/5:2-31362                 1,132      branches                  #   97.360 M/sec
         kworker/3:2-12870                 1,081      branches                  #  105.773 M/sec
        kworker/4:1H-1887                    725      branches                  #   54.887 M/sec
         ksoftirqd/0-7                       707      branches                  #   79.716 M/sec
          watchdog/0-11                      652      branches                  #   59.860 M/sec
          watchdog/1-14                      652      branches                  #   76.923 M/sec
          watchdog/2-20                      652      branches                  #  268.423 M/sec
          watchdog/3-26                      652      branches                  #  225.372 M/sec
          watchdog/4-32                      652      branches                  #  236.318 M/sec
          watchdog/5-38                      652      branches                  #  441.435 M/sec
          watchdog/6-44                      652      branches                  #  437.290 M/sec
          watchdog/7-50                      652      branches                  #  221.467 M/sec
              vmstat-23127                 8,960      branch-misses             #    1.66% of all branches
          irqbalance-2780                  3,047      branch-misses             #    1.16% of all branches
                perf-24165                 2,876      branch-misses             #    0.77% of all branches
                sshd-23111                 1,843      branch-misses             #    8.43% of all branches
            thermald-2841                  1,444      branch-misses             #    4.57% of all branches
                sshd-23058                 1,379      branch-misses             #   12.91% of all branches
       kworker/u16:1-18249                   982      branch-misses             #   12.44% of all branches
           rcu_sched-8                       893      branch-misses             #   10.27% of all branches
       kworker/u16:2-23146                   578      branch-misses             #   14.10% of all branches
         kworker/0:2-19991                   376      branch-misses             #    6.53% of all branches
               gmain-2700                    280      branch-misses             #   10.61% of all branches
         kworker/6:0-17528                   196      branch-misses             #    8.84% of all branches
         kworker/4:1-15354                   187      branch-misses             #    6.79% of all branches
         kworker/5:2-31362                   123      branch-misses             #   10.87% of all branches
          watchdog/0-11                       95      branch-misses             #   14.57% of all branches
          watchdog/4-32                       89      branch-misses             #   13.65% of all branches
         kworker/3:2-12870                    80      branch-misses             #    7.40% of all branches
          watchdog/3-26                       61      branch-misses             #    9.36% of all branches
        kworker/4:1H-1887                     60      branch-misses             #    8.28% of all branches
          watchdog/2-20                       52      branch-misses             #    7.98% of all branches
         ksoftirqd/0-7                        47      branch-misses             #    6.65% of all branches
          watchdog/1-14                       46      branch-misses             #    7.06% of all branches
          watchdog/7-50                       13      branch-misses             #    1.99% of all branches
          watchdog/5-38                        8      branch-misses             #    1.23% of all branches
          watchdog/6-44                        7      branch-misses             #    1.07% of all branches

           3.695150786 seconds time elapsed

    root@skl:/tmp# perf stat --per-thread -M IPC,CPI
    ^C

     Performance counter stats for 'system wide':

              vmstat-23127             2,000,783      inst_retired.any          #      1.5 IPC
            thermald-2841              1,472,670      inst_retired.any          #      1.3 IPC
                sshd-23111               977,374      inst_retired.any          #      1.2 IPC
                perf-24163               483,779      inst_retired.any          #      0.2 IPC
               gmain-2700                341,213      inst_retired.any          #      0.9 IPC
                sshd-23058               148,891      inst_retired.any          #      0.8 IPC
        rtkit-daemon-3288                 71,210      inst_retired.any          #      0.7 IPC
       kworker/u16:1-18249                39,562      inst_retired.any          #      0.3 IPC
           rcu_sched-8                    14,474      inst_retired.any          #      0.8 IPC
         kworker/0:2-19991                 7,659      inst_retired.any          #      0.2 IPC
         kworker/4:1-15354                 6,714      inst_retired.any          #      0.8 IPC
        rtkit-daemon-3289                  4,839      inst_retired.any          #      0.3 IPC
         kworker/6:0-17528                 3,321      inst_retired.any          #      0.6 IPC
         kworker/5:2-31362                 3,215      inst_retired.any          #      0.5 IPC
         kworker/7:2-23145                 3,173      inst_retired.any          #      0.7 IPC
        kworker/4:1H-1887                  1,719      inst_retired.any          #      0.3 IPC
          watchdog/0-11                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/1-14                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/2-20                    1,479      inst_retired.any          #      0.4 IPC
          watchdog/3-26                    1,479      inst_retired.any          #      0.4 IPC
          watchdog/4-32                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/5-38                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/6-44                    1,479      inst_retired.any          #      0.7 IPC
          watchdog/7-50                    1,479      inst_retired.any          #      0.7 IPC
       kworker/u16:2-23146                 1,408      inst_retired.any          #      0.5 IPC
                perf-24163             2,249,872      cpu_clk_unhalted.thread
              vmstat-23127             1,352,455      cpu_clk_unhalted.thread
            thermald-2841              1,161,140      cpu_clk_unhalted.thread
                sshd-23111               807,827      cpu_clk_unhalted.thread
               gmain-2700                375,535      cpu_clk_unhalted.thread
                sshd-23058               194,071      cpu_clk_unhalted.thread
       kworker/u16:1-18249               114,306      cpu_clk_unhalted.thread
        rtkit-daemon-3288                103,547      cpu_clk_unhalted.thread
         kworker/0:2-19991                46,550      cpu_clk_unhalted.thread
           rcu_sched-8                    18,855      cpu_clk_unhalted.thread
        rtkit-daemon-3289                 17,549      cpu_clk_unhalted.thread
         kworker/4:1-15354                 8,812      cpu_clk_unhalted.thread
         kworker/5:2-31362                 6,812      cpu_clk_unhalted.thread
        kworker/4:1H-1887                  5,270      cpu_clk_unhalted.thread
         kworker/6:0-17528                 5,111      cpu_clk_unhalted.thread
         kworker/7:2-23145                 4,667      cpu_clk_unhalted.thread
          watchdog/0-11                    4,663      cpu_clk_unhalted.thread
          watchdog/1-14                    4,663      cpu_clk_unhalted.thread
          watchdog/4-32                    4,626      cpu_clk_unhalted.thread
          watchdog/5-38                    4,403      cpu_clk_unhalted.thread
          watchdog/3-26                    3,936      cpu_clk_unhalted.thread
          watchdog/2-20                    3,850      cpu_clk_unhalted.thread
       kworker/u16:2-23146                 2,654      cpu_clk_unhalted.thread
          watchdog/6-44                    2,017      cpu_clk_unhalted.thread
          watchdog/7-50                    2,017      cpu_clk_unhalted.thread
              vmstat-23127             2,000,783      inst_retired.any          #      0.7 CPI
            thermald-2841              1,472,670      inst_retired.any          #      0.8 CPI
                sshd-23111               977,374      inst_retired.any          #      0.8 CPI
                perf-24163               495,037      inst_retired.any          #      4.7 CPI
               gmain-2700                341,213      inst_retired.any          #      1.1 CPI
                sshd-23058               148,891      inst_retired.any          #      1.3 CPI
        rtkit-daemon-3288                 71,210      inst_retired.any          #      1.5 CPI
       kworker/u16:1-18249                39,562      inst_retired.any          #      2.9 CPI
           rcu_sched-8                    14,474      inst_retired.any          #      1.3 CPI
         kworker/0:2-19991                 7,659      inst_retired.any          #      6.1 CPI
         kworker/4:1-15354                 6,714      inst_retired.any          #      1.3 CPI
        rtkit-daemon-3289                  4,839      inst_retired.any          #      3.6 CPI
         kworker/6:0-17528                 3,321      inst_retired.any          #      1.5 CPI
         kworker/5:2-31362                 3,215      inst_retired.any          #      2.1 CPI
         kworker/7:2-23145                 3,173      inst_retired.any          #      1.5 CPI
        kworker/4:1H-1887                  1,719      inst_retired.any          #      3.1 CPI
          watchdog/0-11                    1,479      inst_retired.any          #      3.2 CPI
          watchdog/1-14                    1,479      inst_retired.any          #      3.2 CPI
          watchdog/2-20                    1,479      inst_retired.any          #      2.6 CPI
          watchdog/3-26                    1,479      inst_retired.any          #      2.7 CPI
          watchdog/4-32                    1,479      inst_retired.any          #      3.1 CPI
          watchdog/5-38                    1,479      inst_retired.any          #      3.0 CPI
          watchdog/6-44                    1,479      inst_retired.any          #      1.4 CPI
          watchdog/7-50                    1,479      inst_retired.any          #      1.4 CPI
       kworker/u16:2-23146                 1,408      inst_retired.any          #      1.9 CPI
                perf-24163             2,302,323      cycles
              vmstat-23127             1,352,455      cycles
            thermald-2841              1,161,140      cycles
                sshd-23111               807,827      cycles
               gmain-2700                375,535      cycles
                sshd-23058               194,071      cycles
       kworker/u16:1-18249               114,306      cycles
        rtkit-daemon-3288                103,547      cycles
         kworker/0:2-19991                46,550      cycles
           rcu_sched-8                    18,855      cycles
        rtkit-daemon-3289                 17,549      cycles
         kworker/4:1-15354                 8,812      cycles
         kworker/5:2-31362                 6,812      cycles
        kworker/4:1H-1887                  5,270      cycles
         kworker/6:0-17528                 5,111      cycles
         kworker/7:2-23145                 4,667      cycles
          watchdog/0-11                    4,663      cycles
          watchdog/1-14                    4,663      cycles
          watchdog/4-32                    4,626      cycles
          watchdog/5-38                    4,403      cycles
          watchdog/3-26                    3,936      cycles
          watchdog/2-20                    3,850      cycles
       kworker/u16:2-23146                 2,654      cycles
          watchdog/6-44                    2,017      cycles
          watchdog/7-50                    2,017      cycles

           2.175726600 seconds time elapsed

Jin Yao (11):
  perf util: Create rblist__reset() function
  perf util: Define a structure for runtime shadow stats
  perf util: Extend rbtree to support shadow stats
  perf util: Add rbtree node_delete ops
  perf util: Create the runtime_stat init/free function
  perf util: Update and print per-thread shadow stats
  perf util: Remove a set of shadow stats static variables
  perf stat: Allocate shadow stats buffer for threads
  perf util: Reuse thread_map__new_by_uid to enumerate threads from
    /proc
  perf stat: Remove --per-thread pid/tid limitation
  perf stat: Resort '--per-thread' result

 tools/perf/builtin-script.c   |   6 +-
 tools/perf/builtin-stat.c     | 170 ++++++++++++++---
 tools/perf/tests/thread-map.c |   2 +-
 tools/perf/util/evlist.c      |   3 +-
 tools/perf/util/rblist.c      |  24 ++-
 tools/perf/util/rblist.h      |   1 +
 tools/perf/util/stat-shadow.c | 426 +++++++++++++++++++++++++-----------------
 tools/perf/util/stat.c        |  15 +-
 tools/perf/util/stat.h        |  64 ++++++-
 tools/perf/util/target.h      |   7 +
 tools/perf/util/thread_map.c  |  19 +-
 tools/perf/util/thread_map.h  |   3 +-
 12 files changed, 526 insertions(+), 214 deletions(-)

-- 
2.7.4

Powered by blists - more mailing lists