[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <81c57f23-bc1c-df1b-d346-74c0311ba8dc@huawei.com>
Date: Wed, 27 Jul 2022 08:36:22 +0800
From: Yang Jihong <yangjihong1@...wei.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
CC: <peterz@...radead.org>, <mingo@...hat.com>, <mark.rutland@....com>,
<alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
<namhyung@...nel.org>, <pc@...ibm.com>,
<linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>,
Yonghong Song <yhs@...com>,
"Andrii Nakryiko" <andrii.nakryiko@...il.com>,
Song Liu <songliubraving@...com>
Subject: Re: [RFC v3 00/17] perf: Add perf kwork (using BPF skels)
Hello,
On 2022/7/26 5:45, Arnaldo Carvalho de Melo wrote:
> Em Sat, Jul 09, 2022 at 09:50:16AM +0800, Yang Jihong escreveu:
>> Sometimes, we need to analyze time properties of kernel work such as irq,
>> softirq, and workqueue, including delay and running time of specific interrupts.
>> Currently, these events have kernel tracepoints, but perf tool does not
>> directly analyze the delay of these events
>>
>> The perf-kwork tool is used to trace time properties of kernel work
>> (such as irq, softirq, and workqueue), including runtime, latency,
>> and timehist, using the infrastructure in the perf tools to allow
>> tracing extra targets
>>
>> We also use bpf trace to collect and filter data in kernel to solve
>> problem of large perf data volume and extra file system interruptions.
>
> Pushed out to tmp.perf/core, will continue reviewing and testing then
> move to perf/core, thanks for the great work.
>
> Its fantastic how the bpf skel infra is working well with tools/perf,
> really great.
>
Thanks for your affirmation. :)
Regards,
Yang Jihong
> - Arnaldo
>
>> Example usage:
>>
>> 1. Kwork record:
>>
>> # perf kwork record -- sleep 10
>> [ perf record: Woken up 0 times to write data ]
>> [ perf record: Captured and wrote 6.825 MB perf.data ]
>>
>> 2. Kwork report:
>>
>> # perf kwork report -S
>>
>> Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
>> --------------------------------------------------------------------------------------------------------------------------------
>> virtio0-requests:25 | 0000 | 1347.861 ms | 25049 | 1.417 ms | 121235.524083 s | 121235.525499 s |
>> (s)TIMER:1 | 0005 | 151.033 ms | 2545 | 0.153 ms | 121237.454591 s | 121237.454744 s |
>> (s)RCU:9 | 0005 | 117.254 ms | 2754 | 0.223 ms | 121239.461024 s | 121239.461246 s |
>> (s)SCHED:7 | 0005 | 58.714 ms | 1773 | 0.075 ms | 121237.702345 s | 121237.702419 s |
>> (s)RCU:9 | 0007 | 43.359 ms | 945 | 0.861 ms | 121237.702984 s | 121237.703845 s |
>> (s)SCHED:7 | 0000 | 33.389 ms | 549 | 4.121 ms | 121235.521379 s | 121235.525499 s |
>> (s)RCU:9 | 0002 | 21.419 ms | 484 | 0.281 ms | 121244.629001 s | 121244.629282 s |
>> (w)mix_interrupt_randomness | 0000 | 21.047 ms | 391 | 1.016 ms | 121237.934008 s | 121237.935024 s |
>> (s)SCHED:7 | 0007 | 19.903 ms | 570 | 0.065 ms | 121235.523360 s | 121235.523426 s |
>> (s)RCU:9 | 0000 | 19.017 ms | 472 | 0.507 ms | 121244.634002 s | 121244.634510 s |
>> ... <SNIP> ...
>> (s)SCHED:7 | 0003 | 0.049 ms | 1 | 0.049 ms | 121240.018631 s | 121240.018680 s |
>> (w)vmstat_update | 0003 | 0.046 ms | 1 | 0.046 ms | 121240.916200 s | 121240.916246 s |
>> (s)RCU:9 | 0004 | 0.045 ms | 2 | 0.024 ms | 121235.522876 s | 121235.522900 s |
>> (w)neigh_managed_work | 0001 | 0.044 ms | 1 | 0.044 ms | 121235.513929 s | 121235.513973 s |
>> (w)vmstat_update | 0006 | 0.031 ms | 1 | 0.031 ms | 121245.673914 s | 121245.673945 s |
>> (w)vmstat_update | 0004 | 0.028 ms | 1 | 0.028 ms | 121235.522743 s | 121235.522770 s |
>> (w)wb_update_bandwidth_workfn | 0000 | 0.024 ms | 1 | 0.024 ms | 121244.842660 s | 121244.842683 s |
>> --------------------------------------------------------------------------------------------------------------------------------
>> Total count : 36071
>> Total runtime (msec) : 1887.188 (0.185% load average)
>> Total time span (msec) : 10185.012
>> --------------------------------------------------------------------------------------------------------------------------------
>>
>> 3. Kwork latency:
>>
>> # perf kwork latency
>>
>> Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
>> --------------------------------------------------------------------------------------------------------------------------------
>> (s)TIMER:1 | 0004 | 3.903 ms | 1 | 3.903 ms | 121235.517068 s | 121235.520971 s |
>> (s)RCU:9 | 0004 | 3.252 ms | 2 | 5.809 ms | 121235.517068 s | 121235.522876 s |
>> (s)RCU:9 | 0001 | 3.238 ms | 2 | 5.832 ms | 121235.514494 s | 121235.520326 s |
>> (w)vmstat_update | 0004 | 1.738 ms | 1 | 1.738 ms | 121235.521005 s | 121235.522743 s |
>> (s)SCHED:7 | 0004 | 0.978 ms | 2 | 1.899 ms | 121235.520940 s | 121235.522840 s |
>> (w)wb_update_bandwidth_workfn | 0000 | 0.834 ms | 1 | 0.834 ms | 121244.841826 s | 121244.842660 s |
>> (s)RCU:9 | 0003 | 0.479 ms | 3 | 0.752 ms | 121240.027521 s | 121240.028273 s |
>> (s)TIMER:1 | 0001 | 0.465 ms | 1 | 0.465 ms | 121235.513107 s | 121235.513572 s |
>> (w)vmstat_update | 0000 | 0.391 ms | 5 | 1.275 ms | 121236.814938 s | 121236.816213 s |
>> (w)mix_interrupt_randomness | 0002 | 0.317 ms | 5 | 0.874 ms | 121244.628034 s | 121244.628908 s |
>> (w)neigh_managed_work | 0001 | 0.315 ms | 1 | 0.315 ms | 121235.513614 s | 121235.513929 s |
>> ... <SNIP> ...
>> (s)TIMER:1 | 0005 | 0.061 ms | 2545 | 0.506 ms | 121237.136113 s | 121237.136619 s |
>> (s)SCHED:7 | 0001 | 0.052 ms | 21 | 0.437 ms | 121237.711014 s | 121237.711451 s |
>> (s)SCHED:7 | 0002 | 0.045 ms | 309 | 0.145 ms | 121237.137184 s | 121237.137329 s |
>> (s)SCHED:7 | 0003 | 0.045 ms | 1 | 0.045 ms | 121240.018586 s | 121240.018631 s |
>> (s)SCHED:7 | 0007 | 0.044 ms | 570 | 0.173 ms | 121238.161161 s | 121238.161334 s |
>> (s)BLOCK:4 | 0003 | 0.030 ms | 4 | 0.056 ms | 121240.028255 s | 121240.028311 s |
>> --------------------------------------------------------------------------------------------------------------------------------
>> INFO: 28.761% skipped events (27674 including 2607 raise, 25067 entry, 0 exit)
>>
>> 4. Kwork timehist:
>>
>> # perf kwork timehist
>> Runtime start Runtime end Cpu Kwork name Runtime Delaytime
>> (TYPE)NAME:NUM (msec) (msec)
>> ----------------- ----------------- ------ ------------------------------ ---------- ----------
>> 121235.513572 121235.513674 [0001] (s)TIMER:1 0.102 0.465
>> 121235.513688 121235.513738 [0001] (s)SCHED:7 0.050 0.172
>> 121235.513750 121235.513777 [0001] (s)RCU:9 0.027 0.643
>> 121235.513929 121235.513973 [0001] (w)neigh_managed_work 0.044 0.315
>> 121235.520326 121235.520386 [0001] (s)RCU:9 0.060 5.832
>> 121235.520672 121235.520716 [0002] (s)SCHED:7 0.044 0.048
>> 121235.520729 121235.520753 [0002] (s)RCU:9 0.024 5.651
>> 121235.521213 121235.521249 [0005] (s)TIMER:1 0.036 0.064
>> 121235.520166 121235.521379 [0000] (s)SCHED:7 1.213 0.056
>> ... <SNIP> ...
>> 121235.533256 121235.533296 [0000] virtio0-requests:25 0.040
>> 121235.533322 121235.533359 [0000] (s)SCHED:7 0.037 0.095
>> 121235.533018 121235.533452 [0006] (s)RCU:9 0.434 0.348
>> 121235.534653 121235.534698 [0000] virtio0-requests:25 0.046
>> 121235.535657 121235.535702 [0000] virtio0-requests:25 0.044
>> 121235.535857 121235.535916 [0005] (s)TIMER:1 0.059 0.055
>> 121235.535927 121235.535947 [0005] (s)RCU:9 0.020 0.113
>> 121235.536178 121235.536196 [0006] (s)RCU:9 0.018 0.410
>> 121235.537406 121235.537445 [0006] (s)SCHED:7 0.039 0.049
>> 121235.537457 121235.537481 [0006] (s)RCU:9 0.024 0.334
>> 121235.538199 121235.538254 [0007] (s)RCU:9 0.055 0.066
>>
>> 5. Kwork report use bpf:
>>
>> # perf kwork report -b
>> Starting trace, Hit <Ctrl+C> to stop and report
>> ^C
>> Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
>> --------------------------------------------------------------------------------------------------------------------------------
>> (w)flush_to_ldisc | 0000 | 2.279 ms | 2 | 2.219 ms | 121293.080933 s | 121293.083152 s |
>> (s)SCHED:7 | 0001 | 2.141 ms | 2 | 2.100 ms | 121293.082064 s | 121293.084164 s |
>> (s)RCU:9 | 0003 | 2.137 ms | 3 | 2.046 ms | 121293.081348 s | 121293.083394 s |
>> (s)TIMER:1 | 0007 | 1.882 ms | 12 | 0.249 ms | 121295.632211 s | 121295.632460 s |
>> (w)e1000_watchdog | 0002 | 1.136 ms | 3 | 0.428 ms | 121294.496559 s | 121294.496987 s |
>> (s)SCHED:7 | 0007 | 0.995 ms | 12 | 0.139 ms | 121295.632483 s | 121295.632621 s |
>> (s)NET_RX:3 | 0002 | 0.727 ms | 5 | 0.391 ms | 121299.044624 s | 121299.045016 s |
>> (s)TIMER:1 | 0002 | 0.696 ms | 5 | 0.164 ms | 121294.496172 s | 121294.496337 s |
>> (s)SCHED:7 | 0002 | 0.427 ms | 6 | 0.077 ms | 121295.840321 s | 121295.840398 s |
>> (s)SCHED:7 | 0000 | 0.366 ms | 3 | 0.156 ms | 121296.545389 s | 121296.545545 s |
>> eth0:10 | 0002 | 0.353 ms | 5 | 0.122 ms | 121293.084796 s | 121293.084919 s |
>> (w)flush_to_ldisc | 0000 | 0.298 ms | 1 | 0.298 ms | 121299.046236 s | 121299.046534 s |
>> (w)mix_interrupt_randomness | 0002 | 0.215 ms | 4 | 0.077 ms | 121293.086747 s | 121293.086823 s |
>> (s)RCU:9 | 0002 | 0.128 ms | 3 | 0.060 ms | 121293.087348 s | 121293.087409 s |
>> (w)vmstat_shepherd | 0000 | 0.098 ms | 1 | 0.098 ms | 121293.083901 s | 121293.083999 s |
>> (s)TIMER:1 | 0001 | 0.089 ms | 1 | 0.089 ms | 121293.085709 s | 121293.085798 s |
>> (w)vmstat_update | 0003 | 0.071 ms | 1 | 0.071 ms | 121293.085227 s | 121293.085298 s |
>> (w)wq_barrier_func | 0000 | 0.064 ms | 1 | 0.064 ms | 121293.083688 s | 121293.083752 s |
>> (w)vmstat_update | 0000 | 0.041 ms | 1 | 0.041 ms | 121293.083829 s | 121293.083869 s |
>> (s)RCU:9 | 0001 | 0.038 ms | 1 | 0.038 ms | 121293.085818 s | 121293.085856 s |
>> (s)RCU:9 | 0007 | 0.035 ms | 1 | 0.035 ms | 121293.112322 s | 121293.112357 s |
>> --------------------------------------------------------------------------------------------------------------------------------
>>
>> 6. Kwork latency use bpf:
>>
>> # perf kwork latency -b
>> Starting trace, Hit <Ctrl+C> to stop and report
>> ^C
>> Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
>> --------------------------------------------------------------------------------------------------------------------------------
>> (w)vmstat_shepherd | 0000 | 2.044 ms | 2 | 2.764 ms | 121314.942758 s | 121314.945522 s |
>> (w)flush_to_ldisc | 0000 | 1.008 ms | 1 | 1.008 ms | 121317.335508 s | 121317.336516 s |
>> (w)vmstat_update | 0002 | 0.879 ms | 1 | 0.879 ms | 121317.024405 s | 121317.025284 s |
>> (w)mix_interrupt_randomness | 0002 | 0.328 ms | 5 | 0.383 ms | 121308.832944 s | 121308.833327 s |
>> (w)e1000_watchdog | 0002 | 0.304 ms | 5 | 0.368 ms | 121317.024305 s | 121317.024673 s |
>> (s)RCU:9 | 0001 | 0.172 ms | 41 | 0.728 ms | 121308.308187 s | 121308.308915 s |
>> (s)TIMER:1 | 0000 | 0.149 ms | 3 | 0.195 ms | 121317.334255 s | 121317.334449 s |
>> (s)NET_RX:3 | 0001 | 0.143 ms | 40 | 1.213 ms | 121315.030992 s | 121315.032205 s |
>> (s)RCU:9 | 0002 | 0.139 ms | 27 | 0.187 ms | 121315.077388 s | 121315.077576 s |
>> (s)NET_RX:3 | 0002 | 0.130 ms | 7 | 0.283 ms | 121308.832917 s | 121308.833201 s |
>> (s)SCHED:7 | 0007 | 0.123 ms | 34 | 0.191 ms | 121308.736240 s | 121308.736431 s |
>> (s)TIMER:1 | 0007 | 0.116 ms | 18 | 0.145 ms | 121308.736168 s | 121308.736313 s |
>> (s)RCU:9 | 0007 | 0.111 ms | 68 | 0.318 ms | 121308.736194 s | 121308.736512 s |
>> (s)SCHED:7 | 0002 | 0.110 ms | 22 | 0.292 ms | 121308.832197 s | 121308.832489 s |
>> (s)TIMER:1 | 0001 | 0.107 ms | 1 | 0.107 ms | 121314.948230 s | 121314.948337 s |
>> (w)neigh_managed_work | 0001 | 0.103 ms | 1 | 0.103 ms | 121314.948381 s | 121314.948484 s |
>> (s)RCU:9 | 0000 | 0.099 ms | 49 | 0.289 ms | 121308.520167 s | 121308.520456 s |
>> (s)NET_RX:3 | 0007 | 0.096 ms | 40 | 1.227 ms | 121315.022994 s | 121315.024220 s |
>> (s)RCU:9 | 0003 | 0.093 ms | 37 | 0.261 ms | 121314.950651 s | 121314.950913 s |
>> (w)flush_to_ldisc | 0000 | 0.090 ms | 1 | 0.090 ms | 121317.336737 s | 121317.336827 s |
>> (s)TIMER:1 | 0002 | 0.078 ms | 36 | 0.115 ms | 121310.880172 s | 121310.880288 s |
>> (s)SCHED:7 | 0001 | 0.071 ms | 27 | 0.180 ms | 121314.953571 s | 121314.953751 s |
>> (s)SCHED:7 | 0000 | 0.066 ms | 28 | 0.344 ms | 121317.334345 s | 121317.334689 s |
>> (s)SCHED:7 | 0003 | 0.063 ms | 14 | 0.119 ms | 121314.978808 s | 121314.978927 s |
>> --------------------------------------------------------------------------------------------------------------------------------
>>
>> 7. Kwork report with filter:
>>
>> # perf kwork report -b -n RCU
>> Starting trace, Hit <Ctrl+C> to stop and report
>> ^C
>> Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
>> --------------------------------------------------------------------------------------------------------------------------------
>> (s)RCU:9 | 0006 | 2.266 ms | 3 | 2.158 ms | 121335.008290 s | 121335.010449 s |
>> (s)RCU:9 | 0002 | 0.158 ms | 3 | 0.063 ms | 121335.011914 s | 121335.011977 s |
>> (s)RCU:9 | 0007 | 0.082 ms | 1 | 0.082 ms | 121335.448378 s | 121335.448460 s |
>> (s)RCU:9 | 0000 | 0.058 ms | 1 | 0.058 ms | 121335.011350 s | 121335.011408 s |
>> --------------------------------------------------------------------------------------------------------------------------------
>>
>> ---
>> Changes since v2:
>> - Updage commit messages.
>>
>> Changes since v1:
>> - Add options and document when actually add the functionality later.
>> - Replace "cluster" with "work".
>> - Add workqueue symbolizing function support.
>> - Replace "frequency" with "count" in report header.
>> - Add bpf trace support.
>>
>> Yang Jihong (17):
>> perf kwork: New tool
>> perf kwork: Add irq kwork record support
>> perf kwork: Add softirq kwork record support
>> perf kwork: Add workqueue kwork record support
>> tools lib: Add list_last_entry_or_null
>> perf kwork: Implement perf kwork report
>> perf kwork: Add irq report support
>> perf kwork: Add softirq report support
>> perf kwork: Add workqueue report support
>> perf kwork: Implement perf kwork latency
>> perf kwork: Add softirq latency support
>> perf kwork: Add workqueue latency support
>> perf kwork: Implement perf kwork timehist
>> perf kwork: Implement bpf trace
>> perf kwork: Add irq trace bpf support
>> perf kwork: Add softirq trace bpf support
>> perf kwork: Add workqueue trace bpf support
>>
>> tools/include/linux/list.h | 11 +
>> tools/perf/Build | 1 +
>> tools/perf/Documentation/perf-kwork.txt | 180 ++
>> tools/perf/Makefile.perf | 1 +
>> tools/perf/builtin-kwork.c | 1834 ++++++++++++++++++++
>> tools/perf/builtin.h | 1 +
>> tools/perf/command-list.txt | 1 +
>> tools/perf/perf.c | 1 +
>> tools/perf/util/Build | 1 +
>> tools/perf/util/bpf_kwork.c | 356 ++++
>> tools/perf/util/bpf_skel/kwork_trace.bpf.c | 381 ++++
>> tools/perf/util/kwork.h | 257 +++
>> 12 files changed, 3025 insertions(+)
>> create mode 100644 tools/perf/Documentation/perf-kwork.txt
>> create mode 100644 tools/perf/builtin-kwork.c
>> create mode 100644 tools/perf/util/bpf_kwork.c
>> create mode 100644 tools/perf/util/bpf_skel/kwork_trace.bpf.c
>> create mode 100644 tools/perf/util/kwork.h
>>
>> --
>> 2.30.GIT
>
Powered by blists - more mailing lists