Message-ID: <ZrIhPuFee8R9ZvVi@x1>
Date: Tue, 6 Aug 2024 10:12:30 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>, Kan Liang <kan.liang@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 5/5] perf annotate: Add --skip-empty option

On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > output, making it consistent with perf report when the option is set.
> > > > >
> > > > > For example, the following command would use 3 events including dummy.
> > > > >
> > > > > $ perf mem record -a -- perf test -w noploop
> > > > >
> > > > > $ perf evlist
> > > > > cpu/mem-loads,ldlat=30/P
> > > > > cpu/mem-stores/P
> > > > > dummy:u
> > > > >
> > > > > Just using perf annotate with --group will show all 3 events.
> > > >
> > > > Seems unrelated, just before compiling with this patch:
> > > >
> > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > Percent 0xffffffff810511e0 <sched_clock>:
> > > > endbr64
> > > > 5.76 incl pcpu_hot+0x8
> > > > 5.47 → callq sched_clock_noinstr
> > > > 88.78 decl pcpu_hot+0x8
> > > > ↓ je 1e
> > > > → jmp __x86_return_thunk
> > > > 1e: → callq __SCT__preempt_schedule_notrace
> > > > → jmp __x86_return_thunk
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > root@x1:~# perf annotate --group sched_clock
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf evlist
> > > > cpu_atom/mem-loads,ldlat=30/P
> > > > cpu_atom/mem-stores/P
> > > > dummy:u
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf report --header-only | grep cmdline
> > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > > root@x1:~#
> > > >
> > > > I thought it would be some hybrid oddity, but it seems to be just
> > > > --group related; it looks like it stops if the first event has no
> > > > samples? Because it works with another symbol:
> > >
> > > Good catch. Yeah, I found it only checked the first event. Something
> > > like below should fix the issue.
> >
> > Nope, with the patch applied:
> >
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --stdio sched_clock
> > Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > -------------------------------------------------------------------------------------------------------------------
> > : 0 0xffffffff810511e0 <sched_clock>:
> > 0.00 : ffffffff810511e0: endbr64
> > 5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> > 0.13 : ffffffff810511eb: callq 0xffffffff821350d0
> > 94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> > 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> > 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> > 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> > 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
>
> Oh ok, it was not enough. It should call evsel__output_resort() after
> hists__match() and hists__link(). Use this instead.
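Just to register my understanding of the ordering the new patch enforces
(a rough sketch of the idea, not your actual patch: the function name is
mine, it assumes a 'struct evlist *evlist' for the session is in scope,
and it uses the existing hists__match()/hists__link()/
evsel__output_resort() helpers):

#include "util/evlist.h"
#include "util/evsel.h"
#include "util/hist.h"

/* Sketch: match/link group members to their leader _before_ resorting */
static void group_output_resort(struct evlist *evlist)
{
        struct evsel *pos;

        /*
         * First match/link every group member's hist entries to its
         * leader's, so events without samples still get entries for
         * the addresses sampled by the other events in the group.
         */
        evlist__for_each_entry(evlist, pos) {
                struct hists *hists = evsel__hists(pos);
                struct hists *leader_hists = evsel__hists(evsel__leader(pos));

                if (evsel__is_group_leader(pos))
                        continue;

                hists__match(leader_hists, hists);
                hists__link(leader_hists, hists);
        }

        /*
         * Only then resort the output tree for each event; resorting
         * before the match/link step is what left the group view empty.
         */
        evlist__for_each_entry(evlist, pos)
                evsel__output_resort(pos, NULL);
}
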
Ok, this works:
Before this patch:
root@x1:~# perf annotate --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 : ffffffff810511e0: endbr64
5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.13 : ffffffff810511eb: callq 0xffffffff821350d0
94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~#
After:
root@x1:~# perf annotate --group --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 0.00 0.00 : ffffffff810511e0: endbr64
0.00 5.11 0.00 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.00 0.13 0.00 : ffffffff810511eb: callq 0xffffffff821350d0
0.00 94.76 0.00 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 0.00 0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 0.00 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 0.00 0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 0.00 0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~#
One example with samples for the first two events:
root@x1:~# perf annotate --group --stdio2
Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
0.00 0.24 0.00 endbr64
→ callq __fentry__
0.00 99.76 0.00 pushq %r15
movq $0x251d4,%rcx
pushq %r14
movq %rdi,%r14
pushq %r13
movslq %esi,%r13
pushq %r12
pushq %rbp
pushq %rbx
subq $0x10,%rsp
cmpq $0x2000,%r13
↓ jae 17f
31: movq 0x3d0(%r14),%rbx
movq -0x7d3fb360(, %r13, 8),%r12
cmpq $0x2000,%r13
↓ jae 19b
25.00 0.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
↓ je 6b
addq $0x10,%rsp
popq %rbx
popq %rbp
popq %r12
75.00 0.00 0.00 popq %r13
popq %r14
popq %r15
→ jmp __x86_return_thunk
<SNIP>
And then skipping "empty" events:
root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
Samples: 4 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
0.00 0.24 endbr64
→ callq __fentry__
0.00 99.76 pushq %r15
movq $0x251d4,%rcx
pushq %r14
movq %rdi,%r14
pushq %r13
movslq %esi,%r13
pushq %r12
pushq %rbp
pushq %rbx
subq $0x10,%rsp
cmpq $0x2000,%r13
↓ jae 17f
31: movq 0x3d0(%r14),%rbx
movq -0x7d3fb360(, %r13, 8),%r12
cmpq $0x2000,%r13
↓ jae 19b
25.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
↓ je 6b
addq $0x10,%rsp
popq %rbx
popq %rbp
popq %r12
75.00 0.00 popq %r13
popq %r14
popq %r15
→ jmp __x86_return_thunk
6b: addq %r12,%rcx
movq %rcx,%rdi
movq %rcx,(%rsp)
→ callq *ffffffff82151500
root@x1:~#
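FWIW that output matches what I'd expect from the option: conceptually
(again a sketch, not the actual implementation: the function name is
mine and it assumes the group 'leader' evsel is in scope) the group
percent columns are emitted with empty members filtered out, something
like:

#include "util/evsel.h"
#include "util/hist.h"
#include "util/symbol_conf.h"

static void print_group_percent_columns(struct evsel *leader)
{
        struct evsel *pos;

        for_each_group_evsel(pos, leader) {
                struct hists *hists = evsel__hists(pos);

                /*
                 * --skip-empty: omit the percent column for group
                 * members that got no samples at all.
                 */
                if (symbol_conf.skip_empty && hists->stats.nr_samples == 0)
                        continue;

                /* ... print the percent column for 'pos' here ... */
        }
}
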
So, I haven't done further analysis, but I think this is a separate
issue, right?
Thanks for the fix!
Reported-by: Arnaldo Carvalho de Melo <acme@...hat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@...hat.com>
- Arnaldo