lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM9d7cjXmaMuidQR10PXrp9khZ4LhDZbLno1rN2JcCncaYyp7Q@mail.gmail.com>
Date: Tue, 6 Aug 2024 23:12:51 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, 
	LKML <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 5/5] perf annotate: Add --skip-empty option

On Tue, Aug 6, 2024 at 6:12 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > > output.  This is consistent when the option is set in perf report.
> > > > > >
> > > > > > For example, the following command would use 3 events including dummy.
> > > > > >
> > > > > >   $ perf mem record -a -- perf test -w noploop
> > > > > >
> > > > > >   $ perf evlist
> > > > > >   cpu/mem-loads,ldlat=30/P
> > > > > >   cpu/mem-stores/P
> > > > > >   dummy:u
> > > > > >
> > > > > > Just using perf annotate with --group will show the all 3 events.
> > > > >
> > > > > Seems unrelated, just before compiling with this patch:
> > > > >
> > > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > > Percent      0xffffffff810511e0 <sched_clock>:
> > > > >                endbr64
> > > > >    5.76        incl    pcpu_hot+0x8
> > > > >    5.47      → callq   sched_clock_noinstr
> > > > >   88.78        decl    pcpu_hot+0x8
> > > > >              ↓ je      1e
> > > > >              → jmp     __x86_return_thunk
> > > > >          1e: → callq   __SCT__preempt_schedule_notrace
> > > > >              → jmp     __x86_return_thunk
> > > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > > root@x1:~# perf annotate --group sched_clock
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf evlist
> > > > > cpu_atom/mem-loads,ldlat=30/P
> > > > > cpu_atom/mem-stores/P
> > > > > dummy:u
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf report --header-only | grep cmdline
> > > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > > > root@x1:~#
> > > > >
> > > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > > related, seems like it stops if the first event has no samples? Because
> > > > > it works with another symbol:
> > > >
> > > > Good catch.  Yeah I found it only checked the first event.  Something
> > > > like below should fix the issue.
> > >
> > > Nope, with the patch applied:
> > >
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --stdio sched_clock
> > >  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > > -------------------------------------------------------------------------------------------------------------------
> > >          : 0                0xffffffff810511e0 <sched_clock>:
> > >     0.00 :   ffffffff810511e0:       endbr64
> > >     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
> > >     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
> > >    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
> > >     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
> > >     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
> > >     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
> > >     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --group sched_clock
> > > root@x1:~#
> >
> > Oh ok, it was not enough.  It should call evsel__output_resort() after
> > hists__match() and hists__link().  Use this instead.
>
> Ok, this works:
>
> Before this patch:
>
> root@x1:~# perf annotate --stdio sched_clock
>  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------
>          : 0                0xffffffff810511e0 <sched_clock>:
>     0.00 :   ffffffff810511e0:       endbr64
>     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
>    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
>     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
>     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
>     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~#
>
> After:
>
> root@x1:~# perf annotate --group --stdio sched_clock
>  Percent                 |      Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                          : 0                0xffffffff810511e0 <sched_clock>:
>     0.00    0.00    0.00 :   ffffffff810511e0:       endbr64
>     0.00    5.11    0.00 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00    0.13    0.00 :   ffffffff810511eb:       callq   0xffffffff821350d0
>     0.00   94.76    0.00 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00    0.00    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
>     0.00    0.00    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
>     0.00    0.00    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
>     0.00    0.00    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> root@x1:~#
>
> One example with samples for the first two events:
>
> root@x1:~# perf annotate --group --stdio2
> Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
> cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent                       0xffffffff8124e080 <cgroup_rstat_updated>:
>    0.00    0.24    0.00         endbr64
>                               → callq   __fentry__
>    0.00   99.76    0.00         pushq   %r15
>                                 movq    $0x251d4,%rcx
>                                 pushq   %r14
>                                 movq    %rdi,%r14
>                                 pushq   %r13
>                                 movslq  %esi,%r13
>                                 pushq   %r12
>                                 pushq   %rbp
>                                 pushq   %rbx
>                                 subq    $0x10,%rsp
>                                 cmpq    $0x2000,%r13
>                               ↓ jae     17f
>                           31:   movq    0x3d0(%r14),%rbx
>                                 movq    -0x7d3fb360(, %r13, 8),%r12
>                                 cmpq    $0x2000,%r13
>                               ↓ jae     19b
>   25.00    0.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
>                               ↓ je      6b
>                                 addq    $0x10,%rsp
>                                 popq    %rbx
>                                 popq    %rbp
>                                 popq    %r12
>   75.00    0.00    0.00         popq    %r13
>                                 popq    %r14
>                                 popq    %r15
>                               → jmp     __x86_return_thunk
> <SNIP>
>
> And then skipping "empty" events:
>
> root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
> Samples: 4  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
> cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent               0xffffffff8124e080 <cgroup_rstat_updated>:
>    0.00    0.24         endbr64
>                       → callq   __fentry__
>    0.00   99.76         pushq   %r15
>                         movq    $0x251d4,%rcx
>                         pushq   %r14
>                         movq    %rdi,%r14
>                         pushq   %r13
>                         movslq  %esi,%r13
>                         pushq   %r12
>                         pushq   %rbp
>                         pushq   %rbx
>                         subq    $0x10,%rsp
>                         cmpq    $0x2000,%r13
>                       ↓ jae     17f
>                   31:   movq    0x3d0(%r14),%rbx
>                         movq    -0x7d3fb360(, %r13, 8),%r12
>                         cmpq    $0x2000,%r13
>                       ↓ jae     19b
>   25.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
>                       ↓ je      6b
>                         addq    $0x10,%rsp
>                         popq    %rbx
>                         popq    %rbp
>                         popq    %r12
>   75.00    0.00         popq    %r13
>                         popq    %r14
>                         popq    %r15
>                       → jmp     __x86_return_thunk
>                   6b:   addq    %r12,%rcx
>                         movq    %rcx,%rdi
>                         movq    %rcx,(%rsp)
>                       → callq   *ffffffff82151500
> root@x1:~#
>
> So, I haven't done further analysis but I think this is a separate
> issue, right?

Yep, it's not related to --skip-empty.

>
> Thanks for the fix!
>
> Reported-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@...hat.com>

Will send a fix with your tags.

Thanks,
Namhyung

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ