linux-kernel - RE: [PATCH RFC V9 2/3] perf,tools: per-event callgraph support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F077018DE420@SHSMSX103.ccr.corp.intel.com>
Date:	Mon, 10 Aug 2015 12:56:04 +0000
From:	"Liang, Kan" <kan.liang@...el.com>
To:	Arnaldo Carvalho de Melo <acme@...nel.org>
CC:	Jiri Olsa <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
	"Andi Kleen" <ak@...ux.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH RFC V9 2/3] perf,tools: per-event callgraph support



> Em Fri, Aug 07, 2015 at 12:38:43PM -0300, Arnaldo Carvalho de Melo
> escreveu:
> > Em Thu, Aug 06, 2015 at 03:44:53PM -0400, kan.liang@...el.com escreveu:
> > > From: Kan Liang <kan.liang@...el.com> Here are some examples and
> > > test results.
> > >
> > > 1. Comparing the elapsed time and perf.data size from "kernbench -M -
> H".
> > >
> > >  The test command for FULL callgraph and time support.
> > >    "perf record -e
> > >    '{cpu/cpu-cycles,period=100000/,cpu/instructions,period=20000/p}'
> > >    --call-graph fp --time"
> 
> Some other oddity I noticed while testing:
> 
> For:
> 
> [root@zoo ~]# perf report --header-only
> # ========
> # captured on: Fri Aug  7 12:39:13 2015
> # hostname : zoo
> # os release : 4.2.0-rc5+
> # perf version : 4.2.rc4.g1cd951
> # arch : x86_64
> # nrcpus online : 4
> # nrcpus avail : 4
> # cpudesc : Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz # cpuid :
> GenuineIntel,6,58,9 # total memory : 8080692 kB # cmdline :
> /home/acme/bin/perf record -e {cpu/cpu-cycles,call-
> graph=fp,time,period=10000/pp,cpu/instructions,call-
> graph=no,time=0,period=20000/p} -a # event : name = cpu/cpu-cycles,call-
> graph=fp,time,period=10000/pp, , id = { 38537, 38538, 38539, 38540 }, type
> = 4, size = 112, config = 0x3c, { sample_perio # event : name =
> cpu/instructions,call-graph=no,time=0,period=20000/p, , id = { 38541,
> 38542, 38543, 38544 }, type = 4, size = 112, config = 0xc0, { sample_pe #
> HEADER_CPU_TOPOLOGY info available, use -I to display #
> HEADER_NUMA_TOPOLOGY info available, use -I to display # pmu
> mappings: cpu = 4, software = 1, power = 7, uncore_imc = 8, tracepoint = 2,
> uncore_cbox_0 = 9, uncore_cbox_1 = 10, breakpoint = 5 # group:
> {cpu/cpu-cycles,call-graph=fp,time,period=10000/pp,cpu/instructions,call-
> graph=no,time=0,period=20000/p}
> # ========
> #
> 
> I get:
> 
> Samples: 2K of event 'cpu/instructions,call-
> graph=no,time=0,period=20000/p', Event count (approx.): 46956518
>   Children      Self  Command          Shared Object Symbol                  ◆
> -   67.56%     0.00%  qemu-system-x86  [unknown]     [.]
> 0xad5e258d4c544155  ▒
>      0xad5e258d4c544155                                                      ▒
> -   67.56%     0.00%  qemu-system-x86  libc-2.20.so  [.] __libc_start_main   ▒
>      __libc_start_main                                                       ▒
>      0xad5e258d4c544155                                                      ▒
> -   67.56%     0.00%  qemu-system-x86  perf          [.] main                ▒
>      main                                                                    ▒
>      __libc_start_main                                                       ▒
>      0xad5e258d4c544155                                                      ▒
> -   67.56%     0.00%  qemu-system-x86  perf          [.] run_builtin         ▒
>      run_builtin                                                             ▒
>      main                                                                    ▒
>      __libc_start_main                                                       ▒
> 
> This is in the 'perf report' TUI, why, for an event with 'callgraph=no', we get
> callchains? How come?
> 

That's the design.
For sampling multiple events, it may not be needed to collect callgraphs for
all of them. Because the sample sites are usually nearby. It's enough to collect
the callgraphs on a reference event. For other events, it can still show callgraphs
according to the callgraphs on a reference event.

Thanks,
Kan

> If I try it with the --stdio, well, its there as well:
> 
> # Samples: 2K of event 'cpu/instructions,call-
> graph=no,time=0,period=20000/p'
> # Event count (approx.): 46956518
> #
> # Children      Self  Command          Shared Object               Symbol
> # ........  ........  ...............  ..........................  ..............................................
> #
>     67.56%     0.00%  qemu-system-x86  [unknown]                   [.]
> 0xad5e258d4c544155
>             |
>             ---0xad5e258d4c544155
> 
>     67.56%     0.00%  qemu-system-x86  libc-2.20.so                [.]
> __libc_start_main
>             |
>             ---__libc_start_main
>                0xad5e258d4c544155
> 
>     67.56%     0.00%  qemu-system-x86  perf                        [.] main
>             |
>             ---main
>                __libc_start_main
>                0xad5e258d4c544155
> 
>     67.56%     0.00%  qemu-system-x86  perf                        [.] run_builtin
>             |
>             ---run_builtin
>                main
>                __libc_start_main
>                0xad5e258d4c544155
> 
> Can you take a look? What I have, i.e. the patches from Jiri and Kan moving
> stuff out of the way of the python binding and this patch is at the
> tmp.perf/core branch in my git tree:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> 
> - Arnaldo
> 
> 
> > Jiri, while testing this I noticed that the message for EINVAL when
> > using the cpu// syntax (per-event settings) is cryptic:
> >
> >   [root@zoo ~]# perf record -e 'cpu/cpu-cycles,call-
> graph=fp,time,period=100000/p' ls
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (cpu/cpu-cycles,call-graph=fp,time,period=100000/p).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> >
> > Whereas if we use -F, it is much, much clearer, telling the user
> > exactly what is failing and what needs to be done to make it work:
> >
> >   [root@zoo ~]# perf record -F 100000 -e cpu/cpu-cycles/ usleep 1
> >   Maximum frequency rate (25000) reached.
> >   Please use -F freq option with lower value or consider
> >   tweaking /proc/sys/kernel/perf_event_max_sample_rate.
> >   [root@zoo ~]#
> >
> > Hope this is something easy to wire up, given your event parsing kung
> > foo skillz...
> >
> > ;-)
> >
> > - Arnaldo