linux-kernel - Re: [PATCH] perf trace: Add support for printing call chains on sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1714692.A55s9sBUmI@agathebauer>
Date:	Sat, 09 Apr 2016 13:44:51 +0200
From:	Milian Wolff <milian.wolff@...b.com>
To:	Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:	linux-perf-users@...r.kernel.org, Jiri Olsa <jolsa@...nel.org>,
	David Ahern <dsahern@...il.com>,
	Brendan Gregg <brendan.d.gregg@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Wang Nan <wangnan0@...wei.com>
Subject: Re: [PATCH] perf trace: Add support for printing call chains on sys_exit events.

On Freitag, 8. April 2016 15:18:53 CEST Arnaldo Carvalho de Melo wrote:
> Em Fri, Apr 08, 2016 at 02:57:54PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Apr 08, 2016 at 01:34:15PM +0200, Milian Wolff escreveu:
> > > Now, one can print the call chain for every encountered sys_exit
> > > event, e.g.:
> > > 
> > > Note that it is advised to increase the number of mmap pages to
> > > prevent event losses when using this new feature. Often, adding
> > > `-m 10M` to the `perf trace` invocation is enough.
> > > 
> > > This feature is also available in strace when built with libunwind
> > > 
> > > via `strace -k`. Performance wise, this solution is much better:
> > >     $ time find path/to/linux &> /dev/null
> > >     
> > >     real    0m0.051s
> > >     user    0m0.013s
> > >     sys     0m0.037s
> > >     
> > >     $ time perf trace -m 800M --call-graph dwarf find path/to/linux &>
> > >     /dev/null
> > >     
> > >     real    0m2.624s
> > >     user    0m1.203s
> > >     sys     0m1.333s
> > >     
> > >     $ time strace -k find path/to/linux  &> /dev/null
> > >     
> > >     real    0m35.398s
> > >     user    0m10.403s
> > >     sys     0m23.173s
> > > 
> > > Note that it is currently not possible to configure the print output.
> > > Adding such a feature, similar to what is available in `perf script`
> > > via its `--fields` knob can be added later on.
> > 
> > You mixed up multiple changes in one single patch, I'll break it down
> > while testing, and before pushing upstream.
> 
> Expanding a bit the audience:
> 
> First test, it works, great! But do we really need that address? I guess
> not, right, perhaps via some callchain parameter, to tell what we want to
> see? But by default knowing the function name + DSO seems enough, no?

...

> Yeah, you agree with that, now that I read the patch 8-):
> 
> +               /* TODO: user-configurable print_opts */
> +               unsigned int print_opts = PRINT_IP_OPT_IP

;-)

I even tried to make the code of `perf script` reusable for `perf trace`, but 
stopped once I realised that it currently relies on the existance of a 
`perf_session`, which does not exist when we do live tracing. It only exists 
for replaying in `builtin-trace.c`. So it involves some more refactoring which 
I did not have the time for.

<snip>

> Better, but perhaps we should try aligning, up to a limit, the function
> names/DSOs?
> 
> [root@...et bpf]# trace -e nanosleep --call-graph dwarf usleep 1
>      0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70          
>                              ) = 0 syscall_slow_exit_work
> ([kernel.kallsyms]) do_syscall_64          ([kernel.kallsyms])
> return_from_SYSCALL_64 ([kernel.kallsyms]) __nanosleep           
> (/usr/lib64/libc-2.22.so) usleep                 (/usr/lib64/libc-2.22.so)
> [unknown]              (/usr/bin/usleep) __libc_start_main     
> (/usr/lib64/libc-2.22.so) [unknown]              (/usr/bin/usleep)
> [root@...et bpf]#
> 
> wdyt?

Yes, sounds good. Many profilers I've worked with always dump the IP, so I 
thought we should do it here as well. `perf script` e.g. does it. Could we 
maybe print the IP if the symbol is [unknown]?

> Also, after this initial support is in, I think the next step is to
> allow per syscall configs, like we have for per tracepoints, i.e. this
> should be possible:
> 
>    # trace -e nanosleep(call-graph=dwarf),socket -a
> 
> And then we would get callchains just for nanosleep calls, not for
> socket ones. We then need to think how to ask that efficiently to the
> kernel, in this case it should be instead of using
> raw_syscalls:sys_enter + tracepoint filters set via ioctl, to use
> syscalls:sys_{enter,exit}_nanosleep, with callgraphs +
> syscalls:sys_{enter,exit}_socket, without.
> 
> Doing it this way allows us to avoid asking callchains for a lot of
> events when we want just for a few ones, to reduce overhead.
 
Yep, sounds useful for some more specific use-cases. For me, this patch is 
sufficient as I'd just do:

    $ trace -e nanosleep --call-graph=dwarf ...

What I think is more important though is to make sure we only ask for 
callchains on the sys_exit events. Afaik, my patch will do it also for the 
sys_enter which is just additional cost with no benefit? So fixing that first 
is I think even more important, but I don't know how.

> Anyway, I think I'll just break this down into multiple patches and then
> we can work on these other aspects.

Yes, but note that I'll be busy and then on vacation for the next two weeks. 
I'll get back to this after wards.

> David, ah, his patch floated on the linux-perf-users mailing list, easy
> one once the thread->priv one got out of the way (it was being used by
> builtin-trace.c and the unwind code, ugh).
> 
> Thanks,

Same to you, cheers!

-- 
Milian Wolff | milian.wolff@...b.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5903 bytes)