linux-kernel - Re: [PATCH] perf trace: Add support for printing call chains on sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160408181853.GC25165@kernel.org>
Date:	Fri, 8 Apr 2016 15:18:53 -0300
From:	Arnaldo Carvalho de Melo <acme@...nel.org>
To:	Milian Wolff <milian.wolff@...b.com>
Cc:	linux-perf-users@...r.kernel.org, Jiri Olsa <jolsa@...nel.org>,
	David Ahern <dsahern@...il.com>,
	Brendan Gregg <brendan.d.gregg@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Wang Nan <wangnan0@...wei.com>
Subject: Re: [PATCH] perf trace: Add support for printing call chains on
 sys_exit events.

Em Fri, Apr 08, 2016 at 02:57:54PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Apr 08, 2016 at 01:34:15PM +0200, Milian Wolff escreveu:
> > Now, one can print the call chain for every encountered sys_exit
> > event, e.g.:

> > Note that it is advised to increase the number of mmap pages to
> > prevent event losses when using this new feature. Often, adding
> > `-m 10M` to the `perf trace` invocation is enough.

> > This feature is also available in strace when built with libunwind
> > via `strace -k`. Performance wise, this solution is much better:

> >     $ time find path/to/linux &> /dev/null

> >     real    0m0.051s
> >     user    0m0.013s
> >     sys     0m0.037s

> >     $ time perf trace -m 800M --call-graph dwarf find path/to/linux &>
> >     /dev/null

> >     real    0m2.624s
> >     user    0m1.203s
> >     sys     0m1.333s

> >     $ time strace -k find path/to/linux  &> /dev/null
 
> >     real    0m35.398s
> >     user    0m10.403s
> >     sys     0m23.173s

> > Note that it is currently not possible to configure the print output.
> > Adding such a feature, similar to what is available in `perf script`
> > via its `--fields` knob can be added later on.
 
> You mixed up multiple changes in one single patch, I'll break it down
> while testing, and before pushing upstream.

Expanding a bit the audience:

First test, it works, great! But do we really need that address? I guess not,
right, perhaps via some callchain parameter, to tell what we want to see? But
by default knowing the function name + DSO seems enough, no?

[root@...et bpf]# trace -e nanosleep --call-graph dwarf usleep 1
     0.071 ( 0.071 ms): usleep/5455 nanosleep(rqtp: 0x7ffee070f080) = 0
	          2036be syscall_slow_exit_work ([kernel.kallsyms])
	          203dfb do_syscall_64 ([kernel.kallsyms])
	          9b8fe1 return_from_SYSCALL_64 ([kernel.kallsyms])
	    7f41622ec790 __nanosleep (/usr/lib64/libc-2.22.so)
	    7f416231d524 usleep (/usr/lib64/libc-2.22.so)
	    563b6c6afcab [unknown] (/usr/bin/usleep)
	    7f4162244580 __libc_start_main (/usr/lib64/libc-2.22.so)
	    563b6c6afce9 [unknown] (/usr/bin/usleep)
[root@...et bpf]#

Yeah, you agree with that, now that I read the patch 8-):

+               /* TODO: user-configurable print_opts */
+               unsigned int print_opts = PRINT_IP_OPT_IP


Ok, removing that OPT_IP I get, oops, the alignment is beign done only on ip?

[root@...et bpf]# trace -e nanosleep --call-graph dwarf usleep 1
     0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70                                        ) = 0
 syscall_slow_exit_work ([kernel.kallsyms])
 do_syscall_64 ([kernel.kallsyms])
 return_from_SYSCALL_64 ([kernel.kallsyms])
 __nanosleep (/usr/lib64/libc-2.22.so)
 usleep (/usr/lib64/libc-2.22.so)
 [unknown] (/usr/bin/usleep)
 __libc_start_main (/usr/lib64/libc-2.22.so)
 [unknown] (/usr/bin/usleep)
[root@...et bpf]# 

Fixing it up we get:

[root@...et bpf]# trace -e nanosleep --call-graph dwarf usleep 1
     0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70                                        ) = 0
                                    syscall_slow_exit_work ([kernel.kallsyms])
                                    do_syscall_64 ([kernel.kallsyms])
                                    return_from_SYSCALL_64 ([kernel.kallsyms])
                                    __nanosleep (/usr/lib64/libc-2.22.so)
                                    usleep (/usr/lib64/libc-2.22.so)
                                    [unknown] (/usr/bin/usleep)
                                    __libc_start_main (/usr/lib64/libc-2.22.so)
                                    [unknown] (/usr/bin/usleep)
[root@...et bpf]# 

Better, but perhaps we should try aligning, up to a limit, the function names/DSOs?

[root@...et bpf]# trace -e nanosleep --call-graph dwarf usleep 1
     0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70                                        ) = 0
                                    syscall_slow_exit_work ([kernel.kallsyms])
                                    do_syscall_64          ([kernel.kallsyms])
                                    return_from_SYSCALL_64 ([kernel.kallsyms])
                                    __nanosleep            (/usr/lib64/libc-2.22.so)
                                    usleep                 (/usr/lib64/libc-2.22.so)
                                    [unknown]              (/usr/bin/usleep)
                                    __libc_start_main      (/usr/lib64/libc-2.22.so)
                                    [unknown]              (/usr/bin/usleep)
[root@...et bpf]# 

wdyt?

Also, after this initial support is in, I think the next step is to
allow per syscall configs, like we have for per tracepoints, i.e. this
should be possible:

   # trace -e nanosleep(call-graph=dwarf),socket -a

And then we would get callchains just for nanosleep calls, not for
socket ones. We then need to think how to ask that efficiently to the
kernel, in this case it should be instead of using
raw_syscalls:sys_enter + tracepoint filters set via ioctl, to use
syscalls:sys_{enter,exit}_nanosleep, with callgraphs +
syscalls:sys_{enter,exit}_socket, without.

Doing it this way allows us to avoid asking callchains for a lot of
events when we want just for a few ones, to reduce overhead.

Anyway, I think I'll just break this down into multiple patches and then
we can work on these other aspects.

David, ah, his patch floated on the linux-perf-users mailing list, easy
one once the thread->priv one got out of the way (it was being used by
builtin-trace.c and the unwind code, ugh).

Thanks,

- Arnaldo