Date:	Tue, 26 Apr 2016 15:20:32 -0300
From:	Arnaldo Carvalho de Melo <acme@...nel.org>
To:	Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Stephane Eranian <eranian@...gle.com>,
	David Ahern <dsahern@...il.com>,
	Milian Wolff <milian.wolff@...b.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: LBR callchains from tracepoints

On Tue, Apr 26, 2016 at 10:26:32AM -0700, Alexei Starovoitov wrote:
> On Tue, Apr 26, 2016 at 06:38:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 25, 2016 at 10:24:31PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Mon, Apr 25, 2016 at 10:03:58PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > I now need to continue investigation why this doesn't seem to work from
> > > > tracepoints...

> > > Bummer, the changeset (at the end of this message) hasn't any
> > > explanation, is this really impossible? I.e. LBR callstacks from
> > > tracepoints? Even if we set perf_event_attr.exclude_callchain_kernel?

> > Could maybe be done, but it's tricky to implement as the LBR is managed
> > by the hardware PMU and tracepoints are a software PMU, so we need to
> > then somehow frob with cross-pmu resources, in a very arch specific way.
> > And programmability of the hardware PMU will then depend on events
> > outside of it.
 
> btw we're thinking to add support for lbr to bpf, so that from the program
> we can get accurate and fast stacks. That's especially important for user
> space stacks. No clear idea how to do it yet, but it would be really useful.

Yeah, and that already works in perf, it's just that it doesn't work from
some kinds of events (PERF_TYPE_SOFTWARE, PERF_TYPE_TRACEPOINT, etc.), as
described in the changeset I mentioned.

'perf trace --call-graph lbr' doesn't work right now, even when we are
interested only in the user space bits, i.e. when setting
perf_event_attr.exclude_callchain_kernel.
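
To make the intent concrete, here is a minimal, hypothetical
perf_event_open() sketch of what "user space bits only" means: a tracepoint
event asking for LBR call stacks with exclude_callchain_kernel set. The
tracepoint id (config) is a placeholder you would read from tracefs, and the
error handling is illustrative; as discussed above, the LBR is driven by the
hardware PMU, so on current kernels this open is expected to fail.

  /* hypothetical sketch: LBR call stacks from a tracepoint, user space only */
  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <string.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          struct perf_event_attr attr;
          int fd;

          memset(&attr, 0, sizeof(attr));
          attr.size          = sizeof(attr);
          attr.type          = PERF_TYPE_TRACEPOINT;
          attr.config        = 0; /* tracepoint id from tracefs, placeholder */
          attr.sample_period = 1;
          attr.sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                               PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_BRANCH_STACK;
          attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                                    PERF_SAMPLE_BRANCH_CALL_STACK;
          attr.exclude_callchain_kernel = 1; /* user space bits only */

          /* monitor the current task on any CPU */
          fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
          if (fd < 0)
                  perror("perf_event_open"); /* expected to fail on current kernels */
          else
                  close(fd);
          return 0;
  }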

  # perf trace --call-graph dwarf

works, but that, as you mention, really isn't "fast", and it is sometimes
not accurate, or at least wasn't with broken toolchains.

Here is an example mixing strace-like output with userspace-only DWARF
callchains (it would be lovely if this was with LBR, huh?), FP callchains for
the sched:sched_switch tracepoint, and LBR callchains for a hardware event,
cycles; see further below for the reason for the broken timestamps on
PERF_TYPE_HARDWARE events:

  # perf trace -e nanosleep --event sched:sched_switch/call-graph=fp/ --ev cycles/call-graph=lbr,period=100/ usleep 1
18446744073709.551 (         ): cycles/call-graph=lbr,period=100/:)
                                       __intel_pmu_enable_all+0xfe200080 ([kernel.kallsyms])
                                       intel_pmu_enable_all+0xfe200010 ([kernel.kallsyms])
                                       x86_pmu_enable+0xfe200271 ([kernel.kallsyms])
                                       perf_pmu_enable.part.81+0xfe200007 ([kernel.kallsyms])
                                       ctx_resched+0xfe20007a ([kernel.kallsyms])
                                       perf_event_exec+0xfe20011d ([kernel.kallsyms])
                                       setup_new_exec+0xfe20006f ([kernel.kallsyms])
                                       load_elf_binary+0xfe2003e3 ([kernel.kallsyms])
                                       search_binary_handler+0xfe20009e ([kernel.kallsyms])
                                       do_execveat_common.isra.38+0xfe20052c ([kernel.kallsyms])
                                       sys_execve+0xfe20003a ([kernel.kallsyms])
                                       do_syscall_64+0xfe200062 ([kernel.kallsyms])
                                       return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                                       [0] ([unknown])
     0.310 ( 0.006 ms): usleep/20951 nanosleep(rqtp: 0x7ffda8904500       ) ...
     0.310 (         ): sched:sched_switch:usleep:20951 [120] S ==> swapper/3:0 [120])
                                       __schedule+0xfe200402 ([kernel.kallsyms])
                                       schedule+0xfe200035 ([kernel.kallsyms])
                                       do_nanosleep+0xfe20006f ([kernel.kallsyms])
                                       hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
                                       sys_nanosleep+0xfe20007a ([kernel.kallsyms])
                                       do_syscall_64+0xfe200062 ([kernel.kallsyms])
                                       return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                                       __nanosleep+0xffff00bfad62c010 (/usr/lib64/libc-2.22.so)
18446679523046.461 (         ): cycles/call-graph=lbr,period=100/:)
                                       perf_pmu_enable.part.81+0xfe200007 ([kernel.kallsyms])
                                       __perf_event_task_sched_in+0xfe2001ad ([kernel.kallsyms])
                                       finish_task_switch+0xfe200156 ([kernel.kallsyms])
                                       __schedule+0xfe200397 ([kernel.kallsyms])
                                       schedule+0xfe200035 ([kernel.kallsyms])
                                       do_nanosleep+0xfe20006f ([kernel.kallsyms])
                                       hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
                                       sys_nanosleep+0xfe20007a ([kernel.kallsyms])
                                       do_syscall_64+0xfe200062 ([kernel.kallsyms])
                                       return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                                       [0] ([unknown])
     0.377 ( 0.073 ms): usleep/20951  ... [continued]: nanosleep()) = 0
  [root@...et ~]# 


perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  size                             112
  { sample_period, sample_freq }   100
  sample_type                      IP|TID|CALLCHAIN|BRANCH_STACK|IDENTIFIER
  read_format                      ID
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  sample_id_all                    1
  exclude_guest                    1
  { wakeup_events, wakeup_watermark } 1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES

PERF_SAMPLE_TIME is missing from sample_type, which is why the timestamps
above are broken; will fix.
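
For reference, a hedged C sketch of the same attr with the missing
PERF_SAMPLE_TIME bit added to sample_type, which is what should make the
timestamps above sane; the field values mirror the dump, while the helper
name is just illustrative.

  /* sketch: the attr above, plus the missing PERF_SAMPLE_TIME bit */
  #include <linux/perf_event.h>
  #include <string.h>

  void setup_cycles_lbr_attr(struct perf_event_attr *attr)
  {
          memset(attr, 0, sizeof(*attr));
          attr->size = sizeof(*attr);              /* 112 in the dump above */
          attr->type = PERF_TYPE_HARDWARE;
          attr->config = PERF_COUNT_HW_CPU_CYCLES;
          attr->sample_period = 100;
          attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                              PERF_SAMPLE_TIME |   /* the missing bit */
                              PERF_SAMPLE_CALLCHAIN |
                              PERF_SAMPLE_BRANCH_STACK |
                              PERF_SAMPLE_IDENTIFIER;
          attr->read_format = PERF_FORMAT_ID;
          attr->disabled = 1;
          attr->inherit = 1;
          attr->enable_on_exec = 1;
          attr->sample_id_all = 1;
          attr->exclude_guest = 1;
          attr->wakeup_events = 1;
          attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                                     PERF_SAMPLE_BRANCH_CALL_STACK |
                                     PERF_SAMPLE_BRANCH_NO_FLAGS |
                                     PERF_SAMPLE_BRANCH_NO_CYCLES;
  }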

- Arnaldo
