Date:	Fri, 27 Nov 2015 08:48:40 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Namhyung Kim <namhyung@...nel.org>
Cc:	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Jiri Olsa <jolsa@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	David Ahern <dsahern@...il.com>,
	Kan Liang <kan.liang@...el.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Andi Kleen <andi@...stfloor.org>,
	Wang Nan <wangnan0@...wei.com>
Subject: Re: [PATCH 2/3] perf callchain: Stop resolving callchains after
 invalid address


* Namhyung Kim <namhyung@...nel.org> wrote:

> Hi Ingo,
> 
> On Thu, Nov 26, 2015 at 08:43:35AM +0100, Ingo Molnar wrote:
> > 
> > * Namhyung Kim <namhyung@...nel.org> wrote:
> > 
> > > Unwinding optimized binaries using frame pointer gives garbage.  Check
> > > callchain address and stop if it's under vm.mmap_min_addr sysctl value.
> > > 
> > > Before:
> > >   $ perf report --stdio --no-children -g callee
> > >   ...
> > > 
> > >    1.37%  perf    [kernel.vmlinux]    [k] smp_call_function_single
> > >                |
> > >                ---smp_call_function_single
> > >                   _perf_event_enable
> > >                   perf_event_for_each_child
> > >                   perf_ioctl
> > >                   do_vfs_ioctl
> > >                   sys_ioctl
> > >                   entry_SYSCALL_64_fastpath
> > >                   __GI___ioctl
> > >                   0
> > >                   0
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   ...
> > > 
> > > After:
> > >   $ perf report --stdio --no-children -g callee
> > >   ...
> > > 
> > >    1.37%  perf    [kernel.vmlinux]    [k] smp_call_function_single
> > >                |
> > >                ---smp_call_function_single
> > >                   _perf_event_enable
> > >                   perf_event_for_each_child
> > >                   perf_ioctl
> > >                   do_vfs_ioctl
> > >                   sys_ioctl
> > >                   entry_SYSCALL_64_fastpath
> > >                   __GI___ioctl
> > 
> > In addition to that, would it make sense to terminate the callchain with an 
> > indicator that we found something anomalous? Such an extra line:
> > 
> >                     ...
> > 
> > would not be intrusive, but would tell the informed reader that it's not a normal 
> > ending of the call chain.
> > 
> > This assumes that we can tell apart 'normal end of call chain' from 'seems to end 
> > with garbage pointer' cases - can we do that?
> 
> In case of fp unwind, I'm not sure we can determine whether it's
> normal end or not especially for optimized binaries.  It seems kernel
> also can stop callchain anytime if it sees a broken frame.
> 
> For dwarf unwind, I think it's also hard to tell since it can be
> stopped for various reasons like insufficient dump size or broken CFI,

But but. Doesn't your patch 'detect' an anomaly to begin with?

+               /*
+                * Callchain value under mmap_min_addr means it's broken
+                * or the end of callchain.  Stop.
+                */
+               if (ip < mmap_min_addr) {
+                       if (callchain_param.order == ORDER_CALLEE)
+                               break;

all I'm asking for is to indicate it in some low-key visual fashion when we 
encounter such a 'broken' call-chain.

I presume the 'old' way of ending the call-chain was that 'ip' was zero, right? We 
should not print the indicator in that case.
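
To make the distinction concrete, here is a minimal standalone sketch, not perf code: a zero 'ip' is treated as the normal end of the chain, while a non-zero 'ip' below mmap_min_addr is treated as a broken chain that deserves the indicator. The names chain_end, walk_chain() and chain_end_marker() are hypothetical, and the "zero means normal end" rule is only the presumption stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of how a callchain ended; not perf's API. */
enum chain_end {
	CHAIN_END_NORMAL,	/* ran out of entries, or hit a zero ip */
	CHAIN_END_BROKEN,	/* hit a non-zero ip below mmap_min_addr */
};

/*
 * Scan ips[0..nr) in callee order and stop at the first entry below
 * mmap_min_addr.  *nr_valid receives the number of entries that
 * precede the stop point.
 */
static enum chain_end walk_chain(const uint64_t *ips, size_t nr,
				 uint64_t mmap_min_addr, size_t *nr_valid)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ips[i] < mmap_min_addr) {
			*nr_valid = i;
			/* Assumption: ip == 0 is the 'old' normal terminator. */
			return ips[i] == 0 ? CHAIN_END_NORMAL : CHAIN_END_BROKEN;
		}
	}

	*nr_valid = nr;
	return CHAIN_END_NORMAL;
}

/* Low-key trailer line for the report: "..." only for broken chains. */
static const char *chain_end_marker(enum chain_end end)
{
	return end == CHAIN_END_BROKEN ? "..." : "";
}
```

A caller would print the first nr_valid entries and then the marker, so a chain that ends normally looks exactly as it does today.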

In the dwarf case I'd also see value in indicating if any of these events 
occurred:

  > For dwarf unwind, I think it's also hard to tell since it can be stopped for 
  > various reasons like insufficient dump size or broken CFI,

even if we cannot catch all anomalies. Performance analysis must stand firm on a 
hard rock of reliability and dependability, and we should always propagate 
information about possible profiling data corruption/unreliability. That's why we 
print the 'IO overload' messages during perf record for example.

Even if the problem is not caused by perf, but by external factors such as the 
compiler/linker.
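
As a rough illustration of propagating that information in the dwarf case, an unwinder could record why it stopped and let the report code turn that into a short note. This is only a sketch under assumptions: the flag names and describe_unwind_stop() are hypothetical and do not exist in perf, and the two reasons are simply the ones quoted above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reasons a dwarf unwind ended early; not perf's API. */
enum unwind_stop {
	UNWIND_STOP_SHORT_DUMP	= 1 << 0,	/* insufficient dump size */
	UNWIND_STOP_BAD_CFI	= 1 << 1,	/* broken CFI */
};

/*
 * Write a comma-separated description of 'reasons' into buf.
 * Returns the number of reasons described; buf is left empty
 * when the unwind ended without a recorded anomaly.
 */
static int describe_unwind_stop(unsigned int reasons, char *buf, size_t len)
{
	int n = 0;

	if (!len)
		return 0;
	buf[0] = '\0';

	if (reasons & UNWIND_STOP_SHORT_DUMP) {
		snprintf(buf, len, "%s", "short stack dump");
		n++;
	}
	if (reasons & UNWIND_STOP_BAD_CFI) {
		size_t used = strlen(buf);

		snprintf(buf + used, len - used, "%s%s",
			 n ? ", " : "", "broken CFI");
		n++;
	}

	return n;
}
```

Anomalies the unwinder cannot detect would leave the flags clear, so the absence of a note would still not prove that the chain is complete.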

Thanks,

	Ingo
