linux-kernel - Re: [PATCH 2/3] perf callchain: Stop resolving callchains after invalid address

Message-ID: <20151127074840.GA24277@gmail.com>
Date:	Fri, 27 Nov 2015 08:48:40 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Namhyung Kim <namhyung@...nel.org>
Cc:	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Jiri Olsa <jolsa@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	David Ahern <dsahern@...il.com>,
	Kan Liang <kan.liang@...el.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Andi Kleen <andi@...stfloor.org>,
	Wang Nan <wangnan0@...wei.com>
Subject: Re: [PATCH 2/3] perf callchain: Stop resolving callchains after
 invalid address


* Namhyung Kim <namhyung@...nel.org> wrote:

> Hi Ingo,
> 
> On Thu, Nov 26, 2015 at 08:43:35AM +0100, Ingo Molnar wrote:
> > 
> > * Namhyung Kim <namhyung@...nel.org> wrote:
> > 
> > > Unwinding optimized binaries using frame pointer gives garbage.  Check
> > > callchain address and stop if it's under vm.mmap_min_addr sysctl value.
> > > 
> > > Before:
> > >   $ perf report --stdio --no-children -g callee
> > >   ...
> > > 
> > >    1.37%  perf    [kernel.vmlinux]    [k] smp_call_function_single
> > >                |
> > >                ---smp_call_function_single
> > >                   _perf_event_enable
> > >                   perf_event_for_each_child
> > >                   perf_ioctl
> > >                   do_vfs_ioctl
> > >                   sys_ioctl
> > >                   entry_SYSCALL_64_fastpath
> > >                   __GI___ioctl
> > >                   0
> > >                   0
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   0x1c5aa70
> > >                   0x1c5b910
> > >                   ...
> > > 
> > > After:
> > >   $ perf report --stdio --no-children -g callee
> > >   ...
> > > 
> > >    1.37%  perf    [kernel.vmlinux]    [k] smp_call_function_single
> > >                |
> > >                ---smp_call_function_single
> > >                   _perf_event_enable
> > >                   perf_event_for_each_child
> > >                   perf_ioctl
> > >                   do_vfs_ioctl
> > >                   sys_ioctl
> > >                   entry_SYSCALL_64_fastpath
> > >                   __GI___ioctl
> > 
> > In addition to that, would it make sense to terminate the callchain with an 
> > indicator that we found something anomalous? Such an extra line:
> > 
> >                     ...
> > 
> > would not be intrusive, but would tell the informed reader that it's not a normal 
> > ending of the call chain.
> > 
> > This assumes that we can tell apart 'normal end of call chain' from 'seems to end 
> > with garbage pointer' cases - can we do that?
> 
> In case of fp unwind, I'm not sure we can determine whether it's
> normal end or not, especially for optimized binaries.  It seems the kernel
> can also stop the callchain at any time if it sees a broken frame.
> 
> For dwarf unwind, I think it's also hard to tell since it can be
> stopped for various reasons like insufficient dump size or broken CFI,

But but. Doesn't your patch 'detect' an anomaly to begin with?

+               /*
+                * Callchain value under mmap_min_addr means it's broken
+                * or the end of callchain.  Stop.
+                */
+               if (ip < mmap_min_addr) {
+                       if (callchain_param.order == ORDER_CALLEE)
+                               break;

all I'm asking for is to indicate it in some low-key visual fashion when we 
encounter such a 'broken' call-chain.
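A minimal sketch of where the threshold in the check above could come from, assuming it is read once at startup from /proc/sys/vm/mmap_min_addr, the procfs file behind the vm.mmap_min_addr sysctl. The helper names and the 4096 fallback are hypothetical; the patch's own way of obtaining the value is not shown in this thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fallback used when the sysctl cannot be read.
 * 4096 (one page) is an assumption, not necessarily the kernel default. */
#define MMAP_MIN_ADDR_FALLBACK 4096UL

/* Parse the text of /proc/sys/vm/mmap_min_addr (a decimal number plus a
 * newline).  Returns fallback if the text is missing or not a number. */
unsigned long parse_mmap_min_addr(const char *text, unsigned long fallback)
{
	char *end;
	unsigned long val;

	if (text == NULL)
		return fallback;
	val = strtoul(text, &end, 10);
	if (end == text)	/* no digits were consumed */
		return fallback;
	return val;
}

/* Read the vm.mmap_min_addr sysctl through procfs (hypothetical helper). */
unsigned long read_mmap_min_addr(void)
{
	char buf[64];
	FILE *fp = fopen("/proc/sys/vm/mmap_min_addr", "r");

	if (fp == NULL)
		return MMAP_MIN_ADDR_FALLBACK;
	if (fgets(buf, sizeof(buf), fp) == NULL) {
		fclose(fp);
		return MMAP_MIN_ADDR_FALLBACK;
	}
	fclose(fp);
	return parse_mmap_min_addr(buf, MMAP_MIN_ADDR_FALLBACK);
}
```

Splitting parsing from file access keeps the parsing testable without depending on the running system's sysctl setting.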

I presume the 'old' way of ending the call-chain was that 'ip' was zero, right? We 
should not print the indicator in that case.
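A minimal sketch of the low-key indicator suggested above, assuming the callchain is a plain array of u64 addresses ordered callee-first and that symbol resolution is skipped (raw addresses are printed). It stops at the first address below mmap_min_addr, as the patch does, and, following the presumption above, prints a trailing "..." only when that address is non-zero. format_callchain() is a hypothetical helper, not perf's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one address per line into buf, stopping at
 * the first entry below mmap_min_addr.  A zero entry is treated as the
 * normal end of the chain (no marker); any other low address appends a
 * "..." line to flag a chain that was cut short.
 * Returns 1 if the marker was emitted, 0 otherwise.
 */
int format_callchain(const uint64_t *ips, size_t nr, uint64_t mmap_min_addr,
		     char *buf, size_t size)
{
	static const char marker[] = "...\n";
	size_t used = 0;
	size_t i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < nr; i++) {
		uint64_t ip = ips[i];
		int n;

		if (ip < mmap_min_addr) {
			if (ip == 0)
				return 0;	/* normal end: no marker */
			/* broken chain: append the marker if it fits */
			if (sizeof(marker) <= size - used)
				memcpy(buf + used, marker, sizeof(marker));
			return 1;
		}

		n = snprintf(buf + used, size - used, "%#llx\n",
			     (unsigned long long)ip);
		if (n < 0 || (size_t)n >= size - used)
			return 0;	/* out of space: stop silently */
		used += (size_t)n;
	}
	return 0;
}
```

This sketch treats every zero as a clean end, so it would not flag the 'Before' example quoted earlier, where the garbage entries follow two zero entries.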

Also, in the dwarf case I'd also see value in indicating if any of these events 
occurred:

  > For dwarf unwind, I think it's also hard to tell since it can be stopped for 
  > various reasons like insufficient dump size or broken CFI,

even if we cannot catch all anomalies. Performance analysis must stand firm on a 
hard rock of reliability and dependability, and we should always propagate 
information about possible profiling data corruption/unreliability. That's why we 
print the 'IO overload' messages during perf record for example.

That holds even if the problem is caused not by perf but by external factors such
as the compiler/linker.
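A minimal sketch of propagating that kind of unreliability information: count why each unwind stopped and print a single summary warning only when something looked wrong, similar in spirit to the record-time warnings mentioned above. The stop reasons mirror those named in this thread (bad address, insufficient dump size, broken CFI); every name here is hypothetical and does not correspond to perf's actual code.

```c
#include <stdio.h>

/* Hypothetical stop reasons; names are illustrative, not perf's. */
enum unwind_stop {
	UNWIND_STOP_END,	/* reached the end of the chain normally */
	UNWIND_STOP_BAD_ADDR,	/* address below mmap_min_addr */
	UNWIND_STOP_DUMP_SIZE,	/* ran out of copied user stack */
	UNWIND_STOP_BAD_CFI,	/* missing or broken CFI */
	UNWIND_STOP_NR
};

struct unwind_stats {
	unsigned long samples;			/* callchains unwound */
	unsigned long stops[UNWIND_STOP_NR];	/* per-reason counters */
};

/* Record why one callchain stopped. */
void unwind_stats__account(struct unwind_stats *st, enum unwind_stop why)
{
	st->samples++;
	if ((unsigned int)why < UNWIND_STOP_NR)
		st->stops[why]++;
}

/* Number of callchains that ended for any reason other than a normal end. */
unsigned long unwind_stats__anomalies(const struct unwind_stats *st)
{
	unsigned long n = 0;
	int i;

	for (i = UNWIND_STOP_END + 1; i < UNWIND_STOP_NR; i++)
		n += st->stops[i];
	return n;
}

/* Print a one-line summary only if some chain ended abnormally.
 * Returns 1 if the warning was printed, 0 otherwise. */
int unwind_stats__warn(const struct unwind_stats *st, FILE *fp)
{
	unsigned long bad = unwind_stats__anomalies(st);

	if (bad == 0)
		return 0;
	fprintf(fp, "Warning: %lu of %lu callchains may be truncated "
		"(bad address: %lu, stack dump too small: %lu, broken CFI: %lu)\n",
		bad, st->samples,
		st->stops[UNWIND_STOP_BAD_ADDR],
		st->stops[UNWIND_STOP_DUMP_SIZE],
		st->stops[UNWIND_STOP_BAD_CFI]);
	return 1;
}
```

Keeping per-reason counters, rather than a single flag, lets the summary say whether the problem looks like a too-small stack dump (a recording setting) or broken unwind info (a compiler/linker issue), even though not every anomaly can be detected.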

Thanks,

	Ingo