Message-ID: <564C283E.6070306@huawei.com>
Date: Wed, 18 Nov 2015 15:26:54 +0800
From: "Wangnan (F)" <wangnan0@...wei.com>
To: Namhyung Kim <namhyung@...nel.org>
CC: Jiri Olsa <jolsa@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
lkml <linux-kernel@...r.kernel.org>,
David Ahern <dsahern@...il.com>,
"Peter Zijlstra" <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Milian Wolff <milian.wolff@...b.com>
Subject: Re: [PATCH 2/3] perf tools: Add callchain order support for libunwind
DWARF unwinder
On 2015/11/18 13:41, Namhyung Kim wrote:
> On Wed, Nov 18, 2015 at 12:13:08PM +0800, Wangnan (F) wrote:
>>
>> On 2015/11/17 23:05, Jiri Olsa wrote:
>>> From: Jiri Olsa <jolsa@...hat.com>
>>>
>>> As reported by Milian, currently for DWARF unwind (both libdw
>>> and libunwind) we display callchain in callee order only.
>>>
>>> Adding the support to follow callchain order setup to libunwind
>>> DWARF unwinder, so we could get following output for report:
>>>
>>> $ perf record --call-graph dwarf ls
>>> ...
>>> $ perf report --no-children --stdio
>>>
>>> 39.26% ls libc-2.21.so [.] __strcoll_l
>>> |
>>> ---__strcoll_l
>>> mpsort_with_tmp
>>> mpsort_with_tmp
>>> sort_files
>>> main
>>> __libc_start_main
>>> _start
>>> 0
>>>
>>> $ perf report -g caller --no-children --stdio
>>> ...
>>> 39.26% ls libc-2.21.so [.] __strcoll_l
>>> |
>>> ---0
>>> _start
>>> __libc_start_main
>>> main
>>> sort_files
>>> mpsort_with_tmp
>>> mpsort_with_tmp
>>> __strcoll_l
>>>
>>> Reported-by: Milian Wolff <milian.wolff@...b.com>
>>> Based-on-patch-by: Milian Wolff <milian.wolff@...b.com>
>>> Link: http://lkml.kernel.org/n/tip-lmtbeqm403f3luw4jkjevsi5@git.kernel.org
>>> Signed-off-by: Jiri Olsa <jolsa@...nel.org>
>>> ---
>>> tools/perf/util/unwind-libunwind.c | 47 ++++++++++++++++++++++++--------------
>>> 1 file changed, 30 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
>>> index 0ae8844fe7a6..705e1c19f1ea 100644
>>> --- a/tools/perf/util/unwind-libunwind.c
>>> +++ b/tools/perf/util/unwind-libunwind.c
>> [SNIP]
>>
>>> - unw_get_reg(&c, UNW_REG_IP, &ip);
>>> - ret = ip ? entry(ip, ui->thread, cb, arg) : 0;
>> In original code if ip == 0 entry() won't be called.
>>
>>> + if (callchain_param.order == ORDER_CALLER)
>>> + j = max_stack - i - 1;
>>> + ret = entry(ips[j], ui->thread, cb, arg);
>> But in the new code, even if ips[j] == 0 an entry will be built, which
>> causes a user-noticeable behavior change:
>>
>> Before this patch:
>>
>>
>> # perf report --no-children --stdio --call-graph=callee
>> ...
>> 3.38% a.out a.out [.] funcc
>> |
>> ---funcc
>> |
>> --2.70%-- funcb
>> funca
>> main
>> __libc_start_main
>> _start
>>
>> After this patch:
>>
>> # perf report --no-children --stdio --call-graph=callee
>> ...
>> 3.38% a.out a.out [.] funcc
>> |
>> ---funcc
>> |
>> |--2.70%-- funcb
>> | funca
>> | main
>> | __libc_start_main
>> | _start
>> |
>> --0.68%-- 0
>>
>>
>> I'm not sure whether we can regard this behavior change as a bugfix. I think
>> there may be some reason the original code explicitly avoids creating a '0'
>> entry.
> I think callchain value being 0 is an error or marker for the end of
> callchain. So it'd be better avoiding 0 entry.
>
> But unfortunately, we have many 0 entries (and broken callchain after
> them) with fp recording on optimized binaries. I think we should omit
> those callchains.
>
> Maybe something like this?
>
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 5ef90be2a249..22642c5719ab 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1850,6 +1850,15 @@ static int thread__resolve_callchain_sample(struct thread *thread,
> #endif
> ip = chain->ips[j];
>
> + /* callchain value inside zero page means it's broken, stop */
> + if (ip < 4096) {
> + if (callchain_param.order == ORDER_CALLER) {
> + callchain_cursor_reset(&callchain_cursor);
> + continue;
> + } else
> + break;
> + }
> +
> err = add_callchain_ip(thread, parent, root_al, &cpumode, ip);
>
> if (err)
Then we get rid of the 0 entries entirely, but how do we explain the
sum of the overhead of the different branches (in the example above,
a 2.70% branch under a 3.38% entry)?
Is it possible to explicitly tell the user where perf failed to
unwind the call stack? For example:
3.38% a.out a.out [.] funcc
|
---funcc
|
|--2.70%-- funcb
| funca
| main
| __libc_start_main
| _start
|
--0.68%-- (unwind failure)
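
To make the idea a bit more concrete, here is a rough standalone sketch.
It is not perf code: resolve_chain(), ZERO_PAGE_END and the local enum
are hypothetical names made up for illustration, and the 4096 threshold
is simply borrowed from the "ip < 4096" check in the diff above. It
keeps the usable callee-side part of the chain in either order and
reports whether the chain was cut, instead of silently dropping that
information:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not perf code: every name here is made up. */
#define ZERO_PAGE_END 4096u	/* threshold borrowed from "ip < 4096" */

enum chain_order { ORDER_CALLEE, ORDER_CALLER };

/*
 * ips[] is assumed to be stored callee-first. Copy the entries that
 * precede the first invalid address into out[] in the requested order.
 * out[] must have room for nr entries.
 *
 * Returns the number of entries written. *failed is set to 1 when an
 * address inside the zero page cut the chain short, 0 otherwise.
 */
static size_t resolve_chain(const uint64_t *ips, size_t nr,
			    enum chain_order order,
			    uint64_t *out, int *failed)
{
	size_t valid = 0, i;

	/* Length of the usable callee-side prefix. */
	while (valid < nr && ips[valid] >= ZERO_PAGE_END)
		valid++;

	*failed = valid < nr;

	for (i = 0; i < valid; i++) {
		/* Caller order walks the usable prefix backwards. */
		size_t j = (order == ORDER_CALLER) ? valid - i - 1 : i;

		out[i] = ips[j];
	}

	return valid;
}
```

With something like this, the report code could append a
"(unwind failure)" node whenever failed is set, so the 0.68% above
stays visible and the branches still add up to 3.38%.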
Thank you.