Message-ID: <CAFrcx1nqm5G0ASnuh9peMtaUE-cpAYY8xtvsRORHhXwSRod9YQ@mail.gmail.com>
Date: Tue, 19 Nov 2013 10:24:49 +0100
From: Jean Pihet <jean.pihet@...aro.org>
To: Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Cc: Ingo Molnar <mingo@...nel.org>, David Ahern <dsahern@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH] perf top: Make -g refer to callchains
Hi,
On 18 November 2013 13:59, Arnaldo Carvalho de Melo
<acme@...stprotocols.net> wrote:
> Em Fri, Nov 15, 2013 at 06:46:09AM +0100, Ingo Molnar escreveu:
>> btw., here's some 'perf top' call graph performance and profiling
>> quality feedback, with the latest perf code:
>>
>> 'perf top --call-graph fp' now works very well, using just 0.2%
>> of CPU time on a fast system:
>>
>> 4676 mingo 20 0 612m 56m 9948 S 1 0.2 0:00.68 perf
>>
>> 'perf top --call-graph dwarf' on the other hand is horrendously
>> slow, using 20% of CPU time on a 4 GHz CPU:
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 4646 mingo 20 0 658m 81m 12m R 19 0.3 0:18.17 perf
>>
>> On another system with a 2.4GHz CPU it's taking up 100% of CPU
>> time (!):
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 8018 mingo 20 0 290320 45220 8520 R 99.5 0.3 0:58.81 perf
>>
>> Profiling 'perf top' shows all sorts of very high dwarf
>> processing overhead:
>
> Yeah, top dwarf callchain has been so far a proof of concept, it
> exacerbates problems that can be seen on 'report', but since it's live,
> we can see it more clearly.
Indeed. Because of the poor performance of the dwarf unwinding code,
the only practical use is to record the data (perf record) and then
later parse it (perf report). perf top does all that at once.
> The work on improving callchain processing, (rb_tree'ing, new comm
> infrastructure) alleviated the problem a bit.
>
> Tuning the stack size requested from the kernel and using --max-stack
> can help when it is really needed, but yes, work on it is *badly* needed.
The problem is that with dwarf unwinding the user stack (up to the
requested size) is dumped for every sample, while frame pointer
unwinding only records the useful part of the callchain.
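To give a rough feel for the difference in data volume, here is a
back-of-the-envelope sketch. All the numbers in it are illustrative
assumptions, not measurements from this thread: a 4000 Hz sampling rate,
an 8192-byte user stack dump per sample for dwarf, and a frame pointer
callchain of 32 entries of 8 bytes each. It also ignores the register
set that is captured along with the stack dump.

```python
# Rough estimate of callchain data volume per second of profiling.
# Every constant below is an assumption chosen for illustration only.

SAMPLE_HZ = 4000          # assumed sampling frequency (samples/second)
DWARF_STACK_DUMP = 8192   # assumed user stack bytes copied per sample
FP_DEPTH = 32             # assumed callchain depth with frame pointers
ENTRY_SIZE = 8            # bytes per callchain entry (one 64-bit address)


def bytes_per_second(sample_hz, bytes_per_sample):
    """Data produced per second for a given per-sample payload size."""
    return sample_hz * bytes_per_sample


dwarf_rate = bytes_per_second(SAMPLE_HZ, DWARF_STACK_DUMP)
fp_rate = bytes_per_second(SAMPLE_HZ, FP_DEPTH * ENTRY_SIZE)
ratio = dwarf_rate / fp_rate

print(f"dwarf: {dwarf_rate} bytes/s")
print(f"fp:    {fp_rate} bytes/s")
print(f"dwarf produces {ratio:.0f}x more callchain data than fp")
```

With these assumed values the stack dumps alone are 32 times larger than
the frame pointer callchains, and that is before any of the unwinding
work in user space.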
Another important point is the robustness of libunwind with respect to
async signals, cf. http://lists.nongnu.org/archive/html/libunwind-devel/2013-09/msg00005.html.
So yes, some work is *badly* needed on:
- the data size being dumped,
- the data parsing optimization,
- the choice of an implementation of dwarf unwinding (libunwind, libdw etc.),
- the compatibility with 32-bit binaries on AArch64, which I am now
busy with in libunwind.
Jean
>
> - Arnaldo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/