Message-ID: <b8e6cc17-599c-98aa-6dc6-284a923e4fa5@gmail.com>
Date: Fri, 30 Jun 2017 16:14:46 +0900
From: Taeung Song <treeze.taeung@...il.com>
To: Namhyung Kim <namhyung@...nel.org>,
Milian Wolff <milian.wolff@...b.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Adrian Hunter <adrian.hunter@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
David Ahern <dsahern@...il.com>,
Jin Yao <yao.jin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Kim Phillips <kim.phillips@....com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Wang Nan <wangnan0@...wei.com>, kernel-team@....com
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the
new source code TUI view
Hi,
On 06/29/2017 04:11 PM, Namhyung Kim wrote:
> Hello,
>
> On Wed, Jun 28, 2017 at 11:53:22AM +0200, Milian Wolff wrote:
>> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>>> Hi,
>>>
>>> The --source-only option and new source code TUI view can show
>>> the result of performance analysis based on full source code per
>>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>>> by others some time ago..)
>>>
>>> If someone wants to see the cause, he/she will need to dig into the asm.
>>> But before that, looking at the source level can give a hint or clue
>>> for the problem.
>>>
>>> For example, if target symbol is 'hex2u64' of util/util.c,
>>> the output is like below.
>>>
>>> $ perf annotate --source-only --stdio -s hex2u64
>>> Percent | Source code of util.c for cycles:ppp (42 samples)
>>> -----------------------------------------------------------------
>>> 0.00 : 354 * While we find nice hex chars, build a long_val.
>>> 0.00 : 355 * Return number of chars processed.
>>> 0.00 : 356 */
>>> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
>>> 2.38 : 358 {
>>> 2.38 : 359 const char *p = ptr;
>>> 0.00 : 360 *long_val = 0;
>>> 0.00 : 361
>>> 30.95 : 362 while (*p) {
>>> 23.81 : 363 const int hex_val = hex(*p);
>>> 0.00 : 364
>>> 14.29 : 365 if (hex_val < 0)
>>> 0.00 : 366 break;
>>> 0.00 : 367
>>> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
>>> 0.00 : 369 p++;
>>> 0.00 : 370 }
>>> 0.00 : 371
>>> 0.00 : 372 return p - ptr;
>>> 0.00 : 373 }
>>>
>>> And I added many perf developers into Cc: because I want to listen to your
>>> opinions about this new feature, if you don't mind.
>>>
>>> If you give some feedback, I'd appreciate it! :)
>>
>> Thanks Taeung,
>>
>> I requested this feature some time ago and it's really cool to see someone
>> step up and implement it - much appreciated!
>>
>> I just tested it out on my pet-example that leverages C++ instead of C:
>>
>> ~~~~~
>> #include <complex>
>> #include <cmath>
>> #include <random>
>> #include <iostream>
>>
>> using namespace std;
>>
>> int main()
>> {
>> uniform_real_distribution<double> uniform(-1E5, 1E5);
>> default_random_engine engine;
>> double s = 0;
>> for (int i = 0; i < 10000000; ++i) {
>> s += norm(complex<double>(uniform(engine), uniform(engine)));
>> }
>> cout << s << '\n';
>> return 0;
>> }
>> ~~~~~
>>
>> Compile it with:
>>
>> g++ -O2 -g -std=c++11 test.cpp -o test
>>
>> Then record it with perf:
>>
>> perf record --call-graph dwarf ./test
>>
>> Then analyse it with `perf report`. You'll see one entry for main with
>> something like:
>>
>> + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
>>
>> Select it and annotate it, then switch to your new source-only view:
>>
>> main test.cpp
>>       │  30
>>       │  31 using namespace std;
>>       │  32
>>       │  33 int main()
>>       │+ 34 {
>>       │  35 uniform_real_distribution<double> uniform(-1E5, 1E5);
>>       │  36 default_random_engine engine;
>>       │+ 37 double s = 0;
>>       │+ 38 for (int i = 0; i < 10000000; ++i) {
>>  4.88 │+ 39 s += norm(complex<double>(uniform(engine), uniform(engine)));
>>       │  40 }
>>       │  41 cout << s << '\n';
>>       │  42 return 0;
>>       │+ 43 }
>>
>> Note: the line numbers are off b/c my file contains a file-header on-top.
>> Ignore that.
>>
>> Note2: There is no column header shown, so it's unclear what the first column
>> represents.
>>
>> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
>> shows 4.88... What is that?
>>
>> What this shows, is that it's extremely important to visualize inclusive cost
>> _and_ self cost in this view. Additionally, we need to account for inlining.
>> Right now, we only see the self cost that is directly within main, I suspect.
>
> Currently perf annotate doesn't use the sample period, it uses sample
> count instead and print the percentage within the function. So it's a
> different number to the perf report. I think we need to fix this
> first.
>
> Thanks,
> Namhyung
>
I see. Hmm, so should we change the menu entry and the column that refer
to the total period so that they refer to the sample count instead, like below?
"t Toggle total period view"
-> "t Toggle showing the number of samples for"
(I'm not sure which shortcut key (e.g. 't') is appropriate.)
Or should we modify the code that deals with the number of samples so that
perf annotate shows the actual total period?
What do you think about this change?
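To make the difference between the two options concrete, here is a minimal
sketch in C of a count-based and a period-based percentage for one source
line. The struct and function names (line_stat, percent_by_count,
percent_by_period) are hypothetical and are not perf's actual data
structures. Both percentages here are relative to the function only, while
the figure in perf report is also relative to the whole profile, so this
only illustrates the count-versus-period part of the mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source-line totals; not perf's real structures. */
struct line_stat {
	uint64_t nr_samples;	/* number of samples attributed to this line */
	uint64_t period;	/* sum of the sample periods for this line */
};

/* Share of the function's samples that hit line idx (count-based). */
double percent_by_count(const struct line_stat *lines, size_t n, size_t idx)
{
	uint64_t total = 0;

	assert(idx < n);
	for (size_t i = 0; i < n; i++)
		total += lines[i].nr_samples;

	return total ? 100.0 * lines[idx].nr_samples / total : 0.0;
}

/* Share of the function's total period spent on line idx (period-based). */
double percent_by_period(const struct line_stat *lines, size_t n, size_t idx)
{
	uint64_t total = 0;

	assert(idx < n);
	for (size_t i = 0; i < n; i++)
		total += lines[i].period;

	return total ? 100.0 * lines[idx].period / total : 0.0;
}
```

With two lines holding { nr_samples = 1, period = 900 } and
{ nr_samples = 3, period = 100 }, the first line gets 25% by count but 90%
by period, so the two views can rank the same lines very differently when
the sample periods are not uniform.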
Thanks,
Taeung
>
>> For C++ this is usually very misleading, and basically makes the annotate view
>> completely useless for application-level profiling. If a second column would
>> be added with the inclusive cost with the ability to drill down, then I could
>> easily see myself using this view.
>>
>> I would appreciate if you could take this into account.
>>
>> Thanks a lot
>>
>>
>> --
>> Milian Wolff | milian.wolff@...b.com | Senior Software Engineer
>> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
>> Tel: +49-30-521325470
>> KDAB - The Qt Experts
>
>