[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170629071101.GA10884@sejong>
Date: Thu, 29 Jun 2017 16:11:01 +0900
From: Namhyung Kim <namhyung@...nel.org>
To: Milian Wolff <milian.wolff@...b.com>
Cc: Taeung Song <treeze.taeung@...il.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel@...r.kernel.org,
Adrian Hunter <adrian.hunter@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
David Ahern <dsahern@...il.com>,
Jin Yao <yao.jin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Kim Phillips <kim.phillips@....com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Wang Nan <wangnan0@...wei.com>, kernel-team@....com
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the
new source code TUI view
Hello,
On Wed, Jun 28, 2017 at 11:53:22AM +0200, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> > Hi,
> >
> > The --source-only option and new source code TUI view can show
> > the result of performance analysis based on full source code per
> > symbol(function). (Namhyung Kim told me this idea and it was also requested
> > by others some time ago..)
> >
> > If someone wants to see the cause, he/she will need to dig into the asm.
> > But before that, looking at the source level can give a hint or clue
> > for the problem.
> >
> > For example, if target symbol is 'hex2u64' of util/util.c,
> > the output is like below.
> >
> > $ perf annotate --source-only --stdio -s hex2u64
> > Percent | Source code of util.c for cycles:ppp (42 samples)
> > -----------------------------------------------------------------
> > 0.00 : 354 * While we find nice hex chars, build a long_val.
> > 0.00 : 355 * Return number of chars processed.
> > 0.00 : 356 */
> > 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
> > 2.38 : 358 {
> > 2.38 : 359 const char *p = ptr;
> > 0.00 : 360 *long_val = 0;
> > 0.00 : 361
> > 30.95 : 362 while (*p) {
> > 23.81 : 363 const int hex_val = hex(*p);
> > 0.00 : 364
> > 14.29 : 365 if (hex_val < 0)
> > 0.00 : 366 break;
> > 0.00 : 367
> > 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
> > 0.00 : 369 p++;
> > 0.00 : 370 }
> > 0.00 : 371
> > 0.00 : 372 return p - ptr;
> > 0.00 : 373 }
> >
> > And I added many perf developers into Cc: because I want to listen to your
> > opinions about this new feature, if you don't mind.
> >
> > If you give some feedback, I'd appreciate it! :)
>
> Thanks Taeung,
>
> I requested this feature some time ago and it's really cool to see someone
> step up and implement it - much appreciated!
>
> I just tested it out on my pet-example that leverages C++ instead of C:
>
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 10000000; ++i) {
> s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ~~~~~
>
> Compile it with:
>
> g++ -O2 -g -std=c++11 test.cpp -o test
>
> Then record it with perf:
>
> perf record --call-graph dwarf ./test
>
> Then analyse it with `perf report`. You'll see one entry for main with
> something like:
>
> + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
>
> Select it and annotate it, then switch to your new source-only view:
>
> main test.cpp
> │ 30 > │ 31 using namespace std; > │ 32 > │ 33 int main() > │+ 34 { > │ 35 uniform_real_distribution<double> uniform(-1E5, 1E5); > │ 36 default_random_engine engine; > │+ 37 double s = 0; > │+ 38 for (int i = 0; i < 10000000; ++i) { > 4.88 │+ 39 s += norm(complex<double>(uniform(engine), uniform(engine))); > │ 40 } > │ 41 cout << s << '\n'; > │ 42 return 0; > │+ 43 }
>
> Note: the line numbers are off b/c my file contains a file-header on-top.
> Ignore that.
>
> Note2: There is no column header shown, so it's unclear what the first column
> represents.
>
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
> shows 4.88... What is that?
>
> What this shows, is that it's extremely important to visualize inclusive cost
> _and_ self cost in this view. Additionally, we need to account for inlining.
> Right now, we only see the self cost that is directly within main, I suspect.
Currently perf annotate doesn't use the sample period, it uses sample
count instead and print the percentage within the function. So it's a
different number to the perf report. I think we need to fix this
first.
Thanks,
Namhyung
> For C++ this is usually very misleading, and basically makes the annotate view
> completely useless for application-level profiling. If a second column would
> be added with the inclusive cost with the ability to drill down, then I could
> easily see myself using this view.
>
> I would appreciate if you could take this into account.
>
> Thanks a lot
>
>
> --
> Milian Wolff | milian.wolff@...b.com | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
Powered by blists - more mailing lists