lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 28 Jun 2017 11:53:22 +0200
From:   Milian Wolff <milian.wolff@...b.com>
To:     Taeung Song <treeze.taeung@...il.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        linux-kernel@...r.kernel.org,
        Adrian Hunter <adrian.hunter@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        David Ahern <dsahern@...il.com>,
        Jin Yao <yao.jin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Kim Phillips <kim.phillips@....com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Wang Nan <wangnan0@...wei.com>
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> Hi,
> 
> The --source-only option and new source code TUI view can show
> the result of performance analysis based on full source code per
> symbol(function). (Namhyung Kim told me this idea and it was also requested
> by others some time ago..)
> 
> If someone wants to see the cause, he/she will need to dig into the asm.
> But before that, looking at the source level can give a hint or clue
> for the problem.
> 
> For example, if target symbol is 'hex2u64' of util/util.c,
> the output is like below.
> 
>     $ perf annotate --source-only --stdio -s hex2u64
>  Percent |      Source code of util.c for cycles:ppp (42 samples)
> -----------------------------------------------------------------
>     0.00 : 354   * While we find nice hex chars, build a long_val.
>     0.00 : 355   * Return number of chars processed.
>     0.00 : 356   */
>     0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
>     2.38 : 358  {
>     2.38 : 359          const char *p = ptr;
>     0.00 : 360          *long_val = 0;
>     0.00 : 361
>    30.95 : 362          while (*p) {
>    23.81 : 363                  const int hex_val = hex(*p);
>     0.00 : 364
>    14.29 : 365                  if (hex_val < 0)
>     0.00 : 366                          break;
>     0.00 : 367
>    26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
>     0.00 : 369                  p++;
>     0.00 : 370          }
>     0.00 : 371
>     0.00 : 372          return p - ptr;
>     0.00 : 373  }
> 
> And I added many perf developers into Cc: because I want to listen to your
> opinions about this new feature, if you don't mind.
> 
> If you give some feedback, I'd appreciate it! :)

Thanks Taeung,

I requested this feature some time ago and it's really cool to see someone 
step up and implement it - much appreciated!

I just tested it out on my pet-example that leverages C++ instead of C:

~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
~~~~~

Compile it with:

g++ -O2 -g -std=c++11 test.cpp -o test

Then record it with perf:

perf record --call-graph dwarf ./test

Then analyse it with `perf report`. You'll see one entry for main with 
something like:

+  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main

Select it and annotate it, then switch to your new source-only view:

main  test.cpp
       │  30                                                                                                                                                                                                    
       │  31    using namespace std;                                                                                                                                                                            
       │  32                                                                                                                                                                                                    
       │  33    int main()                                                                                                                                                                                      
       │+ 34    {                                                                                                                                                                                               
       │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);                                                                                                                                       
       │  36        default_random_engine engine;                                                                                                                                                               
       │+ 37        double s = 0;                                                                                                                                                                               
       │+ 38        for (int i = 0; i < 10000000; ++i) {                                                                                                                                                        
  4.88 │+ 39            s += norm(complex<double>(uniform(engine), 
uniform(engine)));                                                                                                                           
       │  40        }                                                                                                                                                                                           
       │  41        cout << s << '\n';                                                                                                                                                                          
       │  42        return 0;                                                                                                                                                                                   
       │+ 43    } 

Note: the line numbers are off b/c my file contains a file-header on-top. 
Ignore that.

Note2: There is no column header shown, so it's unclear what the first column 
represents.

Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate 
shows 4.88... What is that?

What this shows, is that it's extremely important to visualize inclusive cost 
_and_ self cost in this view. Additionally, we need to account for inlining. 
Right now, we only see the self cost that is directly within main, I suspect. 
For C++ this is usually very misleading, and basically makes the annotate view 
completely useless for application-level profiling. If a second column would 
be added with the inclusive cost with the ability to drill down, then I could 
easily see myself using this view.

I would appreciate if you could take this into account.

Thanks a lot


-- 
Milian Wolff | milian.wolff@...b.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
Download attachment "smime.p7s" of type "application/pkcs7-signature" (3826 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ