Message-Id: <20221215192817.2734573-1-namhyung@kernel.org>
Date:   Thu, 15 Dec 2022 11:28:08 -0800
From:   Namhyung Kim <namhyung@...nel.org>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
        Milian Wolff <milian.wolff@...b.com>,
        Leo Yan <leo.yan@...aro.org>
Subject: [PATCHSET 0/9] perf report: Improve srcline sort performance (v1)

Hello,

I noticed a performance problem in srcline/srcfile processing during
perf report when it uses an external addr2line process.  I expect the
change also helps when it uses libbfd to get the srcline info.

Also note that the problem mostly shows up with large (static) binaries,
but smaller binaries should also benefit if they have a lot of samples.

The first 5 patches are general fixes and updates.  The last 4 patches
implement the actual speed-up.

Let's test it with the perf tool itself.  Build a static binary as below.

  $ cd tools/perf
  $ make NO_JVMTI=1 LDFLAGS=-static

Then run the perf test workload.

  $ ./perf record -- ./perf test -w noploop

Then run perf report with the srcline sort key like this.

  $ ./perf report -n -s srcline --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4K of event 'cycles:u'
  # Event count (approx.): 3572938596
  #
  # Overhead       Samples  Source:Line
  # ........  ............  ............
  #
      99.94%          4010  noploop.c:26
       0.03%            14  ??:0
       0.03%             1  perf.c:330
       0.00%             1  wcscpy.o:0

The problem is that it runs addr2line for every sample it processes.
But as you can see, many samples share the same result.  IOW, if samples
have the same address, we don't need to run addr2line each time.
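To make that observation concrete, here is a minimal memoization sketch in C: a per-address table so the slow resolver runs once per distinct address rather than once per sample.  Everything in it (struct srcline_cache, slow_resolve(), cached_srcline(), CACHE_MAX) is a hypothetical stand-in for addr2line and is not perf code; the series itself restructures the sort keys instead of adding a cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_MAX 64   /* hypothetical capacity for this sketch */

/* Hypothetical address -> "file:line" memo table. */
struct srcline_cache {
    uint64_t addr[CACHE_MAX];
    char line[CACHE_MAX][64];
    size_t len;
    size_t resolver_calls;   /* how many times the slow path ran */
};

/* Hypothetical stand-in for one external addr2line run; it only fakes a
 * "file:line" string derived from the address. */
static void slow_resolve(uint64_t addr, char *buf, size_t size)
{
    snprintf(buf, size, "file.c:%llu", (unsigned long long)(addr & 0xfff));
}

/* Return the srcline for addr, running the slow resolver at most once per
 * distinct address while the table has room. */
static const char *cached_srcline(struct srcline_cache *c, uint64_t addr)
{
    static char scratch[64];   /* used only when the table is full */

    for (size_t i = 0; i < c->len; i++)
        if (c->addr[i] == addr)
            return c->line[i];          /* hit: no resolver run */

    c->resolver_calls++;
    if (c->len == CACHE_MAX) {          /* full: resolve without caching */
        slow_resolve(addr, scratch, sizeof(scratch));
        return scratch;
    }
    assert(c->len < CACHE_MAX);
    c->addr[c->len] = addr;
    slow_resolve(addr, c->line[c->len], sizeof(c->line[0]));
    return c->line[c->len++];           /* new entry, then grow the table */
}
```

The linear scan only keeps the sketch short; a real cache would more likely use a hash table or tree keyed by address.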

So I changed sort_key->cmp() to compare only the addresses and moved the
addr2line call to sort_key->collapse(), so that it runs only after samples
with the same address have been merged.
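The two-phase idea can be sketched in C as below.  This is a simplified model under stated assumptions, not perf's hist/sort code: struct entry, resolve() and srcline_hist() are hypothetical names, and sorting a flat array with qsort() stands in for perf's own data structures.  Phase 1 plays the role of sort_key->cmp() (address only, no addr2line); phase 2 plays sort_key->collapse() (resolve once per merged entry, then merge by srcline string).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified entry (not perf's struct hist_entry). */
struct entry {
    uint64_t addr;
    unsigned long nr_samples;
    char srcline[64];
};

static size_t resolve_calls;   /* counts stand-in addr2line runs */

/* Hypothetical stand-in for addr2line; nearby addresses map to one line. */
static void resolve(uint64_t addr, char *buf, size_t size)
{
    resolve_calls++;
    snprintf(buf, size, "file.c:%llu", (unsigned long long)(addr >> 4));
}

static int cmp_addr(const void *a, const void *b)
{
    uint64_t x = ((const struct entry *)a)->addr;
    uint64_t y = ((const struct entry *)b)->addr;
    return (x > y) - (x < y);
}

static int cmp_srcline(const void *a, const void *b)
{
    return strcmp(((const struct entry *)a)->srcline,
                  ((const struct entry *)b)->srcline);
}

/* Merge adjacent entries for which cmp() returns 0; returns new count. */
static size_t merge_sorted(struct entry *e, size_t n,
                           int (*cmp)(const void *, const void *))
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && cmp(&e[out - 1], &e[i]) == 0)
            e[out - 1].nr_samples += e[i].nr_samples;
        else
            e[out++] = e[i];
    }
    return out;
}

static size_t srcline_hist(struct entry *e, size_t n)
{
    assert(e != NULL);

    /* Phase 1 ("cmp"): sort and merge by address only. */
    qsort(e, n, sizeof(*e), cmp_addr);
    n = merge_sorted(e, n, cmp_addr);

    /* Phase 2 ("collapse"): one resolver run per distinct address... */
    for (size_t i = 0; i < n; i++)
        resolve(e[i].addr, e[i].srcline, sizeof(e[i].srcline));

    /* ...then merge entries whose srcline strings are equal. */
    qsort(e, n, sizeof(*e), cmp_srcline);
    return merge_sorted(e, n, cmp_srcline);
}
```

With six samples at three addresses, where two of the addresses map to the same line, this calls the stand-in resolver three times rather than six and ends with two srcline entries.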

With this change, srcline processing is dramatically faster while the
output stays the same.

Before:

  $ ./perf stat -- ./perf report -s srcline > /dev/null

   Performance counter stats for './perf report -s srcline':

           15,397.13 msec task-clock:u                     #    0.993 CPUs utilized
                   0      context-switches:u               #    0.000 /sec
                   0      cpu-migrations:u                 #    0.000 /sec
               3,810      page-faults:u                    #  247.449 /sec
      54,516,351,820      cycles:u                         #    3.541 GHz
      31,494,118,293      instructions:u                   #    0.58  insn per cycle
       8,577,271,187      branches:u                       #  557.069 M/sec
       1,216,165,520      branch-misses:u                  #   14.18% of all branches

        15.505066606 seconds time elapsed

        15.094122000 seconds user
         0.396962000 seconds sys

After:

  $ ./perf stat -- ./perf report -s srcline > /dev/null

   Performance counter stats for './perf report -s srcline':

              105.66 msec task-clock:u                     #    0.994 CPUs utilized
                   0      context-switches:u               #    0.000 /sec
                   0      cpu-migrations:u                 #    0.000 /sec
               3,275      page-faults:u                    #   30.995 K/sec
         185,063,407      cycles:u                         #    1.751 GHz
         142,470,215      instructions:u                   #    0.77  insn per cycle
          34,584,038      branches:u                       #  327.311 M/sec
           3,226,005      branch-misses:u                  #    9.33% of all branches

         0.106270464 seconds time elapsed

         0.074254000 seconds user
         0.032871000 seconds sys
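For a sense of scale, the ratios between the two perf stat runs above work out as follows (computed from the reported numbers, rounded to one decimal):

```latex
\begin{aligned}
\text{elapsed:}      &\quad \frac{15.505066606\,\text{s}}{0.106270464\,\text{s}} \approx 145.9 \\
\text{task-clock:}   &\quad \frac{15{,}397.13\,\text{ms}}{105.66\,\text{ms}} \approx 145.7 \\
\text{cycles:}       &\quad \frac{54{,}516{,}351{,}820}{185{,}063{,}407} \approx 294.6 \\
\text{instructions:} &\quad \frac{31{,}494{,}118{,}293}{142{,}470{,}215} \approx 221.1
\end{aligned}
```

The cycles ratio is about twice the time ratio, which is consistent with the reported clock rate also differing between the runs (3.541 GHz vs 1.751 GHz, a factor of about 2.02, and 145.7 x 2.02 is roughly 294).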

The code is available in the 'perf/srcline-v1' branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (9):
  perf srcline: Do not return NULL for srcline
  perf report: Ignore SIGPIPE for srcline
  perf symbol: Add filename__has_section()
  perf srcline: Skip srcline if .debug_line is missing
  perf srcline: Conditionally suppress addr2line warnings
  perf hist: Add perf_hpp_fmt->init() callback
  perf hist: Improve srcline sort key performance
  perf hist: Improve srcfile sort key performance
  perf hist: Improve srcline_{from,to} sort key performance

 tools/perf/builtin-report.c      |   1 +
 tools/perf/util/hist.c           |  10 +--
 tools/perf/util/hist.h           |   1 +
 tools/perf/util/sort.c           | 129 ++++++++++++++++++++++++++++---
 tools/perf/util/sort.h           |   1 +
 tools/perf/util/srcline.c        |  20 +++--
 tools/perf/util/symbol-elf.c     |  28 +++++++
 tools/perf/util/symbol-minimal.c |   5 ++
 tools/perf/util/symbol.h         |   1 +
 9 files changed, 176 insertions(+), 20 deletions(-)


base-commit: 818448e9cf92e5c6b3c10320372eefcbe4174e4f
-- 
2.39.0.314.g84b9a713c41-goog
