lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230724201247.748146-1-irogers@google.com>
Date:   Mon, 24 Jul 2023 13:12:43 -0700
From:   Ian Rogers <irogers@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Tom Rix <trix@...hat.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Yang Jihong <yangjihong1@...wei.com>,
        Ravi Bangoria <ravi.bangoria@....com>,
        Carsten Haitzler <carsten.haitzler@....com>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        James Clark <james.clark@....com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        bpf@...r.kernel.org, llvm@...ts.linux.dev
Cc:     maskray@...gle.com
Subject: [PATCH v1 0/4] Perf tool LTO support

Add a build flag, LTO=1, so that perf is built with the -flto
flag. Address some build errors this configuration throws up.

For me on my Debian derived OS, "CC=clang CXX=clang++ LD=ld.lld" works
fine. With GCC LTO this fails with:
```
lto-wrapper: warning: using serial compilation of 50 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more information
/usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel.ro+0x28): undefined reference to `memset_orig'
/usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel.ro+0x40): undefined reference to `__memset'
/usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel+0x28): undefined reference to `memcpy_orig'
/usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel+0x40): undefined reference to `__memcpy'
/usr/bin/ld: /tmp/ccK8kXAu.ltrans44.ltrans.o: in function `test__arch_unwind_sample':
/home/irogers/kernel.org/tools/perf/arch/x86/tests/dwarf-unwind.c:72: undefined reference to `perf_regs_load'
collect2: error: ld returned 1 exit status
```

The issue is that we build multiple .o files in a directory and then
link them into a .o with "ld -r" (cmd_ld_multi). This early link step
appears to trigger GCC to remove the .S file definition of the symbol
and break the later link step (the perf-in.o shows perf_regs_load, for
example, going from the text section to being undefined at the link
step which doesn't happen with clang or without LTO). It is possible
to work around this by taking the final perf link command and adding
the .o files generated from .S back into it, namely:
arch/x86/tests/regs_load.o
bench/mem-memset-x86-64-asm.o
bench/mem-memcpy-x86-64-asm.o

A quick performance check and the performance improvements from LTO
are noticeable:

Non-LTO
```
$ perf bench internals synthesize
 # Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 202.216 usec (+- 0.160 usec)
  Average num. events: 51.000 (+- 0.000)
  Average time per event 3.965 usec
  Average data synthesis took: 230.875 usec (+- 0.285 usec)
  Average num. events: 271.000 (+- 0.000)
  Average time per event 0.852 usec
```

LTO
```
$ perf bench internals synthesize
 # Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 104.530 usec (+- 0.074 usec)
  Average num. events: 51.000 (+- 0.000)
  Average time per event 2.050 usec
  Average data synthesis took: 112.660 usec (+- 0.114 usec)
  Average num. events: 273.000 (+- 0.000)
  Average time per event 0.413 usec
```

Ian Rogers (4):
  perf stat: Avoid uninitialized use of perf_stat_config
  perf parse-events: Avoid use uninitialized warning
  perf test: Avoid weak symbol for arch_tests
  perf build: Add LTO build option

 tools/perf/Makefile.config      |  5 +++++
 tools/perf/tests/builtin-test.c | 11 ++++++++++-
 tools/perf/tests/stat.c         |  2 +-
 tools/perf/util/parse-events.c  |  2 +-
 tools/perf/util/stat.c          |  2 +-
 5 files changed, 18 insertions(+), 4 deletions(-)

-- 
2.41.0.487.g6d72f3e995-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ