Message-ID: <ZL7tO4pwpfX8n0gZ@kernel.org>
Date: Mon, 24 Jul 2023 18:29:31 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Tom Rix <trix@...hat.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Yang Jihong <yangjihong1@...wei.com>,
Ravi Bangoria <ravi.bangoria@....com>,
Carsten Haitzler <carsten.haitzler@....com>,
Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
James Clark <james.clark@....com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org, llvm@...ts.linux.dev, maskray@...gle.com
Subject: Re: [PATCH v1 0/4] Perf tool LTO support
On Mon, Jul 24, 2023 at 01:12:43PM -0700, Ian Rogers wrote:
> Add a build flag, LTO=1, so that perf is built with the -flto
> flag. Address some build errors this configuration throws up.
>
> On my Debian-derived OS, "CC=clang CXX=clang++ LD=ld.lld" works
> fine. With GCC, the LTO build fails with:
> ```
> lto-wrapper: warning: using serial compilation of 50 LTRANS jobs
> lto-wrapper: note: see the ‘-flto’ option documentation for more information
> /usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel.ro+0x28): undefined reference to `memset_orig'
> /usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel.ro+0x40): undefined reference to `__memset'
> /usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel+0x28): undefined reference to `memcpy_orig'
> /usr/bin/ld: /tmp/ccK8kXAu.ltrans10.ltrans.o:(.data.rel+0x40): undefined reference to `__memcpy'
> /usr/bin/ld: /tmp/ccK8kXAu.ltrans44.ltrans.o: in function `test__arch_unwind_sample':
> /home/irogers/kernel.org/tools/perf/arch/x86/tests/dwarf-unwind.c:72: undefined reference to `perf_regs_load'
> collect2: error: ld returned 1 exit status
> ```
>
> The issue is that we build multiple .o files in a directory and then
> link them into a single .o with "ld -r" (cmd_ld_multi). This early link
> step appears to make GCC drop the symbol definitions that come from .S
> files, which breaks the later link step. In perf-in.o, for example,
> perf_regs_load goes from the text section to undefined at the link
> step; this doesn't happen with clang or without LTO. It is possible
> to work around this by taking the final perf link command and adding
> the .o files generated from .S back into it, namely:
> arch/x86/tests/regs_load.o
> bench/mem-memset-x86-64-asm.o
> bench/mem-memcpy-x86-64-asm.o
>
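One way to confirm the text-to-undefined transition described above is to compare the type letter that `nm` reports for a symbol before and after the "ld -r" step. The sketch below only parses `nm`-style text; the helper name `symbol_types` and the two sample lines are made up for illustration and are not output captured from this build.

```python
def symbol_types(nm_output):
    """Map each symbol name to the type letter printed by `nm`."""
    types = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            # Defined symbol: "<address> <type> <name>"
            _, sym_type, name = fields
        elif len(fields) == 2:
            # Undefined symbol: "<type> <name>" (no address column)
            sym_type, name = fields
        else:
            continue
        types[name] = sym_type
    return types


# Hypothetical nm lines, written by hand for this example.
defined = "0000000000000000 T perf_regs_load\n"
undefined = "                 U perf_regs_load\n"

print(symbol_types(defined)["perf_regs_load"])
print(symbol_types(undefined)["perf_regs_load"])
```

A "T" means the symbol is defined in the text section, and a "U" means it is undefined and must be resolved by a later link step.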
> A quick performance check shows noticeable improvements from
> LTO:
>
> Non-LTO
> ```
> $ perf bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 202.216 usec (+- 0.160 usec)
> Average num. events: 51.000 (+- 0.000)
> Average time per event 3.965 usec
> Average data synthesis took: 230.875 usec (+- 0.285 usec)
> Average num. events: 271.000 (+- 0.000)
> Average time per event 0.852 usec
> ```
>
> LTO
> ```
> $ perf bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 104.530 usec (+- 0.074 usec)
> Average num. events: 51.000 (+- 0.000)
> Average time per event 2.050 usec
> Average data synthesis took: 112.660 usec (+- 0.114 usec)
> Average num. events: 273.000 (+- 0.000)
> Average time per event 0.413 usec
Cool stuff! Applied locally, test building now on the container suite.
- Arnaldo
> ```
>
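For reference, the ratio between the two runs quoted above can be worked out with a few lines of Python. The averages are copied from the benchmark output in this message; the `speedup` helper and the dictionary names are only illustrative.

```python
# Average times in usec, copied from the two benchmark runs quoted above.
non_lto = {"synthesis": 202.216, "data synthesis": 230.875}
lto = {"synthesis": 104.530, "data synthesis": 112.660}


def speedup(before_usec, after_usec):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_usec / after_usec


for name in non_lto:
    print(f"{name}: {speedup(non_lto[name], lto[name]):.2f}x")
# synthesis: 1.93x
# data synthesis: 2.05x
```

So both averages are roughly halved. Note that the data synthesis runs report slightly different event counts (271 vs 273), so the per-event figures are the fairer comparison there.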
> Ian Rogers (4):
> perf stat: Avoid uninitialized use of perf_stat_config
> perf parse-events: Avoid use uninitialized warning
> perf test: Avoid weak symbol for arch_tests
> perf build: Add LTO build option
>
> tools/perf/Makefile.config | 5 +++++
> tools/perf/tests/builtin-test.c | 11 ++++++++++-
> tools/perf/tests/stat.c | 2 +-
> tools/perf/util/parse-events.c | 2 +-
> tools/perf/util/stat.c | 2 +-
> 5 files changed, 18 insertions(+), 4 deletions(-)
>
> --
> 2.41.0.487.g6d72f3e995-goog
>
--
- Arnaldo