Message-ID: <CAOQCU67EsHyw_FsqGbRuityahZTSAtWzffU=hLUJ7K=aZ=1hhA@mail.gmail.com>
Date: Thu, 13 Feb 2025 14:10:34 +0100
From: Krzysztof Łopatowski <krzysztof.m.lopatowski@...il.com>
To: Adrian Hunter <adrian.hunter@...el.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Ian Rogers <irogers@...gle.com>
Cc: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: perf: Question about machine__create_extra_kernel_maps and trampoline symbols
Hi,
I'm investigating performance issues with perf's kallsyms parsing. Running
`perf record -g perf trace -a --max-events 1` on x86_64 Ubuntu 24.10 in a VM
(perf version 6.11) showed that about 61% of the time was spent in
'kallsyms__parse'.
Total execution time was 370 ms. With the latest version from
tmp.perf-tools-next, it's 530 ms total with 38% in 'kallsyms__parse'; the
difference is because the old version doesn't have BPF skeletons enabled.
During regular execution this function is called three times:
1. In machine__get_running_kernel_start - searching for _text
2. In machine__get_running_kernel_start - searching for _edata
3. In machine__create_extra_kernel_maps - which is the focus of my question
Regarding the third call (implemented in tools/perf/arch/x86/util/machine.c),
I notice it searches for:
- _entry_trampoline
- __entry_SYSCALL_64_trampoline
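A quick way to check whether a running kernel exposes either symbol at all is to count matching names in /proc/kallsyms. The snippet below is only an illustrative Python sketch, not perf code; the helper name count_symbols is made up for this example, and it assumes the usual "address type name [module]" line layout.

```python
def count_symbols(lines, names):
    """Count how often each name in `names` occurs in kallsyms-style lines.

    Each line is assumed to look like: "<hex addr> <type> <name> [module]".
    """
    counts = {name: 0 for name in names}
    for line in lines:
        parts = line.split()
        # parts[2] is the symbol name; a trailing "[module]" field is ignored.
        if len(parts) >= 3 and parts[2] in counts:
            counts[parts[2]] += 1
    return counts


# Usage on a live system (hypothetical invocation):
#   with open("/proc/kallsyms") as f:
#       print(count_symbols(f, ["_entry_trampoline",
#                               "__entry_SYSCALL_64_trampoline"]))
```

If both counts come back as zero, the third parse has nothing to find on that kernel.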
I'm puzzled by the dynamic allocation in add_extra_kernel_map, which seems to
expect multiple __entry_SYSCALL_64_trampoline symbols. This functionality was
introduced in:
https://lore.kernel.org/all/1526986485-6562-1-git-send-email-adrian.hunter@intel.com/
I've attempted to trigger the trampoline logic in two ways:
1. Using the example provided (uname_x_n.c), which only recorded these symbols:
- entry_SYSCALL_64_after_hwframe
- entry_SYSCALL_64
- entry_SYSCALL_64_safe_stack
2. Setting kprobes and kretprobes to try to make the kernel create these special
trampoline symbols, but this approach also didn't work.
Questions for the perf developer community:
1. Is there a reliable way to trigger this trampoline logic in perf? I'd like to
create a perf test for this functionality.
2. If machine__create_extra_kernel_maps is obsolete (since it's x86_64-specific),
could we remove it to reduce /proc/kallsyms parsing time by at least 50%?
I'm working on a patch to simplify machine__create_kernel_maps to call
kallsyms__parse only once. However, I would appreciate guidance from those more
familiar with perf.
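To illustrate the single-pass idea, here is a rough Python sketch (not the actual patch, and not perf's C API): one walk over the kallsyms lines that collects every address for all names of interest, so that the three lookups listed earlier could share one parse. The function name find_symbols and the returned dictionary layout are assumptions made for this example only.

```python
def find_symbols(lines, wanted):
    """Collect all addresses for each wanted symbol in a single pass.

    Returns {name: [addr, ...]}; a name that never appears maps to [].
    Lines are assumed to look like: "<hex addr> <type> <name> [module]".
    """
    found = {name: [] for name in wanted}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts[:3]
        if name in found:
            found[name].append(int(addr, 16))
    return found
```

Keeping a list per name means a symbol that shows up more than once (as add_extra_kernel_map seems to expect for the trampolines) is not lost, while _text and _edata would simply use the first entry.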
Side note: Could exposing the kernel's lookup_symbol_name function
(from kernel/kallsyms.c) to userspace eliminate the need for reading
/proc/kallsyms?
Best regards,
Krzysztof Łopatowski