Message-Id: <20221109134132.9052-1-nick.alcock@oracle.com>
Date: Wed, 9 Nov 2022 13:41:24 +0000
From: Nick Alcock <nick.alcock@...cle.com>
To: mcgrof@...nel.org, masahiroy@...nel.org
Cc: linux-modules@...r.kernel.org, linux-kernel@...r.kernel.org,
arnd@...db.de, akpm@...ux-foundation.org, eugene.loh@...cle.com,
kris.van.hees@...cle.com
Subject: [PATCH PING v9] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms
The kallmodsyms patch series was originally posted in Nov 2019, and the thread
(https://lore.kernel.org/linux-kbuild/20191114223036.9359-1-eugene.loh@oracle.com/t/#u)
shows review comments, questions, and feedback from interested parties.
All review comments have been satisfied, as far as I know: in particular
Yamada's note about translation units that are shared between built-in modules
is satisfied with a better representation which is also much, much smaller.
A kernel tree containing this series alone:
https://github.com/oracle/dtrace-linux-kernel kallmodsyms/6.1-rc4
The whole point of symbols is that their names are unique: you can look up a
symbol and get back a unique address, and vice versa. Alas, because
/proc/kallsyms (rightly) reports all symbols, even hidden ones, it does not
really satisfy this requirement. Large numbers of symbols are duplicated
many times (just search for __list_del_entry!), and while usually these are
just out-of-lined things defined in header files and thus all have the same
implementation, it does make it needlessly hard to figure out which one is
which in stack dumps, when tracing, and such things. Right now the kernel
has no way at all to tell these apart, and nor has the user: their address
differs and that's all. Which module did they come from? Which object
file? We don't know. Figuring out which is which when tracing needs a
combination of guesswork and luck. In discussions at LPC it became clear
that this is not just annoying me but Steve Rostedt and others, so it's
probably desirable to fix this.
It turns out that the linker, and the kernel build system, can be made to
give us everything we need to resolve this once and for all. This series
provides a new /proc/kallmodsyms which is like /proc/kallsyms except that it
annotates every (textual) symbol which comes from a built-in kernel module
with the module's name, in square brackets: if a symbol is used by multiple
modules, it gets [multiple] [names]. (We also add corresponding new fields
in the kallsyms iterator.)
But that's not quite enough: some symbols are still ambiguous, particularly
those that appear in the non-modular parts of the core kernel but also some
things that appear in built-in modules. We annotate such symbols with
cut-down {object file} names: the combination of symbol, [module] [names]
and {object file name} is unique. (The object file names are cut down to
save space: we store only the shortest suffix needed to distinguish symbols
from each other. It's fairly rare even to see two/level names, let alone
three/level/ones. We also save even more space by annotating every symbol
in a given object file with the object file name if we annotate any of
them.)
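
The "shortest suffix" idea can be illustrated with a minimal sketch. This is
not the code in scripts/kallsyms.c: the function name and the example paths
are hypothetical, and it distinguishes whole object files from one another
rather than only the ones whose symbols clash, which is a simplification.

```python
def shortest_unique_suffixes(paths):
    """Map each path to its shortest trailing-component suffix that no
    other path in `paths` shares. Falls back to the full path if even
    that is not unique (e.g. "core.o" next to "a/core.o")."""
    split = {p: p.split("/") for p in set(paths)}
    result = {}
    for path, parts in split.items():
        for depth in range(1, len(parts) + 1):
            suffix = "/".join(parts[-depth:])
            clash = any(
                other != path and "/".join(oparts[-depth:]) == suffix
                for other, oparts in split.items()
            )
            if not clash:
                break
        result[path] = suffix
    return result


if __name__ == "__main__":
    # Hypothetical object file paths, chosen only for illustration.
    example = [
        "arch/x86/events/amd/core.o",
        "arch/x86/events/intel/core.o",
        "kernel/events/core.o",
        "arch/x86/events/rapl.o",
    ]
    for path, suffix in sorted(shortest_unique_suffixes(example).items()):
        print(f"{path} -> {{{suffix}}}")
```

With these inputs the three core.o files need two-level names, while rapl.o
stays a bare file name.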
In brief we do this by mapping from address ranges to object files (with
assistance from the linker map file), then mapping from those object files
to built-in kernel modules and object file names. Because the number of
object files is much smaller than the number of symbols, because we fuse
address range and object file entries together if possible, and because we
don't even store object file names unless we need to, this is a fairly
efficient representation, even with a bit of extra complexity to allow
object files to be in more than one module at once.
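
As a rough illustration of the two-step mapping, here is a sketch under
simplifying assumptions. The tuple layout, the function names and the
addresses are all hypothetical and do not reflect the on-disk tables the
series emits; "fusing" is read here as merging adjacent ranges that resolve
to the same object file.

```python
import bisect


def fuse_ranges(ranges):
    """Merge adjacent (start, end, obj) entries that name the same object.
    `ranges` must be sorted by start; `end` is exclusive."""
    fused = []
    for start, end, obj in ranges:
        if fused and fused[-1][2] == obj and fused[-1][1] == start:
            fused[-1] = (fused[-1][0], end, obj)
        else:
            fused.append((start, end, obj))
    return fused


def lookup(fused, obj2mod, addr):
    """Return (object file, module names) for addr, or None if unmapped."""
    starts = [r[0] for r in fused]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and fused[i][0] <= addr < fused[i][1]:
        obj = fused[i][2]
        # Objects that belong to no built-in module get an empty list.
        return obj, obj2mod.get(obj, [])
    return None


if __name__ == "__main__":
    # Hypothetical link-map ranges and object-to-module table.
    ranges = [
        (0x1000, 0x1100, "rapl.o"),
        (0x1100, 0x1200, "rapl.o"),
        (0x1200, 0x1300, "cn23xx_vf_device.o"),
    ]
    obj2mod = {
        "rapl.o": ["rapl"],
        "cn23xx_vf_device.o": ["liquidio", "liquidio_vf"],
    }
    fused = fuse_ranges(ranges)
    print(len(ranges), "ranges fused into", len(fused))
    print(lookup(fused, obj2mod, 0x1250))
```

The lookup cost is a binary search over ranges, not over symbols, which is
where the space and time savings described above come from.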
The size impact of all of this is minimal: in testing, vmlinux grew by 16632
bytes, and the compressed vmlinux only grew by 12544 bytes (about 0.1% of a
10MiB kernel): though this is very configuration-dependent, it seems likely
to scale roughly with the kernel as a whole.
This is all controlled by a new config parameter CONFIG_KALLMODSYMS, which when
set results in output in /proc/kallmodsyms that looks like this:
ffffffff97606e50 t not_visible
ffffffff97606e70 T perf_msr_probe
ffffffff97606f80 t test_msr [rapl]
ffffffffa6007350 t rapl_pmu_event_stop [rapl]
ffffffffa6007440 t rapl_pmu_event_del [rapl]
ffffffffa6007460 t rapl_hrtimer_handle [rapl]
ffffffffa6007500 t rapl_pmu_event_read [rapl]
ffffffffa6007520 t rapl_pmu_event_init [rapl]
ffffffffa6007630 t rapl_cpu_offline [rapl]
ffffffffa6007710 t amd_pmu_event_map {core.o}
ffffffffa6007750 t amd_pmu_add_event {core.o}
ffffffffa6007760 t amd_put_event_constraints_f17h {core.o}
The modular symbols are notated as [rapl] even if rapl is built into the
kernel. Further, at least one symbol notated as {core.o} would have been
ambiguous without that notation. If we look a little further down, we see:
ffffffff97607a70 t cmask_show {core.o}
ffffffff97607ab0 t inv_show {core.o}
ffffffff97607ae0 t edge_show {core.o}
ffffffff97607b10 t umask_show {core.o}
ffffffff97607b40 t event_show {core.o}
where event_show in particular is highly ambiguous and appears in many
object files, all of which are now notated with different {object file
names}.
Further down, we see what happens when object files are reused by multiple
modules, all of which are built in to the kernel, and some of which contain
symbols that are ambiguously-named even within that set of modules:
ffffffff97d7aed0 t liquidio_pcie_mmio_enabled [liquidio]
ffffffff97d7aef0 t liquidio_pcie_resume [liquidio]
ffffffff97d7af00 t liquidio_ptp_adjtime [liquidio]
ffffffff97d7af50 t liquidio_ptp_enable [liquidio]
ffffffff97d7af70 t liquidio_get_stats64 [liquidio]
ffffffff97d7b0f0 t liquidio_fix_features [liquidio]
ffffffff97d7b1c0 t liquidio_get_port_parent_id [liquidio]
[...]
ffffffff97d824c0 t lio_vf_rep_modinit [liquidio]
ffffffff97d824f0 t lio_vf_rep_modexit [liquidio]
ffffffff97d82520 t lio_ethtool_get_channels [liquidio] [liquidio_vf]
ffffffff97d82600 t lio_ethtool_get_ringparam [liquidio] [liquidio_vf]
ffffffff97d826a0 t lio_get_msglevel [liquidio] [liquidio_vf]
ffffffff97d826c0 t lio_vf_set_msglevel [liquidio] [liquidio_vf]
ffffffff97d826e0 t lio_get_pauseparam [liquidio] [liquidio_vf]
ffffffff97d82710 t lio_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffff97d82e70 t lio_vf_get_ethtool_stats [liquidio] [liquidio_vf]
[...]
ffffffff97d91a80 t cn23xx_vf_mbox_thread [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91aa0 t cpumask_weight.constprop.0 [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91ac0 t cn23xx_vf_msix_interrupt_handler [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91bd0 t cn23xx_vf_get_oq_ticks [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91c00 t cn23xx_vf_ask_pf_to_do_flr [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91c70 t cn23xx_octeon_pfvf_handshake [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91e20 t cn23xx_setup_octeon_vf_device [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d92060 t octeon_mbox_read [liquidio] [liquidio_vf]
ffffffff97d92230 t octeon_mbox_write [liquidio] [liquidio_vf]
[...]
ffffffff97d946b0 t octeon_alloc_soft_command_resp [liquidio] [liquidio_vf]
ffffffff97d947e0 t octnet_send_nic_data_pkt [liquidio] [liquidio_vf]
ffffffff97d94820 t octnet_send_nic_ctrl_pkt [liquidio] [liquidio_vf]
ffffffff97d94ab0 t liquidio_get_stats64 [liquidio_vf]
ffffffff97d94c10 t liquidio_fix_features [liquidio_vf]
ffffffff97d94cd0 t wait_for_pending_requests [liquidio_vf]
Like /proc/kallsyms, the output is sorted by address, so it keeps the curious
property of /proc/kallsyms that symbols may appear repeatedly with different
addresses: but now, unlike in /proc/kallsyms, we can see that those symbols
appear repeatedly because they are *different symbols* that ultimately
belong to different modules or different object files, all of which are
built in to the kernel.
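
For anyone who wants to consume this from userspace, here is a minimal
parsing sketch. It assumes only the line layout shown in the examples above
(address, type, name, then optional [module] and {object file} annotations);
the function and field names are hypothetical, and this is not the perf
support included in the series.

```python
import re

# address, one-character symbol type, symbol name, then optional annotations
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-fA-F]+)\s+(?P<type>\S)\s+(?P<name>\S+)(?P<rest>.*)$"
)


def parse_line(line):
    """Parse one /proc/kallmodsyms-style line into a dict."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    rest = m.group("rest")
    objs = re.findall(r"\{([^}]+)\}", rest)
    return {
        "addr": int(m.group("addr"), 16),
        "type": m.group("type"),
        "name": m.group("name"),
        "modules": re.findall(r"\[([^\]]+)\]", rest),
        "objfile": objs[0] if objs else None,
    }


def symbol_key(sym):
    """The (name, modules, object file) triple described as unique above."""
    return (sym["name"], tuple(sym["modules"]), sym["objfile"])


if __name__ == "__main__":
    sample = [
        "ffffffff97d7af70 t liquidio_get_stats64 [liquidio]",
        "ffffffff97d94ab0 t liquidio_get_stats64 [liquidio_vf]",
    ]
    keys = {symbol_key(parse_line(line)) for line in sample}
    # Same symbol name twice, but two distinct keys.
    print(len(keys))
```

Keying on the triple rather than on the bare name is what lets a tracer tell
the two liquidio_get_stats64 instances apart.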
Note that kernel symbols for built-in modules will probably appear
interspersed with other symbols that are part of different modules and
non-modular always-built-in symbols, which, as usual, have no
square-bracketed module denotation (though they might have an {object file
name}).
As with /proc/kallsyms, non-root usage produces addresses that are all zero.
(Now that kallmodsyms data uses very little space, the new
CONFIG_KALLMODSYMS option might perhaps be something people don't want to
bother with: maybe we can just control it via CONFIG_KALLSYMS or something?)
Limitations:
- This approach only works for textual symbols (and weak ones). I don't
see any way to make it work for data symbols etc: except for initialized
data they don't really have corresponding object files at all and they
tend to get merged together anyway.
- Non-built-in modules can also have ambiguous symbols in them in different
input object files: they aren't handled yet because kallsyms never runs
over modules to create the necessary sections. This is fixable, but it's
probably best handled in another patch series. (kallsyms would need to
do much less work for modules: only the sections introduced by this patch
series would need emission at all, and no [module] notations would be
needed, only {objfile}.)
- Section start/end symbols necessarily lie on the boundary between object
files, so are sometimes misreported as being in the wrong object file or
module. This is unlikely to be too troublesome for these symbols in
particular, but if anyone can figure out a way to fix this I'd be happy
to do it.
- There is no BPF iterator support yet (it's just a matter of adding it
if needed).
The commits in this series all have reviewed-by tags: they're all from
internal reviews, so please ignore them.
Differences from v8, February 2022:
- Add object file name handling, emitting only those object names needed to
disambiguate symbols, shortening them as much as possible compatible with
that.
- Rename .kallsyms_module_names to .kallsyms_mod_objnames now that it
contains object file names too.
- Fix a bug in optimize_obj2mod that prevented proper reuse of module names
for object files appearing in both multimodule modules and single-module
modules: saves a few KiB more, often more than the space increase due to
object file name handling.
- Rebased atop v6.1-rc2: move modules_thick.builtin generation into
the top-level Kbuild accordingly, and adjust to getopt_long use in
scripts/kallsyms.
- Significant revisions to the cover letter.
- Add proof-of-concept kallmodsyms module support to perf.
- (This ping) confirmed that the series applies atop v6.1-rc4 without
further changes.
Differences from v7, December 2021:
- Adjust for changes in the v5.17 merge window. Adjust a few commit
messages and shrink the cover letter.
- Drop the symbol-size patch, probably better done from userspace.
Differences from v6, November 2021:
- Adjust for rewrite of confdata machinery in v5.16 (tristate.conf
handling is now more of a rewrite than a reversion).
Differences from v5, October 2021:
- Fix generation of mapfiles under UML.
Differences from v4, September 2021:
- Fix building of tristate.conf if missing (usually concealed by the
syncconfig being run for other reasons, but not always: the kernel
test robot spotted it).
- Forward-port atop v5.15-rc3.
Differences from v3, August 2021:
- Fix a kernel test robot warning in get_ksymbol_core (possible
use of uninitialized variable if kallmodsyms was wanted but
kallsyms_module_offsets was not present, which is most unlikely).
Differences from v2, June 2021:
- Split the series up. In particular, the size impact of the table
optimizer is now quantified, and the symbol-size patch is split out and
turned into an RFC patch, with the /proc/kallmodsyms format before that
patch lacking a size column. Some speculation on how to make the symbol
sizes less space-wasteful is added (but not yet implemented).
- Drop a couple of unnecessary #includes, one unnecessarily exported
symbol, and a needless de-staticing.
Differences from v1, in 2019:
- Move from a straight symbol->module name mapping to a mapping from
address-range to TU to module name list, bringing major space savings
over the previous approach and support for object files used by many
built-in modules at the same time, at the cost of a slightly more complex
approach (unavoidably so, I think, given that we have to merge three data
sources together: the link map in .tmp_vmlinux.ranges, the nm output on
stdin, and the mapping from TU name to module names in
modules_thick.builtin).
We do opportunistic merging of TUs if they cite the same modules and
reuse module names where doing so is simple: see optimize_obj2mod below.
I considered more extensive searches for mergeable entries and more
intricate encodings of the module name list allowing TUs that are used by
overlapping sets of modules to share their names, but such modules are
rare enough (and such overlapping sharings are vanishingly rare) that it
seemed likely to save only a few bytes at the cost of much more
hard-to-test code. This is doubly true now that the tables needed are
only a few kilobytes in length.
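
The simple form of sharing described in this last item can be sketched as
follows. This is an illustration only and is not optimize_obj2mod itself:
the function name, the table layout and the object file names are
hypothetical, and only objects citing exactly the same set of modules share
an entry.

```python
def intern_module_lists(obj2mod):
    """Give objects that cite the same set of modules one shared entry.

    Returns (table, obj_index): `table` holds each distinct module-name
    tuple once, and `obj_index` maps every object file to its table slot.
    """
    table = []      # unique module-name tuples, in first-seen order
    index_of = {}   # module-name tuple -> index into table
    obj_index = {}  # object file -> index into table
    for obj, mods in obj2mod.items():
        key = tuple(sorted(mods))
        if key not in index_of:
            index_of[key] = len(table)
            table.append(key)
        obj_index[obj] = index_of[key]
    return table, obj_index


if __name__ == "__main__":
    # Hypothetical object files, named for illustration only.
    table, obj_index = intern_module_lists({
        "lio_ethtool.o": ["liquidio", "liquidio_vf"],
        "octeon_mailbox.o": ["liquidio_vf", "liquidio"],
        "lio_main.o": ["liquidio"],
    })
    print(len(table), "distinct module lists for", len(obj_index), "objects")
```

Overlapping but unequal module sets get separate entries here, which matches
the decision above not to chase the rarer sharing opportunities.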
Signed-off-by: Nick Alcock <nick.alcock@...cle.com>
Signed-off-by: Eugene Loh <eugene.loh@...cle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@...cle.com>
Nick Alcock (8):
  kbuild: bring back tristate.conf
  kbuild: add modules_thick.builtin
  kbuild: generate an address ranges map at vmlinux link time
  kallsyms: introduce sections needed to map symbols to built-in modules
  kallsyms: optimize .kallsyms_modules*
  kallsyms: distinguish text symbols fully using object file names
  kallsyms: add /proc/kallmodsyms for text symbol disambiguation
  perf: proof-of-concept kallmodsyms support
 .gitignore                       |    1 +
 Documentation/dontdiff           |    1 +
 Documentation/kbuild/kconfig.rst |    5 +
 Kbuild                           |   22 +
 Makefile                         |    9 +-
 init/Kconfig                     |    9 +
 kernel/kallsyms.c                |  277 ++++++-
 kernel/kallsyms_internal.h       |   14 +
 scripts/Kbuild.include           |    6 +
 scripts/Makefile                 |    6 +
 scripts/Makefile.modbuiltin      |   56 ++
 scripts/kallsyms.c               | 1187 +++++++++++++++++++++++++++++-
 scripts/kconfig/confdata.c       |   41 +-
 scripts/link-vmlinux.sh          |   15 +-
 scripts/modules_thick.c          |  200 +++++
 scripts/modules_thick.h          |   48 ++
 tools/perf/builtin-kallsyms.c    |   35 +-
 tools/perf/util/event.c          |   14 +-
 tools/perf/util/machine.c        |    6 +-
 tools/perf/util/machine.h        |    1 +
 tools/perf/util/symbol.c         |  207 ++++--
 tools/perf/util/symbol.h         |   12 +-
 22 files changed, 2073 insertions(+), 99 deletions(-)
 create mode 100644 scripts/Makefile.modbuiltin
 create mode 100644 scripts/modules_thick.c
 create mode 100644 scripts/modules_thick.h
--
2.38.0.266.g481848f278