[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZsS6BNFRN58l9ppM@oracle.com>
Date: Tue, 20 Aug 2024 11:45:08 -0400
From: Kris Van Hees <kris.van.hees@...cle.com>
To: Masahiro Yamada <masahiroy@...nel.org>
Cc: Kris Van Hees <kris.van.hees@...cle.com>, linux-kernel@...r.kernel.org,
linux-kbuild@...r.kernel.org, linux-modules@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
Nick Alcock <nick.alcock@...cle.com>,
Alan Maguire <alan.maguire@...cle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Luis Chamberlain <mcgrof@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Jiri Olsa <olsajiri@...il.com>,
Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [PATCH v6 2/4] kbuild, kconfig: generate offset range data for
builtin modules
On Sun, Aug 18, 2024 at 03:19:36PM +0900, Masahiro Yamada wrote:
> On Fri, Aug 16, 2024 at 12:04???AM Kris Van Hees <kris.van.hees@...cle.com> wrote:
>
>
> The subject should be:
> "kbuild: generate offset range data for builtin modules"
>
>
> (Drop ", kconfig")
Thank you - applied.
> >
> > Create file module.builtin.ranges that can be used to find where
> > built-in modules are located by their addresses. This will be useful for
> > tracing tools to find what functions are for various built-in modules.
> >
> > The offset range data for builtin modules is generated using:
> > - modules.builtin: associates object files with module names
> > - vmlinux.map: provides load order of sections and offset of first member
> > per section
> > - vmlinux.o.map: provides offset of object file content per section
> > - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME
>
>
> I do not see "KBUILD_MODNAME" in the code.
> It only checks "KUILD_MODFILE".
Ah yes, that was a leftover from the earlier implementation. Updated.
> >
> > The generated data will look like:
> >
> > .text 00000000-00000000 = _text
> > .text 0000baf0-0000cb10 amd_uncore
> > .text 0009bd10-0009c8e0 iosf_mbi
> > ...
> > .text 008e6660-008e9630 snd_soc_wcd_mbhc
> > .text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x
>
>
>
> It is good to note that multiple module names appear
> in one line, but the instance (snd_soc_wcd933*) no longer
> occurs since 11b0b802f8e38d48ca74d520028add81263f003e.
>
>
> I recommend to replace the output snippet with:
>
>
> .text 00b9f080-00ba011a intel_skl_int3472_discrete
> .text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
> .text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
>
>
> This still happens when CONFIG_INTEL_SKL_INT3472=y.
Applied. Thanks for pointing this out - I didn't noticed that the original
case was no longer present.
> > .text 008ea610-008ea780 snd_soc_wcd9335
> > ...
> > .data 00000000-00000000 = _sdata
> > .data 0000f020-0000f680 amd_uncore
> >
> > For each ELF section, it lists the offset of the first symbol. This can
> > be used to determine the base address of the section at runtime.
> >
> > Next, it lists (in strict ascending order) offset ranges in that section
> > that cover the symbols of one or more builtin modules. Multiple ranges
> > can apply to a single module, and ranges can be shared between modules.
> >
> > The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
> > is generated for kernel modules that are built into the kernel image.
> >
> > How it works:
> >
> > 1. The modules.builtin file is parsed to obtain a list of built-in
> > module names and their associated object names (the .ko file that
> > the module would be in if it were a loadable module, hereafter
> > referred to as <kmodfile>). This object name can be used to
> > identify objects in the kernel compile because any C or assembler
> > code that ends up into a built-in module will have the option
> > -DKBUILD_MODFILE=<kmodfile> present in its build command, and those
> > can be found in the .<obj>.cmd file in the kernel build tree.
> >
> > If an object is part of multiple modules, they will all be listed
> > in the KBUILD_MODFILE option argument.
> >
> > This allows us to conclusively determine whether an object in the
> > kernel build belong to any modules, and which.
> >
> > 2. The vmlinux.map is parsed next to determine the base address of each
> > top level section so that all addresses into the section can be
> > turned into offsets. This makes it possible to handle sections
> > getting loaded at different addresses at system boot.
> >
> > We also determine an 'anchor' symbol at the beginning of each
> > section to make it possible to calculate the true base address of
> > a section at runtime (i.e. symbol address - symbol offset).
> >
> > We collect start addresses of sections that are included in the top
> > level section. This is used when vmlinux is linked using vmlinux.o,
> > because in that case, we need to look at the vmlinux.o linker map to
> > know what object a symbol is found in.
> >
> > And finally, we process each symbol that is listed in vmlinux.map
> > (or vmlinux.o.map) based on the following structure:
> >
> > vmlinux linked from vmlinux.a:
> >
> > vmlinux.map:
> > <top level section>
> > <included section> -- might be same as top level section)
> > <object> -- built-in association known
> > <symbol> -- belongs to module(s) object belongs to
> > ...
> >
> > vmlinux linked from vmlinux.o:
> >
> > vmlinux.map:
> > <top level section>
> > <included section> -- might be same as top level section)
> > vmlinux.o -- need to use vmlinux.o.map
> > <symbol> -- ignored
> > ...
> >
> > vmlinux.o.map:
> > <section>
> > <object> -- built-in association known
> > <symbol> -- belongs to module(s) object belongs to
> > ...
> >
> > 3. As sections, objects, and symbols are processed, offset ranges are
> > constructed in a striaght-forward way:
> >
> > - If the symbol belongs to one or more built-in modules:
> > - If we were working on the same module(s), extend the range
> > to include this object
> > - If we were working on another module(s), close that range,
> > and start the new one
> > - If the symbol does not belong to any built-in modules:
> > - If we were working on a module(s) range, close that range
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees@...cle.com>
> > Reviewed-by: Nick Alcock <nick.alcock@...cle.com>
> > Reviewed-by: Alan Maguire <alan.maguire@...cle.com>
> > Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>
> > ---
> > Changes since v5:
> > - Removed unnecessary compatibility info from option description.
> >
> > Changes since v4:
> > - Improved commit description to explain the why and how.
> > - Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
> > - Improved comments in generate_builtin_ranges.awk
> > - Improved logic in generate_builtin_ranges.awk to handle incorrect
> > object size information in linker maps
> >
> > Changes since v3:
> > - Consolidated patches 2 through 5 into a single patch
> > - Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
> > - Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
> > - Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
> > - Support LLVM (lld) compiles in generate_builtin_ranges.awk
> > - Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
> >
> > Changes since v2:
> > - Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
> > - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
> > - Switched from using modules.builtin.objs to parsing .*.cmd files
> > - Parse data from .*.cmd in generate_builtin_ranges.awk
> > - Use $(real-prereqs) rather than $(filter-out ...)
> > ---
>
> > System utilities
>
> > index a30c03a66172..dcdf14ffe031 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -571,6 +571,22 @@ config VMLINUX_MAP
> > pieces of code get eliminated with
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
> >
> > +config BUILTIN_MODULE_RANGES
> > + bool "Generate address range information for builtin modules"
> > + depends on !LTO_CLANG_FULL
> > + depends on !LTO_CLANG_THIN
> > + select VMLINUX_MAP
>
>
> I still got
>
> "WARNING: unmet direct dependencies detected for VMLINUX_MAP"
>
>
> I suggested "depends on VMLINUX_MAP" instead of "select VMLINUX_MAP".
>
>
>
> https://lore.kernel.org/linux-kbuild/202405150623.lmS5sVhM-lkp@intel.com/
>
> https://lore.kernel.org/linux-kbuild/CAK7LNAST_SbaN9WQRM_k0xE1MUReJvn9AMSg4A1-9b9xotf67w@mail.gmail.com/
Updated. Sorry about that - it should have been "depend on".
> > + help
> > + When modules are built into the kernel, there will be no module name
> > + associated with its symbols in /proc/kallsyms. Tracers may want to
> > + identify symbols by module name and symbol name regardless of whether
> > + the module is configured as loadable or not.
> > +
> > + This option generates modules.builtin.ranges in the build tree with
> > + offset ranges (per ELF section) for the module(s) they belong to.
> > + It also records an anchor symbol to determine the load address of the
> > + section.
> > +
> > config DEBUG_FORCE_WEAK_PER_CPU
> > bool "Force weak per-cpu definitions"
> > depends on DEBUG_KERNEL
> > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> > index 49946cb96844..7e21162e9de1 100644
> > --- a/scripts/Makefile.vmlinux
> > +++ b/scripts/Makefile.vmlinux
> > @@ -33,6 +33,22 @@ targets += vmlinux
> > vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
> > +$(call if_changed_dep,link_vmlinux)
> >
> > +# module.builtin.ranges
> > +# ---------------------------------------------------------------------------
> > +ifdef CONFIG_BUILTIN_MODULE_RANGES
> > +__default: modules.builtin.ranges
> > +
> > +quiet_cmd_modules_builtin_ranges = GEN $@
> > + cmd_modules_builtin_ranges = \
> > + $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@
> > +
> > +vmlinux.map: vmlinux
>
>
> This should be:
>
>
> vmlinux.map: vmlinux
> @:
>
>
> Otherwise, GNU Make would try to find a pattern rule
> to update vmlinux.map.
Ah, I didn't realize that. Thanks!
> > +
> > +targets += modules.builtin.ranges
> > +modules.builtin.ranges: modules.builtin vmlinux.map vmlinux.o.map FORCE
> > + $(call if_changed,modules_builtin_ranges)
>
>
>
> Presumably, modules.builtin.ranges should be regenerated when
> scripts/generate_builtin_ranges.awk is changed.
>
>
> Maybe, you can do this:
>
>
> quiet_cmd_modules_builtin_ranges = GEN $@
> cmd_modules_builtin_ranges = $(real-prereqs) > $@
>
> targets += modules.builtin.ranges
> modules.builtin.ranges: $(srctree)/scripts/generate_builtin_ranges.awk \
> modules.builtin vmlinux.map vmlinux.o.map FORCE
> $(call if_changed,modules_builtin_ranges)
I had thought about that and didn't do it. But I certainly agree that it
can be a good idea, so yes, let's do it. It certainly helps while doing
more development work or debugging with the generator script.
> > +endif
> > +
> > # Add FORCE to the prequisites of a target to force it to be always rebuilt.
> > # ---------------------------------------------------------------------------
> >
> > diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
> > index 6de297916ce6..252505505e0e 100644
> > --- a/scripts/Makefile.vmlinux_o
> > +++ b/scripts/Makefile.vmlinux_o
> > @@ -45,9 +45,12 @@ objtool-args = $(vmlinux-objtool-args-y) --link
> > # Link of vmlinux.o used for section mismatch analysis
> > # ---------------------------------------------------------------------------
> >
> > +vmlinux-o-ld-args-$(CONFIG_BUILTIN_MODULE_RANGES) += -Map=$@...p
> > +
> > quiet_cmd_ld_vmlinux.o = LD $@
> > cmd_ld_vmlinux.o = \
> > $(LD) ${KBUILD_LDFLAGS} -r -o $@ \
> > + $(vmlinux-o-ld-args-y) \
> > $(addprefix -T , $(initcalls-lds)) \
> > --whole-archive vmlinux.a --no-whole-archive \
> > --start-group $(KBUILD_VMLINUX_LIBS) --end-group \
> > diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> > new file mode 100755
> > index 000000000000..9b647781d5fe
> > --- /dev/null
> > +++ b/scripts/generate_builtin_ranges.awk
> > @@ -0,0 +1,515 @@
> > +#!/usr/bin/gawk -f
> > +# SPDX-License-Identifier: GPL-2.0
> > +# generate_builtin_ranges.awk: Generate address range data for builtin modules
> > +# Written by Kris Van Hees <kris.van.hees@...cle.com>
> > +#
> > +# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
> > +# vmlinux.o.map > modules.builtin.ranges
> > +#
> > +
> > +# Return the module name(s) (if any) associated with the given object.
> > +#
> > +# If we have seen this object before, return information from the cache.
> > +# Otherwise, retrieve it from the corresponding .cmd file.
> > +#
> > +function get_module_info(fn, mod, obj, mfn, s) {
> > + if (fn in omod)
> > + return omod[fn];
> > +
> > + if (match(fn, /\/[^/]+$/) == 0)
> > + return "";
> > +
> > + obj = fn;
> > + mod = "";
> > + mfn = "";
> > + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> > + if (getline s <fn == 1) {
> > + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> > + mfn = substr(s, RSTART + 16, RLENGTH - 16);
> > + gsub(/['"]/, "", mfn);
> > +
> > + mod = mfn;
> > + gsub(/([^/ ]*\/)+/, "", mod);
> > + gsub(/-/, "_", mod);
> > + }
> > + }
> > + close(fn);
> > +
> > + # A single module (common case) also reflects objects that are not part
> > + # of a module. Some of those objects have names that are also a module
> > + # name (e.g. core). We check the associated module file name, and if
> > + # they do not match, the object is not part of a module.
> > + if (mod !~ / /) {
> > + if (!(mod in mods))
> > + mod = "";
> > + if (mods[mod] != mfn)
> > + mod = "";
> > + }
> > +
> > + # At this point, mod is a single (valid) module name, or a list of
> > + # module names (that do not need validation).
> > + omod[obj] = mod;
> > + close(fn);
>
>
> Is this "close(fn)" necessary?
> I see it a few lines above too.
Good catch - not needed.
> The code became way simpler since my previous review, but
> I think this is still redundant.
>
> You do not need to check both of modname and its path.
>
> I attached a patch for code refactoring.
Thank you! I didn't think of the approach to keep <dir>/<mod> as the key,
but that is indeed simpler.
I'll squash your patch into mine. Thank you for the good suggestions.
Kris
> From fcdc459ce4c7eb84549e45cf06a3a44f90aa3cf9 Mon Sep 17 00:00:00 2001
> From: Masahiro Yamada <masahiroy@...nel.org>
> Date: Fri, 16 Aug 2024 23:55:51 +0900
> Subject: [PATCH] fixup modules.builtin.ranges
>
> Signed-off-by: Masahiro Yamada <masahiroy@...nel.org>
> ---
> lib/Kconfig.debug | 2 +-
> scripts/Makefile.vmlinux | 12 +++++++-----
> scripts/generate_builtin_ranges.awk | 25 ++++++++-----------------
> 3 files changed, 16 insertions(+), 23 deletions(-)
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index dcdf14ffe031..f087dc3da321 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -575,7 +575,7 @@ config BUILTIN_MODULE_RANGES
> bool "Generate address range information for builtin modules"
> depends on !LTO_CLANG_FULL
> depends on !LTO_CLANG_THIN
> - select VMLINUX_MAP
> + depends on VMLINUX_MAP
> help
> When modules are built into the kernel, there will be no module name
> associated with its symbols in /proc/kallsyms. Tracers may want to
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 7e21162e9de1..7e8b703799c8 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -39,14 +39,16 @@ ifdef CONFIG_BUILTIN_MODULE_RANGES
> __default: modules.builtin.ranges
>
> quiet_cmd_modules_builtin_ranges = GEN $@
> - cmd_modules_builtin_ranges = \
> - $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@
> -
> -vmlinux.map: vmlinux
> + cmd_modules_builtin_ranges = $(real-prereqs) > $@
>
> targets += modules.builtin.ranges
> -modules.builtin.ranges: modules.builtin vmlinux.map vmlinux.o.map FORCE
> +modules.builtin.ranges: $(srctree)/scripts/generate_builtin_ranges.awk \
> + modules.builtin vmlinux.map vmlinux.o.map FORCE
> $(call if_changed,modules_builtin_ranges)
> +
> +vmlinux.map: vmlinux
> + @:
> +
> endif
>
> # Add FORCE to the prequisites of a target to force it to be always rebuilt.
> diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> index 9b647781d5fe..865cb7ac4970 100755
> --- a/scripts/generate_builtin_ranges.awk
> +++ b/scripts/generate_builtin_ranges.awk
> @@ -12,7 +12,7 @@
> # If we have seen this object before, return information from the cache.
> # Otherwise, retrieve it from the corresponding .cmd file.
> #
> -function get_module_info(fn, mod, obj, mfn, s) {
> +function get_module_info(fn, mod, obj, s) {
> if (fn in omod)
> return omod[fn];
>
> @@ -21,16 +21,11 @@ function get_module_info(fn, mod, obj, mfn, s) {
>
> obj = fn;
> mod = "";
> - mfn = "";
> fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> if (getline s <fn == 1) {
> if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> - mfn = substr(s, RSTART + 16, RLENGTH - 16);
> - gsub(/['"]/, "", mfn);
> -
> - mod = mfn;
> - gsub(/([^/ ]*\/)+/, "", mod);
> - gsub(/-/, "_", mod);
> + mod = substr(s, RSTART + 16, RLENGTH - 16);
> + gsub(/['"]/, "", mod);
> }
> }
> close(fn);
> @@ -42,10 +37,11 @@ function get_module_info(fn, mod, obj, mfn, s) {
> if (mod !~ / /) {
> if (!(mod in mods))
> mod = "";
> - if (mods[mod] != mfn)
> - mod = "";
> }
>
> + gsub(/([^/ ]*\/)+/, "", mod);
> + gsub(/-/, "_", mod);
> +
> # At this point, mod is a single (valid) module name, or a list of
> # module names (that do not need validation).
> omod[obj] = mod;
> @@ -76,18 +72,13 @@ function update_entry(osect, mod, soff, eoff, sect, idx) {
> #
> # Lines will be like:
> # kernel/crypto/lzo-rle.ko
> -# and we derive the built-in module name from this as "lzo_rle" and associate
> -# it with object name "crypto/lzo-rle".
> +# and we record the object name "crypto/lzo-rle".
> #
> ARGIND == 1 {
> sub(/kernel\//, ""); # strip off "kernel/" prefix
> sub(/\.ko$/, ""); # strip off .ko suffix
>
> - mod = $1;
> - sub(/([^/]*\/)+/, "", mod); # mod = basename($1)
> - gsub(/-/, "_", mod); # Convert - to _
> -
> - mods[mod] = $1;
> + mods[$1] = 1;
> next;
> }
>
> --
> 2.43.0
>
Powered by blists - more mailing lists