[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZsS7gxmiOLFsK+1y@oracle.com>
Date: Tue, 20 Aug 2024 11:51:31 -0400
From: Kris Van Hees <kris.van.hees@...cle.com>
To: Masahiro Yamada <masahiroy@...nel.org>
Cc: Kris Van Hees <kris.van.hees@...cle.com>, linux-kernel@...r.kernel.org,
linux-kbuild@...r.kernel.org, linux-modules@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
Nick Alcock <nick.alcock@...cle.com>,
Alan Maguire <alan.maguire@...cle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Luis Chamberlain <mcgrof@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Jiri Olsa <olsajiri@...il.com>,
Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [PATCH v6 3/4] scripts: add verifier script for builtin module
range data
On Sun, Aug 18, 2024 at 03:40:36PM +0900, Masahiro Yamada wrote:
> On Fri, Aug 16, 2024 at 12:04???AM Kris Van Hees <kris.van.hees@...cle.com> wrote:
> >
> > The modules.builtin.ranges offset range data for builtin modules is
> > generated at compile time based on the list of built-in modules and
> > the vmlinux.map and vmlinux.o.map linker maps. This data can be used
> > to determine whether a symbol at a particular address belongs to
> > module code that was configured to be compiled into the kernel proper
> > as a built-in module (rather than as a standalone module).
> >
> > This patch adds a script that uses the generated modules.builtin.ranges
> > data to annotate the symbols in the System.map with module names if
> > their address falls within a range that belongs to one or more built-in
> > modules.
> >
> > It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
> > verify the annotation:
> >
> > - For each top-level section:
> > - For each object in the section:
> > - Determine whether the object is part of a built-in module
> > (using modules.builtin and the .*.cmd file used to compile
> > the object as suggested in [0])
> > - For each symbol in that object, verify that the built-in
> > module association (or lack thereof) matches the annotation
> > given to the symbol.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees@...cle.com>
> > Reviewed-by: Nick Alcock <nick.alcock@...cle.com>
> > Reviewed-by: Alan Maguire <alan.maguire@...cle.com>
> > ---
> > Changes since v5:
> > - Added optional 6th argument to specify kernel build directory.
> > - Report error and exit if .*.o.cmd files cannot be read.
> >
> > Changes since v4:
> > - New patch in the series
> > ---
> > scripts/verify_builtin_ranges.awk | 365 ++++++++++++++++++++++++++++++
> > 1 file changed, 365 insertions(+)
> > create mode 100755 scripts/verify_builtin_ranges.awk
> >
> > diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
> > new file mode 100755
> > index 000000000000..b82cf0a0fbeb
> > --- /dev/null
> > +++ b/scripts/verify_builtin_ranges.awk
> > @@ -0,0 +1,365 @@
> > +#!/usr/bin/gawk -f
> > +# SPDX-License-Identifier: GPL-2.0
> > +# verify_builtin_ranges.awk: Verify address range data for builtin modules
> > +# Written by Kris Van Hees <kris.van.hees@...cle.com>
> > +#
> > +# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
> > +# modules.builtin vmlinux.map vmlinux.o.map \
> > +# [ <build-dir> ]
> > +#
> > +
> > +# Return the module name(s) (if any) associated with the given object.
> > +#
> > +# If we have seen this object before, return information from the cache.
> > +# Otherwise, retrieve it from the corresponding .cmd file.
> > +#
> > +function get_module_info(fn, mod, obj, mfn, s) {
> > + if (fn in omod)
> > + return omod[fn];
> > +
> > + if (match(fn, /\/[^/]+$/) == 0)
> > + return "";
> > +
> > + obj = fn;
> > + mod = "";
> > + mfn = "";
> > + fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> > + if (getline s <fn == 1) {
> > + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> > + mfn = substr(s, RSTART + 16, RLENGTH - 16);
> > + gsub(/['"]/, "", mfn);
> > +
> > + mod = mfn;
> > + gsub(/([^/ ]*\/)+/, "", mod);
> > + gsub(/-/, "_", mod);
> > + }
> > + } else {
> > + print "ERROR: Failed to read: " fn "\n\n" \
> > + " Invalid kernel build directory (" kdir ")\n" \
> > + " or its content does not match " ARGV[1] >"/dev/stderr";
> > + close(fn);
> > + total = 0;
> > + exit(1);
> > + }
> > + close(fn);
> > +
> > + # A single module (common case) also reflects objects that are not part
> > + # of a module. Some of those objects have names that are also a module
> > + # name (e.g. core). We check the associated module file name, and if
> > + # they do not match, the object is not part of a module.
> > + if (mod !~ / /) {
> > + if (!(mod in mods))
> > + return "";
> > + if (mods[mod] != mfn)
> > + return "";
> > + }
> > +
> > + # At this point, mod is a single (valid) module name, or a list of
> > + # module names (that do not need validation).
> > + omod[obj] = mod;
> > + close(fn);
> > +
> > + return mod;
> > +}
> >
>
>
>
> This code is copy-paste from scripts/generate_builtin_ranges.awk
> So, my comments in 2/4 can apply to this patch, too.
Yes, and I will apply the same changes to the verifier script.
> Instead of adding a separate script,
> we could add a "verify mode" option.
>
>
> scripts/generate_builtin_ranges.awk --verify ...
>
>
> But, I do not know how much cleaner it will become.
>
> I am not good at reviewing AWK code, but this
> is how you go.
I think that adding the verifier functionality into the generator script would
make things more complex. The nature of AWK is that it works best in terms of
a liner walk of the content of the files it is given as input. Since the
generator and verifier scripts have different inputs (and especially since the
first couple of files that they need to process first differ), the script would
become more complex. The fact that different actions are taken on the same
input records between the generator and verifier also complicates matters.
And keeping them separate makes it easier to optimize the scripts.
> If this script were written in Python,
> it would be easy and readable to
> split logically-related code chunks into functions,
> as follows:
>
>
> def parse_module_builtin():
> ...
>
>
> def parse_vmlinux_map_lld():
> ...
>
>
> def parse_vmlinux_map_bfd():
> ...
>
>
> def parse_vmlinux_o_map():
> ...
I am much better with AWK than with Python, so I went with AWK. This is a
task it is remarkably well suited for given that a simple linear read through
the files accomplishes almost everything that is needed. Thank you for bearing
with me on doing this with AWK.
Kris
Powered by blists - more mailing lists