lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZsS7gxmiOLFsK+1y@oracle.com>
Date: Tue, 20 Aug 2024 11:51:31 -0400
From: Kris Van Hees <kris.van.hees@...cle.com>
To: Masahiro Yamada <masahiroy@...nel.org>
Cc: Kris Van Hees <kris.van.hees@...cle.com>, linux-kernel@...r.kernel.org,
        linux-kbuild@...r.kernel.org, linux-modules@...r.kernel.org,
        linux-trace-kernel@...r.kernel.org,
        Nick Alcock <nick.alcock@...cle.com>,
        Alan Maguire <alan.maguire@...cle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Jiri Olsa <olsajiri@...il.com>,
        Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [PATCH v6 3/4] scripts: add verifier script for builtin module
 range data

On Sun, Aug 18, 2024 at 03:40:36PM +0900, Masahiro Yamada wrote:
> On Fri, Aug 16, 2024 at 12:04???AM Kris Van Hees <kris.van.hees@...cle.com> wrote:
> >
> > The modules.builtin.ranges offset range data for builtin modules is
> > generated at compile time based on the list of built-in modules and
> > the vmlinux.map and vmlinux.o.map linker maps.  This data can be used
> > to determine whether a symbol at a particular address belongs to
> > module code that was configured to be compiled into the kernel proper
> > as a built-in module (rather than as a standalone module).
> >
> > This patch adds a script that uses the generated modules.builtin.ranges
> > data to annotate the symbols in the System.map with module names if
> > their address falls within a range that belongs to one or more built-in
> > modules.
> >
> > It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
> > verify the annotation:
> >
> >   - For each top-level section:
> >      - For each object in the section:
> >         - Determine whether the object is part of a built-in module
> >           (using modules.builtin and the .*.cmd file used to compile
> >            the object as suggested in [0])
> >         - For each symbol in that object, verify that the built-in
> >           module association (or lack thereof) matches the annotation
> >           given to the symbol.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees@...cle.com>
> > Reviewed-by: Nick Alcock <nick.alcock@...cle.com>
> > Reviewed-by: Alan Maguire <alan.maguire@...cle.com>
> > ---
> >     Changes since v5:
> >      - Added optional 6th argument to specify kernel build directory.
> >      - Report error and exit if .*.o.cmd files cannot be read.
> >
> >     Changes since v4:
> >      - New patch in the series
> > ---
> >  scripts/verify_builtin_ranges.awk | 365 ++++++++++++++++++++++++++++++
> >  1 file changed, 365 insertions(+)
> >  create mode 100755 scripts/verify_builtin_ranges.awk
> >
> > diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
> > new file mode 100755
> > index 000000000000..b82cf0a0fbeb
> > --- /dev/null
> > +++ b/scripts/verify_builtin_ranges.awk
> > @@ -0,0 +1,365 @@
> > +#!/usr/bin/gawk -f
> > +# SPDX-License-Identifier: GPL-2.0
> > +# verify_builtin_ranges.awk: Verify address range data for builtin modules
> > +# Written by Kris Van Hees <kris.van.hees@...cle.com>
> > +#
> > +# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
> > +#                                 modules.builtin vmlinux.map vmlinux.o.map \
> > +#                                 [ <build-dir> ]
> > +#
> > +
> > +# Return the module name(s) (if any) associated with the given object.
> > +#
> > +# If we have seen this object before, return information from the cache.
> > +# Otherwise, retrieve it from the corresponding .cmd file.
> > +#
> > +function get_module_info(fn, mod, obj, mfn, s) {
> > +       if (fn in omod)
> > +               return omod[fn];
> > +
> > +       if (match(fn, /\/[^/]+$/) == 0)
> > +               return "";
> > +
> > +       obj = fn;
> > +       mod = "";
> > +       mfn = "";
> > +       fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> > +       if (getline s <fn == 1) {
> > +               if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> > +                       mfn = substr(s, RSTART + 16, RLENGTH - 16);
> > +                       gsub(/['"]/, "", mfn);
> > +
> > +                       mod = mfn;
> > +                       gsub(/([^/ ]*\/)+/, "", mod);
> > +                       gsub(/-/, "_", mod);
> > +               }
> > +       } else {
> > +               print "ERROR: Failed to read: " fn "\n\n" \
> > +                     "  Invalid kernel build directory (" kdir ")\n" \
> > +                     "  or its content does not match " ARGV[1] >"/dev/stderr";
> > +               close(fn);
> > +               total = 0;
> > +               exit(1);
> > +       }
> > +       close(fn);
> > +
> > +       # A single module (common case) also reflects objects that are not part
> > +       # of a module.  Some of those objects have names that are also a module
> > +       # name (e.g. core).  We check the associated module file name, and if
> > +       # they do not match, the object is not part of a module.
> > +       if (mod !~ / /) {
> > +               if (!(mod in mods))
> > +                       return "";
> > +               if (mods[mod] != mfn)
> > +                       return "";
> > +       }
> > +
> > +       # At this point, mod is a single (valid) module name, or a list of
> > +       # module names (that do not need validation).
> > +       omod[obj] = mod;
> > +       close(fn);
> > +
> > +       return mod;
> > +}
> >
> 
> 
> 
> This code is copy-paste from scripts/generate_builtin_ranges.awk
> So, my comments in 2/4 can apply to this patch, too.

Yes, and I will apply the same changes to the verifier script.

> Instead of adding a separate script,
> we could add a "verify mode" option.
> 
> 
>  scripts/generate_builtin_ranges.awk --verify ...
> 
> 
> But, I do not know how much cleaner it will become.
> 
> I am not good at reviewing AWK code, but this
> is how you go.

I think that adding the verifier functionality into the generator script would
make things more complex.  The nature of AWK is that it works best in terms of
a liner walk of the content of the files it is given as input.  Since the
generator and verifier scripts have different inputs (and especially since the
first couple of files that they need to process first differ), the script would
become more complex.  The fact that different actions are taken on the same
input records between the generator and verifier also complicates matters.
And keeping them separate makes it easier to optimize the scripts.

> If this script were written in Python,
> it would be easy and readable to
> split logically-related code chunks into functions,
> as follows:
> 
> 
> def parse_module_builtin():
>     ...
> 
> 
> def parse_vmlinux_map_lld():
>     ...
> 
> 
> def parse_vmlinux_map_bfd():
>     ...
> 
> 
> def parse_vmlinux_o_map():
>     ...

I am much better with AWK than with Python, so I went with AWK.  This is a
task it is remarkably well suited for given that a simple linear read through
the files accomplishes almost everything that is needed.  Thank you for bearing
with me on doing this with AWK.

	Kris

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ