linux-kernel - Re: [PATCH] gen_compile_commands: fix overlooked module files


Message-ID: <CAK7LNARsHxH1LF8Pq70EMAYW-p1btgAVC1cJMOkXSTjW5LZuKA@mail.gmail.com>
Date:   Wed, 15 Jun 2022 18:23:22 +0900
From:   Masahiro Yamada <masahiroy@...nel.org>
To:     John Hubbard <jhubbard@...dia.com>
Cc:     Nick Desaulniers <ndesaulniers@...gle.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Tom Rix <trix@...hat.com>, Jason Gunthorpe <jgg@...dia.com>,
        LKML <linux-kernel@...r.kernel.org>,
        clang-built-linux <llvm@...ts.linux.dev>
Subject: Re: [PATCH] gen_compile_commands: fix overlooked module files

On Wed, Jun 15, 2022 at 3:33 PM John Hubbard <jhubbard@...dia.com> wrote:
>
> scripts/clang-tools/gen_compile_commands.py incorrectly assumes that
> each .mod file only contains one line.

Thanks for catching this.

That assumption was correct until recently:
  The first line contained the member objects.
  The second line, if CONFIG_TRIM_UNUSED_KSYMS=y, contained unresolved symbols.



Commit 9413e7640564 ("kbuild: split the second line of *.mod into *.usyms")
changed the format of *.mod so that member objects are listed one per line.
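
To illustrate the effect of the format change, here is a minimal sketch. The
object names and the two helper functions are made up for illustration and are
not taken from the script; the point is only that reading the first line now
returns a single member object.

```python
import io

# Hypothetical contents of a *.mod file in the one-object-per-line format.
mod_text = "drivers/foo/a.o\ndrivers/foo/b.o\ndrivers/foo/c.o\n"


def objects_first_line_only(f):
    """Old approach: split only the first line into words."""
    return f.readline().split()


def objects_all_lines(f):
    """Per-line approach: each line holds exactly one object path."""
    return [line.rstrip() for line in f]


old = objects_first_line_only(io.StringIO(mod_text))
new = objects_all_lines(io.StringIO(mod_text))
print(old)
print(new)
```

With this input, the first helper sees only the first object, while the second
sees all three.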




> In fact, such files contain one
> entry per line, and for some subsystems, there can be many, many lines.
> For example, Nouveau has 762 entries, but only the first entry was being
> processed. This problem causes clangd to fail to provide references and
> definitions for valid files that are part of the current kernel
> configuration.
>
> This problem only occurs when using Kbuild to generate, like this:
>
>    make CC=clang compile_commands.json
>
> It does not occur if you just run gen_compile_commands.py "bare", like
> this:
>
>    scripts/clang-tools/gen_compile_commands.py .
>
> Fix this by fully processing each .mod file. This fix causes the number
> of build commands that clangd finds in my kernel build (these numbers
> are heavily dependent upon .config) to increase from 2848 to 5292, which
> is an 85% increase.
>
> Fixes: ecca4fea1ede4 ("gen_compile_commands: support *.o, *.a, modules.order in positional argument")

This should be

Fixes: 9413e7640564 ("kbuild: split the second line of *.mod into *.usyms")


Can you update the commit log?


> Cc: Masahiro Yamada <masahiroy@...nel.org>
> Cc: Nick Desaulniers <ndesaulniers@...gle.com>
> Signed-off-by: John Hubbard <jhubbard@...dia.com>
> ---
>  scripts/clang-tools/gen_compile_commands.py | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
> index 1d1bde1fd45e..53590e886889 100755
> --- a/scripts/clang-tools/gen_compile_commands.py
> +++ b/scripts/clang-tools/gen_compile_commands.py
> @@ -157,10 +157,11 @@ def cmdfiles_for_modorder(modorder):
>              if ext != '.ko':
>                  sys.exit('{}: module path must end with .ko'.format(ko))
>              mod = base + '.mod'
> -           # The first line of *.mod lists the objects that compose the module.
> +           # Read from *.mod, to get a list of objects that compose the module.
>              with open(mod) as m:
> -                for obj in m.readline().split():
> -                    yield to_cmdfile(obj)
> +                for line in m.readlines():


                    for line in m:

is simpler, and may be more efficient because it does not
read the whole file into a list first.


One more note: the inner 'line' loop variable shadows (overwrites)
the outer 'line' loop variable, which is used a few lines above.

    with open(modorder) as f:
        for line in f:



Maybe it is safer to use a different name for the inner loop variable,
because a 'for' loop in Python does not create a new scope, so the inner
loop overwrites the outer variable.
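
A small sketch of the problem, using made-up list contents rather than the
real modules.order and *.mod files: after the inner loop finishes, the name
'line' no longer refers to the outer item.

```python
# Stand-ins for the lines of modules.order and of one *.mod file.
outer_lines = ["a.ko", "b.ko"]
inner_lines = ["x.o", "y.o"]

seen_after_inner = []
for line in outer_lines:
    for line in inner_lines:  # rebinds the same name 'line'
        pass
    # 'line' now holds the last inner item, not the outer one.
    seen_after_inner.append(line)

print(seen_after_inner)
```

The outer loop still visits both of its items, but any code that uses 'line'
after the inner loop would see the inner value.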





> +                    for obj in line.split():

This loop is unneeded because each line
contains only one word.
.rstrip() will do.



To sum up, this part can be simpler,
for example like this:

            # Read from *.mod, to get a list of objects that compose the module.
            with open(mod) as m:
                for line2 in m:
                    yield to_cmdfile(line2.rstrip())
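
For reference, a self-contained sketch of the same idea that can be run outside
the kernel tree. The to_cmdfile() body here is my assumption about what the
helper does (map dir/foo.o to dir/.foo.o.cmd), and cmdfiles_for_mod() and the
file contents are hypothetical, only there to exercise the loop.

```python
import os
import tempfile


def to_cmdfile(path):
    # Assumed behavior: dir/foo.o -> dir/.foo.o.cmd
    directory, base = os.path.split(path)
    return os.path.join(directory, '.' + base + '.cmd')


def cmdfiles_for_mod(mod):
    # Hypothetical helper: yield one .cmd path per line of a *.mod file.
    with open(mod) as m:
        for obj_line in m:
            yield to_cmdfile(obj_line.rstrip())


with tempfile.TemporaryDirectory() as tmp:
    mod_path = os.path.join(tmp, 'foo.mod')
    with open(mod_path, 'w') as f:
        f.write('drivers/foo/a.o\ndrivers/foo/b.o\n')
    # Consume the generator while the temporary file still exists.
    result = list(cmdfiles_for_mod(mod_path))

print(result)
```

Using a distinct name such as obj_line avoids touching the outer 'line'
variable.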





> +                        yield to_cmdfile(obj)
>
>
>  def process_line(root_directory, command_prefix, file_path):
> --
> 2.36.1
>


-- 
Best Regards
Masahiro Yamada