lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200812173958.2307251-3-masahiroy@kernel.org>
Date:   Thu, 13 Aug 2020 02:39:57 +0900
From:   Masahiro Yamada <masahiroy@...nel.org>
To:     linux-kbuild@...r.kernel.org
Cc:     Nathan Huckleberry <nhuck@...gle.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Tom Roeder <tmroeder@...gle.com>,
        clang-built-linux@...glegroups.com,
        Masahiro Yamada <masahiroy@...nel.org>,
        Michal Marek <michal.lkml@...kovi.net>,
        linux-kernel@...r.kernel.org
Subject: [PATCH 2/3] gen_compile_commands: wire up build rule to Makefile

Currently, you need to explicitly run scripts/gen_compile_commands.py
to create compile_commands.json. It traverses the object tree
(you need to pass the -d option to deal with a separate output tree),
and parses all the .*.cmd file found.

If you rebuild the kernel over again without 'make clean', stale
.*.cmd files from older builds will create invalid entries in
compile_commands.json.

This commit wires up the compile_commands.json rule to the top
Makefile, and makes it parse .*.cmd files only from the current build
to avoid stale entries.

It is possible to extract only relevant .*.cmd files by checking
$(KBUILD_VMLINUX_OBJS), $(KBUILD_VMLINUX_LIBS), and modules.order.
The objects or archives linked to vmlinux are listed in
$(KBUILD_VMLINUX_OBJS) or $(KBUILD_VMLINUX_LIBS). All the modules are
listed in modules.order.

You can create compile_commands.json from Make:

  $ make -j$(nproc) CC=clang compile_commands.json

Of course, you can build vmlinux, modules, and compile_commands.json
all together in a single command:

  $ make -j$(nproc) CC=clang all compile_commands.json

It works also for M= builds. In this case, compile_commands.json
is created in the top directory of the external module.

I hope this will be overall improvements, but it has a drawback;
the coverage of the compile_commands.json is reduced because only
the objects linked to vmlinux or modules are handled. For example,
the following C files are not included in compile_commands.json:

 - Decompressor source files (arch/*/boot/compressed/)
 - VDSO source files
 - C files used to generate intermediates (e.g. kernel/bounds.c)
 - standalone host programs

Here is a note for out-of-tree builds. 'make compile_commands.json'
works with O= option, but please notice compile_commands.json is
created in the object tree instead of the source tree.

Some people may want to have compile_commands.json in the source tree
because Clang Tools searches for it through all parent paths of the
first input source file.

However, you cannot do it for O= builds. Kbuild should never generate
any build artifact in the source tree when O= is given because the
source tree might be read-only. Any write attempt to the source tree
is monitored and the violation may be reported. See the commit log of
8ef14c2c41d9.

So, the only possible way it to create compile_commands.json in the
object tree, then specify '-p <build-path>' when you use clang-check,
clang-tidy, etc.

Signed-off-by: Masahiro Yamada <masahiroy@...nel.org>
---

 Makefile                        |  29 ++++++-
 scripts/gen_compile_commands.py | 146 +++++++++++++-------------------
 2 files changed, 82 insertions(+), 93 deletions(-)

diff --git a/Makefile b/Makefile
index 6844b848bfec..4d65affb6917 100644
--- a/Makefile
+++ b/Makefile
@@ -634,7 +634,7 @@ endif
 # in addition to whatever we do anyway.
 # Just "make" or "make all" shall build modules as well
 
-ifneq ($(filter all modules nsdeps,$(MAKECMDGOALS)),)
+ifneq ($(filter all modules nsdeps %compile_commands.json,$(MAKECMDGOALS)),)
   KBUILD_MODULES := 1
 endif
 
@@ -1459,7 +1459,8 @@ endif # CONFIG_MODULES
 
 # Directories & files removed with 'make clean'
 CLEAN_FILES += include/ksym vmlinux.symvers \
-	       modules.builtin modules.builtin.modinfo modules.nsdeps
+	       modules.builtin modules.builtin.modinfo modules.nsdeps \
+	       compile_commands.json
 
 # Directories & files removed with 'make mrproper'
 MRPROPER_FILES += include/config include/generated          \
@@ -1693,9 +1694,12 @@ KBUILD_MODULES := 1
 
 build-dirs := $(KBUILD_EXTMOD)
 PHONY += modules
-modules: descend
+modules: $(MODORDER)
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 
+$(MODORDER): descend
+	@:
+
 PHONY += modules_install
 modules_install: _emodinst_ _emodinst_post
 
@@ -1709,8 +1713,12 @@ PHONY += _emodinst_post
 _emodinst_post: _emodinst_
 	$(call cmd,depmod)
 
+compile_commands.json: $(extmod-prefix)compile_commands.json
+PHONY += compile_commands.json
+
 clean-dirs := $(KBUILD_EXTMOD)
-clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps
+clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
+	$(KBUILD_EXTMOD)/compile_commands.json
 
 PHONY += help
 help:
@@ -1823,6 +1831,19 @@ nsdeps: export KBUILD_NSDEPS=1
 nsdeps: modules
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/nsdeps
 
+# Clang Tooling
+# ---------------------------------------------------------------------------
+
+quiet_cmd_gen_compile_commands = GEN     $@
+      cmd_gen_compile_commands = $(PYTHON3) $< -a $(AR) -o $@ $(filter-out $<, $(real-prereqs))
+
+$(extmod-prefix)compile_commands.json: scripts/gen_compile_commands.py \
+	$(if $(KBUILD_EXTMOD),,$(KBUILD_VMLINUX_OBJS) $(KBUILD_VMLINUX_LIBS)) \
+	$(if $(CONFIG_MODULES), $(MODORDER)) FORCE
+	$(call if_changed,gen_compile_commands)
+
+targets += $(extmod-prefix)compile_commands.json
+
 # Scripts to check various things for consistency
 # ---------------------------------------------------------------------------
 
diff --git a/scripts/gen_compile_commands.py b/scripts/gen_compile_commands.py
index 19c7338740e7..d2ff0d982521 100755
--- a/scripts/gen_compile_commands.py
+++ b/scripts/gen_compile_commands.py
@@ -9,80 +9,49 @@
 
 import argparse
 import json
-import logging
 import os
 import re
-
-_DEFAULT_OUTPUT = 'compile_commands.json'
-_DEFAULT_LOG_LEVEL = 'WARNING'
-
-_FILENAME_PATTERN = r'^\..*\.cmd$'
-_LINE_PATTERN = r'^cmd_[^ ]*\.o := (.* )([^ ]*\.c)$'
-_VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
-
-# A kernel build generally has over 2000 entries in its compile_commands.json
-# database. If this code finds 300 or fewer, then warn the user that they might
-# not have all the .cmd files, and they might need to compile the kernel.
-_LOW_COUNT_THRESHOLD = 300
+import subprocess
 
 
 def parse_arguments():
     """Sets up and parses command-line arguments.
 
     Returns:
-        log_level: A logging level to filter log output.
-        directory: The directory to search for .cmd files.
+        ar: Command used for parsing .a archives
         output: Where to write the compile-commands JSON file.
+        files: Files to parse
     """
     usage = 'Creates a compile_commands.json database from kernel .cmd files'
     parser = argparse.ArgumentParser(description=usage)
 
-    directory_help = ('Path to the kernel source directory to search '
-                      '(defaults to the working directory)')
-    parser.add_argument('-d', '--directory', type=str, help=directory_help)
+    ar_help = 'command used for parsing .a archives'
+    parser.add_argument('-a', '--ar', type=str, default='ar', help=ar_help)
 
-    output_help = ('The location to write compile_commands.json (defaults to '
-                   'compile_commands.json in the search directory)')
-    parser.add_argument('-o', '--output', type=str, help=output_help)
+    output_help = 'output file for the compilation database'
+    parser.add_argument('-o', '--output', type=str,
+                        default='compile_commands.json', help=output_help)
 
-    log_level_help = ('The level of log messages to produce (one of ' +
-                      ', '.join(_VALID_LOG_LEVELS) + '; defaults to ' +
-                      _DEFAULT_LOG_LEVEL + ')')
-    parser.add_argument(
-        '--log_level', type=str, default=_DEFAULT_LOG_LEVEL,
-        help=log_level_help)
+    files_help='files to parse (should be *.o, *.a, or modules.order)'
+    parser.add_argument('files', type=str, nargs='*', help=files_help)
 
     args = parser.parse_args()
 
-    log_level = args.log_level
-    if log_level not in _VALID_LOG_LEVELS:
-        raise ValueError('%s is not a valid log level' % log_level)
-
-    directory = args.directory or os.getcwd()
-    output = args.output or os.path.join(directory, _DEFAULT_OUTPUT)
-    directory = os.path.abspath(directory)
-
-    return log_level, directory, output
+    return args.ar, args.output, args.files
 
 
-def process_line(root_directory, file_directory, command_prefix, relative_path):
+def process_line(root_directory, command_prefix, file_path):
     """Extracts information from a .cmd line and creates an entry from it.
 
     Args:
         root_directory: The directory that was searched for .cmd files. Usually
             used directly in the "directory" entry in compile_commands.json.
-        file_directory: The path to the directory the .cmd file was found in.
         command_prefix: The extracted command line, up to the last element.
-        relative_path: The .c file from the end of the extracted command.
-            Usually relative to root_directory, but sometimes relative to
-            file_directory and sometimes neither.
+        file_path: The .c file from the end of the extracted command.
+            It can be either relative or absolute.
 
     Returns:
         An entry to append to compile_commands.
-
-    Raises:
-        ValueError: Could not find the extracted file based on relative_path and
-            root_directory or file_directory.
     """
     # The .cmd files are intended to be included directly by Make, so they
     # escape the pound sign '#', either as '\#' or '$(pound)' (depending on the
@@ -90,60 +59,59 @@ def process_line(root_directory, file_directory, command_prefix, relative_path):
     # by Make, so this code replaces the escaped version with '#'.
     prefix = command_prefix.replace('\#', '#').replace('$(pound)', '#')
 
-    cur_dir = root_directory
-    expected_path = os.path.join(cur_dir, relative_path)
-    if not os.path.exists(expected_path):
-        # Try using file_directory instead. Some of the tools have a different
-        # style of .cmd file than the kernel.
-        cur_dir = file_directory
-        expected_path = os.path.join(cur_dir, relative_path)
-        if not os.path.exists(expected_path):
-            raise ValueError('File %s not in %s or %s' %
-                             (relative_path, root_directory, file_directory))
     return {
-        'directory': cur_dir,
-        'file': relative_path,
-        'command': prefix + relative_path,
+        'directory': root_directory,
+        'file': file_path,
+        'command': prefix + file_path,
     }
 
 
 def main():
-    """Walks through the directory and finds and parses .cmd files."""
-    log_level, directory, output = parse_arguments()
-
-    level = getattr(logging, log_level)
-    logging.basicConfig(format='%(levelname)s: %(message)s', level=level)
-
-    filename_matcher = re.compile(_FILENAME_PATTERN)
-    line_matcher = re.compile(_LINE_PATTERN)
+    """Find and parse .cmd files for vmlinux and modules"""
+    ar, output, files = parse_arguments()
+
+    line_matcher = re.compile(r'^cmd_[^ ]*\.o := (.* )([^ ]*\.c)$')
+
+    # Collect objects compiled for vmlinux or modules
+    objects = []
+    for file in files:
+        if file.endswith('.o'):
+            # Some objects (head-y) are linked to vmlinux directly
+            objects.append(file)
+        elif file.endswith('.a'):
+            # Most of built-in objects are linked via built-in.a or lib.a.
+            # Use 'ar -t' to get the list of the contained objects.
+            objects += subprocess.check_output([ar, '-t', file]).decode().split()
+        elif file.endswith('modules.order'):
+	    # modules.order lists all the modules.
+            with open(file) as f:
+                for line in f:
+                    ko = line.rstrip()
+                    base, ext = os.path.splitext(ko)
+                    if ext != '.ko':
+                        sys.exit('{}: mobule path must end with .ko'.format(ko))
+                    mod = base + '.mod'
+		    # The first line of *.mod lists the objects that
+		    # compose the module.
+                    with open(mod) as mod_f:
+                        objects += mod_f.readline().split()
+        else:
+            sys.exit('{}: unknown file type'.format(file))
 
     compile_commands = []
-    for dirpath, _, filenames in os.walk(directory):
-        for filename in filenames:
-            if not filename_matcher.match(filename):
-                continue
-            filepath = os.path.join(dirpath, filename)
-
-            with open(filepath, 'rt') as f:
-                line = f.readline()
-                result = line_matcher.match(line)
-                if result:
-                    try:
-                        entry = process_line(directory, dirpath,
-                                             result.group(1), result.group(2))
-                        compile_commands.append(entry)
-                    except ValueError as err:
-                        logging.info('Could not add line from %s: %s',
-                                     filepath, err)
+    cwd = os.getcwd()
+    for object in objects:
+        dir, notdir = os.path.split(object)
+        cmd_file = os.path.join(dir, '.' + notdir + '.cmd')
+        with open(cmd_file, 'rt') as f:
+            line = f.readline()
+            result = line_matcher.match(line)
+            if result:
+                entry = process_line(cwd, result.group(1), result.group(2))
+                compile_commands.append(entry)
 
     with open(output, 'wt') as f:
         json.dump(compile_commands, f, indent=2, sort_keys=True)
 
-    count = len(compile_commands)
-    if count < _LOW_COUNT_THRESHOLD:
-        logging.warning(
-            'Found %s entries. Have you compiled the kernel?', count)
-
-
 if __name__ == '__main__':
     main()
-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ