lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1411986525.6334.32.camel@x220>
Date:	Mon, 29 Sep 2014 12:28:45 +0200
From:	Paul Bolle <pebolle@...cali.nl>
To:	Valentin Rothberg <valentinrothberg@...il.com>
Cc:	akpm@...ux-foundation.org,
	Stefan Hengelein <stefan.hengelein@....de>,
	linux-kbuild@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python

[Added  linux-kbuild@...r.kernel.org.]

On Sun, 2014-09-28 at 17:55 +0200, Valentin Rothberg wrote:
> The scripts/checkkconfigsymbols.sh script searches Kconfig features
> in the source code that are not defined in Kconfig. Such identifiers
> always evaluate to false and are the source of various kinds of bugs.
> However, the shell script is slow and it does not detect such broken
> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
> Furthermore, it generates false positives. The script is also hard to
> read and understand, and is thereby difficult to maintain.
> 
> This patch replaces the shell script with an implementation in Python,
> which:
>     (a) detects the same bugs, but does not report previous false positives
>     (b) additionally detects broken references in Kconfig and all
>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>     (c) is up to 75 times faster than the shell script
>     (d) only checks files under version control ('git ls-files')

(The shell script is .git unaware. If you happen to have one or more
branches with "Kconfig" in their name, as I do, it generates a lot of
noise on stderr.)

> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> from 3m47s to 0m3s, and reports 939 broken identifiers in Linux v3.17-rc1;
> 420 additional reports of which 16 are located in Kconfig files,
> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
> 
> Moreover, we intentionally include references in comments, which have been
> ignored until now. Such comments may be leftovers of features that have
> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
> These references can be misleading and should be removed or replaced.
> 
> Note that the output format changed from (file list <tab> feature) to
> (feature <tab> file list) as it simplifies the detection of the Kconfig
> feature for long file lists.
> 
> Signed-off-by: Valentin Rothberg <valentinrothberg@...il.com>
> Signed-off-by: Stefan Hengelein <stefan.hengelein@....de>
> ---
> Changelog:
> v2: Fix of regular expressions
> v3: Changelog replacement, and add changes of v2
> v4: Based on comments from Paul Bolle <pebolle@...cali.nl>
>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>   - Changes of regular expressions
>   - Increases additional reports from 49 to 229 compared to v3
>   - Change of output format from (file list <tab> feature) to
>         (feature <tab> file list)·
> v5: Only analyze files under version control ('git ls-files')
> v6: Cover features with numbers and small letters (e.g., 4xx)
> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
> ---
>  scripts/checkkconfigsymbols.py | 138 +++++++++++++++++++++++++++++++++++++++++
>  scripts/checkkconfigsymbols.sh |  59 ------------------
>  2 files changed, 138 insertions(+), 59 deletions(-)
>  create mode 100644 scripts/checkkconfigsymbols.py
>  delete mode 100755 scripts/checkkconfigsymbols.sh
> 
> diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> new file mode 100644
> index 0000000..01bf9c4
> --- /dev/null
> +++ b/scripts/checkkconfigsymbols.py
> @@ -0,0 +1,138 @@
> +#!/usr/bin/env python
> +
> +"""Find Kconfig identifiers that are referenced but not defined."""
> +
> +# (c) 2014 Valentin Rothberg <valentinrothberg@...il.com>
> +# (c) 2014 Stefan Hengelein <stefan.hengelein@....de>
> +#
> +# Licensed under the terms of the GNU GPL License version 2
> +
> +
> +import os
> +import re
> +from subprocess import Popen, PIPE, STDOUT
> +
> +
> +# regex expressions
> +OPERATORS = r"&|\(|\)|\||\!"
> +FEATURE = r"(?:\w*[A-Z0-9]\w*){2,}"
> +DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
> +EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR

Could please make that "depends on"? Yes, it seems the yacc grammar
accepts any amount of whitespace, but that doesn't make it right to use
anything other than a single space. (Can the yacc grammar be tweaked to
see "depends on" as one, well, token?)

> +
> +# regex objects
> +REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
> +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> +REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")

That should be "-D", because now you get a hit on
    TPS6507X_ADCONFIG_CONVERT_TS

and friends. What does \W do, by the way?

> +REGEX_KCONFIG_DEF = re.compile(DEF)
> +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> +REGEX_KCONFIG_STMT = re.compile(STMT)
> +REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
> +REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
> +
> +
> +def main():
> +    """Main function of this module."""
> +    source_files = []
> +    kconfig_files = []
> +    defined_features = set()
> +    referenced_features = dict()  # {feature: [files]}
> +
> +    # use 'git ls-files' to get the worklist
> +    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
> +    (stdout, _) = pop.communicate()  # wait until finished
> +    if len(stdout) > 0 and stdout[-1] == "\n":
> +        stdout = stdout[:-1]
> +
> +    for gitfile in stdout.rsplit("\n"):
> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
> +                os.path.isdir(gitfile):

(The only directories in the git tree are a few symlinks:
    arch/arm/boot/dts/include/dt-bindings
    arch/metag/boot/dts/include/dt-bindings
    arch/mips/boot/dts/include/dt-bindings
    arch/powerpc/boot/dts/include/dt-bindings

Filtering out symlinks makes sense anyway. But that's probably not worth
the effort.)

You might also consider filtering out "Next/merge.log". It tends to have
references to Kconfig macros. But it's only a few hits anyway.

> +            continue
> +        if REGEX_FILE_KCONFIG.match(gitfile):
> +            kconfig_files.append(gitfile)
> +        else:
> +            # all non-Kconfig files are checked for consistency
> +            source_files.append(gitfile)
> +
> +    for sfile in source_files:
> +        parse_source_file(sfile, referenced_features)
> +
> +    for kfile in kconfig_files:
> +        parse_kconfig_file(kfile, defined_features, referenced_features)
> +
> +    print "Undefined symbol used\tFile list"
> +    for feature in sorted(referenced_features):
> +        # filter some false positives
> +        if feature == "FOO" or feature == "BAR" or \
> +                feature == "FOO_BAR":
> +            continue
> +        if feature not in defined_features:
> +            if feature.endswith("_MODULE"):
> +                # avoid false positives for kernel modules
> +                if feature[:-len("_MODULE")] in defined_features:
> +                    continue
> +            files = referenced_features.get(feature)
> +            print "%s\t%s" % (feature, ", ".join(files))
> +
> +
> +def parse_source_file(sfile, referenced_features):
> +    """Parse @sfile for referenced Kconfig features."""
> +    lines = []
> +    with open(sfile, "r") as stream:
> +        lines = stream.readlines()
> +
> +    for line in lines:
> +        if not "CONFIG_" in line:
> +            continue
> +        features = REGEX_SOURCE_FEATURE.findall(line)
> +        for feature in features:
> +            if not REGEX_FILTER_FEATURES.search(feature):
> +                continue
> +            sfiles = referenced_features.get(feature, set())
> +            sfiles.add(sfile)
> +            referenced_features[feature] = sfiles
> +
> +
> +def get_features_in_line(line):
> +    """Return mentioned Kconfig features in @line."""
> +    return REGEX_FEATURE.findall(line)
> +
> +
> +def parse_kconfig_file(kfile, defined_features, referenced_features):
> +    """Parse @kfile and update feature definitions and references."""
> +    lines = []
> +    skip = False
> +
> +    with open(kfile, "r") as stream:
> +        lines = stream.readlines()
> +
> +    for i in range(len(lines)):
> +        line = lines[i]
> +        line = line.strip('\n')
> +        line = line.split("#")[0]  # ignore comments
> +
> +        if REGEX_KCONFIG_DEF.match(line):
> +            feature_def = REGEX_KCONFIG_DEF.findall(line)
> +            defined_features.add(feature_def[0])
> +            skip = False
> +        elif REGEX_KCONFIG_HELP.match(line):
> +            skip = True
> +        elif skip:
> +            # ignore content of help messages
> +            pass
> +        elif REGEX_KCONFIG_STMT.match(line):
> +            features = get_features_in_line(line)
> +            # multi-line statements
> +            while line.endswith("\\"):
> +                i += 1
> +                line = lines[i]
> +                line = line.strip('\n')
> +                features.extend(get_features_in_line(line))
> +            for feature in set(features):
> +                paths = referenced_features.get(feature, set())
> +                paths.add(kfile)
> +                referenced_features[feature] = paths
> +
> +
> +if __name__ == "__main__":
> +    main()
>[...]

This seems to find, roughly, what my local perl script finds. It does
skip references to Kconfig macros in the Kconfig help texts and
comments: grep for CONFIG_ITANIUM_ASTEP_SPECIFIC as an example. But
there are so few of those that it's probable not worth the trouble to
check for them too.

A few test show the speedup is especially nice with the entire tree in
cache: run time drops from over four minutes to just under five seconds
here. Provided you look into my comments, this is:

Acked-by: Paul Bolle <pebolle@...cali.nl>

Thanks,


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ