Message-ID: <CAHp75VfX95GVkd6iJ-aYNp7nO56nLSxgreE4fDXAm3h3p6VEjg@mail.gmail.com>
Date:   Fri, 11 Jun 2021 10:58:49 +0300
From:   Andy Shevchenko <andy.shevchenko@...il.com>
To:     Zhen Lei <thunder.leizhen@...wei.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Nicolas Dichtel <nicolas.dichtel@...nd.com>,
        Joe Perches <joe@...ches.com>, Jason Baron <jbaron@...mai.com>,
        Stefani Seibold <stefani@...bold.net>,
        Jacob Keller <jacob.e.keller@...el.com>,
        Thomas Graf <tgraf@...g.ch>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Jens Axboe <axboe@...nel.dk>, Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] scripts: add spelling_sanitizer.sh script

On Fri, Jun 11, 2021 at 10:19 AM Zhen Lei <thunder.leizhen@...wei.com> wrote:
>
> The file scripts/spelling.txt recorded a large number of
> "mistake||correction" pairs. These entries are currently maintained in
> order, but the results are not strict. In addition, when someone wants to
> add some new pairs, he either sort them manually or write a script, which
> is clearly a waste of labor. So add this script. It removes the duplicates
> first, then sort by correctly spelled words. Sorting based on misspelled
> words is not chose because it is uncontrollable.

chosen

...

> +#!/bin/sh

If you want stricter rules applied, use
#!/bin/sh -efu
in all your shell scripts; it will expose a lot of problems.
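
For reference, a small sketch of what each of those flags changes. It runs the
checks in child shells so the strict options do not leak into the caller; the
variable name `hypothetical_unset_var` is made up for the illustration.

```shell
# -u: expanding an unset variable is an error instead of an empty string.
if sh -u -c ': "$hypothetical_unset_var"' 2>/dev/null; then
    unset_check=missed
else
    unset_check=caught
fi

# -e: the shell exits at the first failing command, so "reached" is not printed.
errexit_out=$(sh -e -c 'false; echo reached' || true)

# -f: pathname expansion is disabled, so the glob stays a literal asterisk.
noglob_out=$(sh -f -c 'echo *')
```

Passing the flags on the `sh` command line has the same effect as putting them
on the interpreter line or calling `set -efu` at the top of the script.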

The SPDX license identifier is missing.

> +src=spelling.txt

> +tmp=spelling_mistake_correction_pairs.txt

It will pollute the source tree, so use `mktemp` or utilize O=. If no
O= (or whatever equivalent describes the output folder) is supplied,
the file ends up in the source tree, so it needs to be Git-ignored.
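
A minimal sketch of the `mktemp` variant, assuming a `mktemp` that accepts a
template argument (GNU coreutils and the BSDs do). The `spelling.XXXXXX`
template and the sample pair are only illustrative.

```shell
# Create the scratch file under $TMPDIR (or /tmp), never in the source tree.
tmp=$(mktemp "${TMPDIR:-/tmp}/spelling.XXXXXX") || exit 1

# Use it like the fixed-name file in the patch (sample data, not from spelling.txt).
printf '%s\n' 'teh||the' > "$tmp"
tmp_content=$(cat "$tmp")

# Remove it once the work is done.
rm -f "$tmp"
```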

> +cd `dirname $0`

Useless use of dirname. Check for %, %%, #, and ## substitutions (`man sh`).
IIRC dirname equivalent is ${0%/*}.
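
Something along these lines, as a sketch. Note that ${0%/*} alone is not a full
replacement: it leaves a path without a slash unchanged and turns "/foo" into an
empty string, so the hypothetical helper `dir_of` below handles both cases
explicitly.

```shell
# dir_of PATH: print the directory part of PATH without calling dirname.
dir_of() {
    case $1 in
        */*)
            dir=${1%/*}            # strip the shortest trailing "/component"
            [ -n "$dir" ] || dir=/ # "/foo" would otherwise yield ""
            printf '%s\n' "$dir"
            ;;
        *)
            printf '%s\n' .        # no slash at all: current directory
            ;;
    esac
}
```

In the script this would read `cd "$(dir_of "$0")"`, with the quoting that the
original `cd` line lacks.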

> +# Convert the format of 'codespell' to the current
> +sed -r -i 's/ ==> /||/' $src
> +
> +# Move the spelling "mistake||correction" pairs into file $tmp

> +# There are currently 9 lines of comments in $src, so the text starts at line 10
> +sed -n '10,$p' $src > $tmp
> +sed -i '10,$d' $src

This is fragile; use proper comment line detection.
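
One possible shape, assuming every comment line in the file starts with '#' and
no "mistake||correction" pair does. The function names are made up.

```shell
# header_lines FILE: print only the comment lines, wherever they are.
header_lines() {
    sed -n '/^#/p' "$1"
}

# pair_lines FILE: print everything that is not a comment line.
pair_lines() {
    sed '/^#/d' "$1"
}
```

This way the split no longer depends on the header being exactly 9 lines long.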

> +# Remove duplicates first, then sort by correctly spelled words
> +sort -u $tmp -o $tmp
> +sort -t '|' -k 3 $tmp -o $tmp

This can be one pipeline.
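
For example, as a filter reading stdin and writing stdout. The function name is
hypothetical, and LC_ALL=C is my addition to make the order independent of the
caller's locale.

```shell
# sort_pairs: drop duplicate lines, then order by the correction (field 3,
# because "mistake||correction" splits on '|' into mistake, empty, correction).
sort_pairs() {
    LC_ALL=C sort -u | LC_ALL=C sort -t '|' -k 3
}
```

The two steps cannot be folded into a single `sort -u -t '|' -k 3`, since -u
would then compare only the key and drop different mistakes that share the
same correction.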

> +# Append sorted results to comments
> +cat $tmp >> $src

I believe it can be done in a better way, but I have not thought about it.

> +# Delete the temporary file
> +rm -f $tmp

What if the script is interrupted? It would be good to handle SIGHUP, I
suppose, so we don't leave garbage behind us.
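
A rough sketch of such handling, assuming the temporary file comes from
`mktemp`. Handling INT and TERM as well is my addition; the exit codes follow
the usual 128 + signal number convention.

```shell
tmp=$(mktemp "${TMPDIR:-/tmp}/spelling.XXXXXX") || exit 1

# cleanup: remove the temporary file; safe to call more than once.
cleanup() {
    rm -f "$tmp"
}

# Run cleanup whenever the shell exits, for any reason.
trap cleanup EXIT

# Turn the signals into a normal exit so that the EXIT trap fires.
trap 'exit 129' HUP
trap 'exit 130' INT
trap 'exit 143' TERM
```

With this in place the explicit `rm -f $tmp` at the end of the script becomes
redundant.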

> +cd - > /dev/null

-- 
With Best Regards,
Andy Shevchenko