Date: Tue, 2 Jan 2024 17:04:11 +0100
From: Antonio Borneo <antonio.borneo@...s.st.com>
To: Joe Perches <joe@...ches.com>, Andy Whitcroft <apw@...onical.com>,
        Dwaipayan Ray <dwaipayanray1@...il.com>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
CC: <linux-kernel@...r.kernel.org>,
        Clément Léger <clement.leger@...tlin.com>,
        Clément Le Goffic <clement.legoffic@...s.st.com>,
        <linux-stm32@...md-mailman.stormreply.com>
Subject: Re: [PATCH] checkpatch: use utf-8 match for spell checking

On Tue, 2023-12-12 at 11:07 -0800, Joe Perches wrote:
> On Tue, 2023-12-12 at 10:43 +0100, Antonio Borneo wrote:
> > The current code that checks for misspellings verifies, in a more
> > complex regex, whether $rawline matches [^\w]($misspellings)[^\w]
> > 
> > Since $rawline is a byte string, a byte of a UTF-8 multibyte
> > character in $rawline can match the non-word class [^\w].
> > E.g.:
> >         ./scripts/checkpatch.pl --git 81c2f059ab9
> >         WARNING: 'ment' may be misspelled - perhaps 'meant'?
> >         #36: FILE: MAINTAINERS:14360:
> >         +M:     Clément Léger <clement.leger@...tlin.com>
> >                     ^^^^
> > 
> > Use a UTF-8-decoded version of $rawline for spell checking.
> > 
> > Signed-off-by: Antonio Borneo <antonio.borneo@...s.st.com>
> > Reported-by: Clément Le Goffic <clement.legoffic@...s.st.com>
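The byte-versus-character effect described in the patch can be reproduced outside checkpatch. The sketch below uses Python rather than Perl and relies on Python's documented rule that `\w` is ASCII-only in a bytes pattern and Unicode-aware in a str pattern, which plays the role of Perl's byte string versus decoded string here. The single word `ment` is an assumption standing in for the full `$misspellings` alternation.

```python
import re

# Simplified from the checkpatch regex: one "misspelling" ("ment") bounded by
# non-word characters. "ment" stands in for the $misspellings alternation.
PATTERN = r"(?:^|[^\w\-'`])(ment)(?:[^\w\-'`]|$)"

line = "+M:     Clément Léger <clement.leger@...tlin.com>"

# Byte-string match: for a bytes pattern, \w is ASCII-only, so the second
# byte of the UTF-8 encoding of "é" (0xA9) counts as a non-word byte.
byte_hit = re.search(PATTERN.encode("ascii"), line.encode("utf-8"), re.IGNORECASE)

# Decoded-string match: for a str pattern, \w is Unicode-aware, so "é" is a
# word character and "ment" is not treated as a separate word.
char_hit = re.search(PATTERN, line, re.IGNORECASE)

print("bytes:", byte_hit.group(1).decode() if byte_hit else None)
print("text: ", char_hit.group(1) if char_hit else None)
```

Only the byte-level search reports `ment`, which corresponds to the false positive quoted above; matching on the decoded line avoids it.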
> 
> Seems sensible, thanks, but:
> 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3477,7 +3477,8 @@ sub process {
> >  # Check for various typo / spelling mistakes
> >                 if (defined($misspellings) &&
> >                     ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> > -                       while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> > +                       my $rawline_utf8 = decode("utf8", $rawline);
> > +                       while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> >                                 my $typo = $1;
> >                                 my $blank = copy_spacing($rawline);
> 
> Maybe this needs to use $rawline_utf8?

Correct, I will send a v2!
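The concern behind using `$rawline_utf8` here is that match offsets such as `$-[1]` now come from the decoded line and count characters, while `$rawline` counts bytes. A minimal Python sketch of that unit mismatch follows; `copy_spacing()` is a hypothetical stand-in written for this example (assumed to keep tabs and blank every other character), and the typo `teh` is invented.

```python
import re


def copy_spacing(s: str) -> str:
    # Hypothetical stand-in for checkpatch's copy_spacing(): keep tabs and
    # replace every other character with a space.
    return "".join(c if c == "\t" else " " for c in s)


# "teh" is an invented typo that follows non-ASCII characters.
line = "+M:     Clément Léger teh"
raw = line.encode("utf-8")
pattern = r"(?:^|[^\w\-'`])(teh)(?:[^\w\-'`]|$)"

m_text = re.search(pattern, line, re.IGNORECASE)
m_bytes = re.search(pattern.encode("ascii"), raw, re.IGNORECASE)

char_off = m_text.start(1)   # counts characters ("é" is 1)
byte_off = m_bytes.start(1)  # counts bytes ("é" is 2)

# Padding and offset both measured in characters: the caret sits under "teh".
good = copy_spacing(line)[:char_off] + "^" * len(m_text.group(1))
# Byte offset applied to a line displayed as characters: shifted right by
# one column per extra UTF-8 byte before the match.
bad = " " * byte_off + "^" * len(m_text.group(1))

print(line)
print(good)
print(bad)
```

When the padding and the offset both come from the decoded line, the caret lands under the typo; the byte offset overshoots by one column for each extra UTF-8 byte that precedes the match.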

> 
> >                                 my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
> 
> And maybe now the $fix bit will not always work properly

I have run some tests, and it looks OK with the current ASCII-only file scripts/spelling.txt.

I have also tested adding some UTF-8 strings to the spelling file, but checkpatch reads it as
ASCII, and extending it to UTF-8 would require further modifications to checkpatch, well beyond
this simple fix.

Thanks for the review.
Antonio
