Message-ID: <1410332446.24028.26.camel@joe-AO725>
Date:	Wed, 10 Sep 2014 00:00:46 -0700
From:	Joe Perches <joe@...ches.com>
To:	Masanari Iida <standby24x7@...il.com>
Cc:	Kees Cook <keescook@...omium.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andy Whitcroft <apw@...onical.com>,
	Geert Uytterhoeven <geert@...ux-m68k.org>,
	linux-doc <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v2] checkpatch: look for common misspellings

On Wed, 2014-09-10 at 13:37 +0900, Masanari Iida wrote:
> Hello Joe, Kees,

Hello Masanari-san.

> Sorry for the late reply.
> I was on holiday when the version 1 patch discussions were posted.

No worries, holidays are far more important
than patches like this...

These patches are simple niceties, not fixes
for bugs, so review and acceptance timing is
not urgent.

> I am using codespell ( https://github.com/lucasdemarchi/codespell/ ).
> The codespell has its own typo dictionary.
> The dictionary format is
> 
> typo->good   (1 candidate)
> typo->good1,good2,  (multiple candidates)
> typo->good, comment  (1 candidate with special remark)
> 
> It's similar to your  typo||good  format.
> 
> codespell is licensed under GPLv2, according to the COPYING file in the tarball.
> 
> Number of typo entries in each dictionary:
> Your dictionary:                                                 1033
> codespell-1.4:                                                   4261
> codespell-1.4 + my additions:                                    5245
> Your dictionary + codespell-1.4 + my additions, duplicates removed:  5742
> 
> The latest version of codespell is 1.7.
> My dictionary is based on codespell-1.4, so the numbers above are for 1.4.
> 
> I can provide my typo samples under GPLv2 license.

Thanks.

Any additions you have to the dictionary would be
gladly welcomed.

Using a common format for the dictionary and any
suggested corrections would be good too.

Maybe the dictionary and code should be changed to
use the codespell format.  It seems a bit more
flexible than the lintian form.
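
A minimal Python sketch of reading both formats into one structure, in
case it helps compare them. It assumes only the three codespell forms
quoted above (a trailing comma marks multiple candidates, otherwise
anything after the first comma is a remark) and the typo||good form.
The function names are made up for illustration; this is neither
codespell's nor checkpatch's actual parsing code.

```python
def parse_codespell_line(line):
    """Parse 'typo->good', 'typo->good1,good2,' or 'typo->good, comment'."""
    typo, _, rhs = line.strip().partition("->")
    rhs = rhs.strip()
    if rhs.endswith(","):
        # Assumption: a trailing comma means every field is a candidate.
        candidates = [c.strip() for c in rhs.split(",") if c.strip()]
        comment = ""
    else:
        # Assumption: text after the first comma is a remark.
        first, _, rest = rhs.partition(",")
        candidates = [first.strip()]
        comment = rest.strip()
    return typo.strip(), candidates, comment


def parse_double_bar_line(line):
    """Parse the 'typo||good' form (one candidate, no remark)."""
    typo, _, good = line.strip().partition("||")
    return typo.strip(), [good.strip()], ""


def format_codespell_line(typo, candidates, comment=""):
    """Write an entry back out in one of the three quoted forms."""
    if len(candidates) > 1:
        # The quoted forms show no remark on multi-candidate lines,
        # so a remark is dropped here.
        return f"{typo}->{','.join(candidates)},"
    if comment:
        return f"{typo}->{candidates[0]}, {comment}"
    return f"{typo}->{candidates[0]}"


def merge_entries(entries):
    """Merge entries keyed by lowercased typo; the first occurrence wins."""
    merged = {}
    for typo, candidates, comment in entries:
        merged.setdefault(typo.lower(), (candidates, comment))
    return merged
```

With this, a typo||good line converts with
format_codespell_line(*parse_double_bar_line("prefered||preferred")).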

I do not know if one project is more active than
the other, but perhaps that should be the deciding
factor.  Or maybe just Kees' preference...

Merging all these together might not be a good
solution though.

Right now, the checkpatch spelling code uses word
boundaries that include an underscore.

So for a #define like "PREFIX_PREFERED_SEG_ABC",
checkpatch tests each of the 4 segments separately
and finds the misspelling of PREFERED.
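
A rough Python illustration of that segment-wise check, assuming a
tiny made-up dictionary and treating any run of ASCII letters as one
segment. The checkpatch code itself is Perl and differs in detail;
find_misspellings is a hypothetical name.

```python
import re


def find_misspellings(text, dictionary):
    """Return (segment, suggestion) pairs for misspelled segments.

    Segments are runs of ASCII letters, so underscores, digits and
    punctuation all act as boundaries (an assumption of this sketch).
    """
    hits = []
    for segment in re.findall(r"[A-Za-z]+", text):
        suggestion = dictionary.get(segment.lower())
        if suggestion is not None:
            hits.append((segment, suggestion))
    return hits


# Hypothetical one-entry dictionary, for illustration only.
sample_dictionary = {"prefered": "preferred"}
hits = find_misspellings("#define PREFIX_PREFERED_SEG_ABC 1", sample_dictionary)
```

Note that Python's own \b would not split here, since it counts the
underscore as a word character; that is why the sketch extracts
letter runs explicitly.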

Some sifting of the dictionary is still necessary to
eliminate some common prefixes to avoid too many false
positives.

For example, "ths" was dropped because it's a prefix
used by several modules even though it's a somewhat
frequent typo.
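
One way such sifting could be automated, sketched in Python under
stated assumptions: given a list of identifiers from the tree, count
how often each dictionary typo appears as the leading
underscore-separated segment, and drop entries at or above a
threshold. The identifiers, the threshold and the corrections shown
are invented for illustration; the actual sifting was not necessarily
done this way.

```python
from collections import Counter


def leading_segment_counts(identifiers):
    """Count the lowercased first underscore-separated segment of each name."""
    return Counter(name.split("_", 1)[0].lower() for name in identifiers if name)


def sift_dictionary(dictionary, identifiers, min_prefix_uses=2):
    """Drop typo entries used as an identifier prefix at least min_prefix_uses times."""
    counts = leading_segment_counts(identifiers)
    return {
        typo: good
        for typo, good in dictionary.items()
        if counts[typo.lower()] < min_prefix_uses
    }


# Hypothetical inputs: "ths" is used as a prefix three times, so it is dropped.
sample_identifiers = ["ths_read", "ths_write", "THS_REG", "teh_value"]
sample_dictionary = {"ths": "this", "teh": "the"}
sifted = sift_dictionary(sample_dictionary, sample_identifiers)
```

A threshold keeps a typo that shows up once as an accidental prefix
while still dropping names that are clearly deliberate.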

