Message-ID: <4e505c35-8428-89bb-7f9b-bc819382c3cd@infradead.org>
Date:   Sun, 26 Jul 2020 12:08:08 -0700
From:   Randy Dunlap <rdunlap@...radead.org>
To:     Joe Perches <joe@...ches.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>
Cc:     linuxppc-dev@...ts.ozlabs.org, Paul Mackerras <paulus@...ba.org>,
        linux-kernel@...r.kernel.org, Michael Ellerman <mpe@...erman.id.au>
Subject: Re: [PATCH 0/9] powerpc: delete duplicated words

On 7/26/20 10:49 AM, Joe Perches wrote:
> On Sun, 2020-07-26 at 10:23 -0700, Randy Dunlap wrote:
>> On 7/26/20 7:29 AM, Christophe Leroy wrote:
>>> Randy Dunlap <rdunlap@...radead.org> wrote:
>>>
>>>> Drop duplicated words in arch/powerpc/ header files.
>>>
>>> How did you detect them? Do you have a script for that, or did you just read all the comments?
>>
>> Yes, it's a script that finds lots of false positives, so I have to check
>> each and every one of them for validity.
> 
> And it's a lot of work too. (thanks Randy)
> 
> It could be something like:
> 
> $ grep-2.5.4 -nrP --include=*.[ch] '\b([A-Z]?[a-z]{2,}\b)[ \t]*(?:\n[ \t]*\*[ \t]*|)\1\b' * | \
>   grep -vP '\b(?:struct|enum|union)\s+([A-Z]?[a-z]{2,})\s+\*?\s*\1\b' | \
>   grep -vP '\blong\s+long\b' | \
>   grep -vP '\b([A-Z]?[a-z]{2,})(?:\t+| {2,})\1\b'

Hi Joe,

(What is grep-2.5.4?)

It looks like you tried a few iterations of this -- since it drops things
like "long long".  There are lots of data types that are repeated & valid.
And many struct names, like "struct kref kref", "struct completion completion",
and "struct mutex mutex".  I handle (ignore) those manually, although that
could be added to the Perl script.

v0.1 of this script also found lots of repeated numbers and strings of
special characters (ASCII art etc.), so now it ignores duplicated numbers
or special characters -- since it is really looking for duplicate words.
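
For readers without the attachment, the filtering described above could look
roughly like the sketch below. It is written in Python and is not the attached
Perl script; the function name find_dup_words and the three patterns are
assumptions made for illustration. It works on one line at a time, so unlike
the grep pipeline quoted above it does not catch a word repeated across a
comment line break.

```python
import re

# A "word" here is two or more letters, so repeated numbers and runs of
# special characters (ASCII art) can never match.
DUP_WORD = re.compile(r"\b([A-Za-z]{2,})\s+\1\b")

# Repeats that are valid C and should not be reported.
VALID_TYPE = re.compile(r"\blong\s+long\b")
VALID_DECL = re.compile(r"\b(?:struct|enum|union)\s+(\w+)\s+\*?\s*\1\b")


def find_dup_words(line):
    """Return the words that appear twice in a row in a single line."""
    # Like the "grep -v" stages quoted above, skip the whole line when it
    # contains a known-valid repetition.
    if VALID_TYPE.search(line) or VALID_DECL.search(line):
        return []
    return [m.group(1) for m in DUP_WORD.finditer(line)]


if __name__ == "__main__":
    samples = [
        " * Returns the the value",
        "struct mutex mutex;",
        "unsigned long long x;",
        "/* 1 1 2 2 ---- ---- */",
    ]
    for text in samples:
        print(repr(text), find_dup_words(text))
```

Every hit still needs a manual check, since legitimate repetitions other than
the ones listed here will show up as false positives.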

Anyway, I might as well attach it. It's no big deal.
And if someone else wants to tackle using it, go for it.

-- 
~Randy


Download attachment "find_dup_words.pl" of type "application/x-perl" (2959 bytes)
