linux-kernel - Re: [patch V2] lib: GCD: add binary GCD algorithm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 27 Apr 2016 13:08:58 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	zengzhaoxiu@....com
Cc:	linux@...izon.com, peterz@...radead.org,
	linux-kernel@...r.kernel.org, Zhaoxiu Zeng <zhaoxiu.zeng@...il.com>
Subject: Re: [patch V2] lib: GCD: add binary GCD algorithm

On Wed, 27 Apr 2016 16:05:50 +0800 zengzhaoxiu@....com wrote:

> From: Zhaoxiu Zeng <zhaoxiu.zeng@...il.com>
> 
> Because some architectures (alpha, armv6, etc.) don't provide hardware division,
> the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations,
> it replaces division with arithmetic shifts, comparisons, and subtraction.
> 
> I have compiled successfully with x86_64_defconfig and i386_defconfig.
> 
> I use the following code to test:
> 
> ...
>
> Compiled with "-O2", on "VirtualBox 4.2.0-35-generic #40-Ubuntu x86_64" got:
> 
> zhaoxiuzeng@...oxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 92281
> gcd1: elapsed 55005
> gcd2: elapsed 91088
> zhaoxiuzeng@...oxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 115546
> gcd1: elapsed 55928
> gcd2: elapsed 91938
> zhaoxiuzeng@...oxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 91189
> gcd1: elapsed 55493
> gcd2: elapsed 90078
> zhaoxiuzeng@...oxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 157364
> gcd1: elapsed 55204
> gcd2: elapsed 90058
> zhaoxiuzeng@...oxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 91667
> gcd1: elapsed 54641
> gcd2: elapsed 91364

Can you please include a summary which describes the overall impact of
the patch?  eg, how faster does a%b become on <some architecture>

> --- a/lib/gcd.c
> +++ b/lib/gcd.c
> @@ -2,19 +2,82 @@
>  #include <linux/gcd.h>
>  #include <linux/export.h>
>  
> -/* Greatest common divisor */
> +/*
> + * use __ffs if the CPU has efficient __ffs
> + */
> +#if (defined(CONFIG_ALPHA) && defined(CONFIG_ALPHA_EV6) && defined(CONFIG_ALPHA_EV67)) || \
> +	defined(CONFIG_ARC) || \
> +	(defined(CONFIG_ARM) && __LINUX_ARM_ARCH__ >= 5) || defined(CONFIG_ARM64) || \
> +	defined(CONFIG_AVR32) || \
> +	defined(CONFIG_BLACKFIN) || \
> +	defined(CONFIG_C6X) || \
> +	defined(CONFIG_CRIS) || \
> +	defined(CONFIG_FRV) || \
> +	defined(CONFIG_HEXAGON) || \
> +	defined(CONFIG_IA64) || \
> +	(defined(CONFIG_M68K) && \
> +	 (!defined(CONFIG_CPU_HAS_NO_BITFIELDS) || \
> +	  ((defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
> +	   !defined(CONFIG_M68000) && !defined(CONFIG_MCPU32)))) || \
> +	defined(CONFIG_MN10300) || \
> +	defined(CONFIG_OPENRISC) || \
> +	defined(CONFIG_POWERPC) || \
> +	defined(CONFIG_S390) || \
> +	defined(CONFIG_TILE) || \
> +	defined(CONFIG_UNICORE32) || \
> +	defined(CONFIG_X86) || \
> +	defined(CONFIG_XTENSA)
> +# define USE_FFS 1
> +#elif defined(CONFIG_MIPS)
> +# define USE_FFS (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz)
> +#else
> +# define USE_FFS 0
> +#endif

Yikes.

Normally we'd create a new Kconfig variable and do this in the
individual arch/XXX/Kconfig files.

Can we just assume CONFIG_ARCH_HAS_SLOW_FFS is false and then set it to
true for MIPS, arm, etc?

> +/*
> + * This implements the binary GCD algorithm. (Often attributed to Stein,
> + * but as Knith has noted, appears a first-century Chinese math text.)
> + */

Knith might be Knuth?

>  unsigned long gcd(unsigned long a, unsigned long b)
>  {
> -	unsigned long r;
> +	unsigned long r = a | b;
> +
> +	if (!a || !b)
> +		return r;