lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <042bae7f-51d6-93e8-549d-a38bcb4c9cf5@163.com>
Date:	Sun, 8 May 2016 20:52:52 +0800
From:	Zhaoxiu Zeng <zengzhaoxiu@....com>
To:	George Spelvin <linux@...izon.com>, akpm@...ux-foundation.org,
	dalias@...c.org, davem@...emloft.net, geert@...ux-m68k.org,
	James.Bottomley@...senPartnership.com, jjuran@...il.com,
	peterz@...radead.org, sam@...nborg.org
Cc:	deller@....de, heiko.carstens@...ibm.com, ink@...assic.park.msu.ru,
	james.hogan@...tec.com, jejb@...isc-linux.org, jonas@...thpole.se,
	lennox.wu@...il.com, lftan@...era.com, linux-alpha@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-m68k@...ts.linux-m68k.org, linux-metag@...r.kernel.org,
	linux-mips@...ux-mips.org, linux-parisc@...r.kernel.org,
	linux-s390@...r.kernel.org, linux-sh@...r.kernel.org,
	linux-snps-arc@...ts.infradead.org, linux@....linux.org.uk,
	linux@...ts.openrisc.net, liqin.linux@...il.com,
	mattst88@...il.com, monstr@...str.eu,
	nios2-dev@...ts.rocketboards.org, ralf@...ux-mips.org,
	rth@...ddle.net, schwidefsky@...ibm.com,
	sparclinux@...r.kernel.org, uclinux-h8-devel@...ts.sourceforge.jp,
	vgupta@...opsys.com, ysato@...rs.sourceforge.jp,
	zhaoxiu.zeng@...il.com
Subject: Re: [patch V4] lib: GCD: Use binary GCD algorithm instead of
 Euclidean

在 2016/5/7 16:41, George Spelvin 写道:
> Nothing critical, but a bit of kibitzing.
> (That is slang in the Yiddish language for a person
> who offers annoying and unwanted advice.)
>
>> The binary GCD algorithm is based on the following facts:
>> 	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
>> 	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
>> 	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
> 1) "even" and "odd" are adjectives.  In English, adjectives to not have
>    plural suffixes.  Thus, "they are even" or "they are odd".
> 2) Although "all" is not exactly wrong, it sounds odd.  Since there are
>    exactly two of them it's clearer to say "both".
>
> If I also rephrase the last line to fit into 80 columns, you get:
>
>   The binary GCD algorithm is based on the following facts:
> - 	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
> + 	1. If a and b are both even, then gcd(a,b) = 2 * gcd(a/2, b/2)
>  	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
> -	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
> +	3. If both are odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
>
> 3) Negative config options are always confusing.
>
> Would it be better to call it CONFIG_INEFFICIEBNT_FFS, or even simpler
> CONFIG_SLOW_FFS?
>
> Also, you're allowed to add "help" to a non-interactive config option,
> and some documentation might be useful.
> E.g.
>
> +config CPU_SLOW_FFS
> +	def_bool n
> +	help
> +	  If n, the CPU supports a fast __ffs (__builtin_ctz) operation,
> +	  either directly or via a short code sequence using a count
> +	  leading zeros or population count instruction.  If y, the
> +	  operation is emulated by slower software, such as an unrolled
> +	  binary search.
> +
> +	  This is purely an optimization option; the kernel
> +	  will function correctly regardless of how it is set.
> +

Thanks a lot.

> Your benchmark code doesn't have to have a separate code path if
> __x86_64__; rdtsc works on 32-bit code just as well.  paths.  And the
> way you subtract the end and start times is unnecessarily complicated.
> The C language guarantees that unsigned arithmetic simply wraps modulo
> 2^bits as expected.

rdsc works on x86, the other path is prepared for other architectures.

> Here are some more timings, with the same flags as your tests:
>
> First, 32 bit code:
> 		gcd0	gcd1	gcd2	gcd3	gcd4
> Ivy Bridge	3156	1192	1740	1160	1640	PASS
> AMD Phenom	7150	2564	2348	2975	2843	PASS
> Core 2		4176	2592	4164	2604	3900	PASS
> Pentium 4	11492	4784	7632	4852	6452	PASS
>
> And 64-bit (longer times becuase the inputs are larger):
> Ivy Bridge	10636	2496	3500	2432	3360	PASS
> AMD Phenom	19482	4058	6030	5001	6845	PASS
>
> Looking at those, I'm not sure how much better the gcd3/4 versions are
> than gcd1/2.  The difference seems pretty minor and sometimes negative.
>

The worst case of binary GCD is that one number is power of 2,
for example a is 0xffffffff (0xcccccccc for the "even/odd" variant) and b is 1.

The gcd3/4 versions can handle this properly.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ