lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	27 Apr 2016 12:20:33 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	akpm@...ux-foundation.org, linux@...izon.com, peterz@...radead.org,
	zengzhaoxiu@....com
Cc:	linux-kernel@...r.kernel.org, zhaoxiu.zeng@...il.com
Subject: Re: [patch V2] lib: GCD: add binary GCD algorithm

I replicated your results in 32- and 64-bit x86:

- If __ffs (gcc's __builtin_ctz) is available, the basic
  (non-even/odd) binary GCD algorithm is faster than the
  divsion-based or even/odd.
- If shifting down has to be done in a loop, the even/odd
  binary algorithm is fastest.

I tried a few variants, but couldn't consistently beat your code.
The following variant, with the loop test at the end, seemed to do better
in some cases, but it's not consistent enought to be sure:

static unsigned long gcd5(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	b >>= __builtin_ctzl(b);

	do {
		a >>= __builtin_ctzl(a);
		if (a < b)
			swap(a, b);
		a -= b;
	} while (a);
	return b << __builtin_ctzl(r);
}


The frequent "if (USE_FFS)" conditions seem to clutter the code.
Would it be easier to read if the two variants were just separated?
The function is short enough that the duplication isn't too severe.

Such as the following.

A few source cleanups (included for example; delete if you don't
like them):
- Return directly from the loop, rather than using break().
- Use "r &= -r" mostly because it's clearer.
Signed-off-by: George Spelvin <linux@...izon.com>

/*
 * This implements the binary GCD algorithm. (Often attributed to Stein,
 * but as Knuth has noted, appears a first-century Chinese math text.)
 *
 * This is faster than the division-based algorithm even on x86, which
 * has decent hardware division.
 */
#if USE_FFS
/* If __ffs is available, the even/odd algorithm benchmarks slower. */
unsigned long gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	b >>= __ffs(b);

	for (;;) {
		a >>= __ffs(a);
		if (a == b)
			return a << __ffs(r);
		if (a < b)
			swap(a, b);
		a -= b;
	}
}

#else /* !USE_FFS */

/* If normalization is done by loops, the even/odd algorithm is a win. */
unsigned long gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	r &= -r;	/* Isolate lsbit of r */

	while (!(b & r))
		b >>= 1;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == b)
			return a;
		if (a < b)
			swap(a, b);
		a -= b;
		a >>= 1;
		if (a & r)
			a += b;
		a >>= 1;
	}
}
#endif

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ