linux-kernel - Re: [patch V2] lib: GCD: add binary GCD algorithm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160427162033.11271.qmail@ns.horizon.com>
Date:	27 Apr 2016 12:20:33 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	akpm@...ux-foundation.org, linux@...izon.com, peterz@...radead.org,
	zengzhaoxiu@....com
Cc:	linux-kernel@...r.kernel.org, zhaoxiu.zeng@...il.com
Subject: Re: [patch V2] lib: GCD: add binary GCD algorithm

I replicated your results in 32- and 64-bit x86:

- If __ffs (gcc's __builtin_ctz) is available, the basic
  (non-even/odd) binary GCD algorithm is faster than the
  divsion-based or even/odd.
- If shifting down has to be done in a loop, the even/odd
  binary algorithm is fastest.

I tried a few variants, but couldn't consistently beat your code.
The following variant, with the loop test at the end, seemed to do better
in some cases, but it's not consistent enought to be sure:

static unsigned long gcd5(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	b >>= __builtin_ctzl(b);

	do {
		a >>= __builtin_ctzl(a);
		if (a < b)
			swap(a, b);
		a -= b;
	} while (a);
	return b << __builtin_ctzl(r);
}


The frequent "if (USE_FFS)" conditions seem to clutter the code.
Would it be easier to read if the two variants were just separated?
The function is short enough that the duplication isn't too severe.

Such as the following.

A few source cleanups (included for example; delete if you don't
like them):
- Return directly from the loop, rather than using break().
- Use "r &= -r" mostly because it's clearer.
Signed-off-by: George Spelvin <linux@...izon.com>

/*
 * This implements the binary GCD algorithm. (Often attributed to Stein,
 * but as Knuth has noted, appears a first-century Chinese math text.)
 *
 * This is faster than the division-based algorithm even on x86, which
 * has decent hardware division.
 */
#if USE_FFS
/* If __ffs is available, the even/odd algorithm benchmarks slower. */
unsigned long gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	b >>= __ffs(b);

	for (;;) {
		a >>= __ffs(a);
		if (a == b)
			return a << __ffs(r);
		if (a < b)
			swap(a, b);
		a -= b;
	}
}

#else /* !USE_FFS */

/* If normalization is done by loops, the even/odd algorithm is a win. */
unsigned long gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	r &= -r;	/* Isolate lsbit of r */

	while (!(b & r))
		b >>= 1;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == b)
			return a;
		if (a < b)
			swap(a, b);
		a -= b;
		a >>= 1;
		if (a & r)
			a += b;
		a >>= 1;
	}
}
#endif