lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	7 May 2016 18:30:18 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	sam@...nborg.org, zengzhaoxiu@....com
Cc:	akpm@...ux-foundation.org, linux@...izon.com,
	linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org
Subject: Re: [patch V4] lib: GCD: Use binary GCD algorithm instead of Euclidean

Sam Ravnborg wrote:
> sparc64 have an efficient ffs implementation.
> We use run-time patching to use the proper version
> depending on the actual sparc cpu.
> 
> As this is determinded at config time, then let the
> sparc cpu that has the efficient ffs benefit from this.
> 
> In other words - select CPU_NO_EFFICIENT_FFS only for SPARC32.

I'm not sure this is the right thing.

It's always a function call, and there's boot-time code patching to use
either an unrolled binary search or a POPC instructon on processors that
have that instruction.

The NO_EFFICIENT_FFS isn't much slower than the __ffs version, so the
call/return alone might eat the difference, and if the CPU doesn't have
POPC support it's definitely a lose.

Quite simply, gcd isn't important enough to be worth the same boot-time
code patching, and if we have to use one on both types of CPU the the
NO_EFFICIENT_FFS path is the safer alternative in case of uncertainty.

Would you be willing to try benchmarking it?  The baseline code, plus two
versions of the __ffs code using the two different __ffs implementations
(forced out of line by compiling from assembler source or using
inine asm and __attribute((noinline))).



By the way, the SPARC64 implementation could be improved.

It's currently 5 instructions:

__ffs:
        neg     %o0, %g1
        xnor    %o0, %g1, %o1
        popc    %o1, %o0
        retl
         sub    %o0, 1, %o0

That could be improved to 4:
	sub	%o0, 1, %g1
	andn	%g1, %o0, %g1
	retl
	 popc	%g1, %o0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ