Message-ID: <20170720152749.k7al6xsvckczolzi@hirez.programming.kicks-ass.net>
Date:   Thu, 20 Jul 2017 17:27:49 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Anshul Garg <aksgarg1989@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        anshul.g@...sung.com, Thomas Gleixner <tglx@...utronix.de>,
        joe@...ches.com
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function

On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote:
> ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=10000000 -DNEW=1 -DFLS=1 -DANSHUL=1  ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt
> 
>  Performance counter stats for './sqrt' (10 runs):
> 
>        328,415,775      cycles:u                                                      ( +-  0.15% )
>      1,138,579,704      instructions:u            #    3.47  insn per cycle           ( +-  0.00% )
> 
>        0.088703205 seconds time elapsed  

> static __always_inline unsigned long fls(unsigned long word)
> {
> 	asm("rep; bsr %1,%0"
> 		: "=r" (word)
> 		: "rm" (word));
> 	return BITS_PER_LONG - 1 - word;
> }

That is actually "lzcnt". If I use the regular __fls implementation instead:


static __always_inline unsigned long __fls(unsigned long word)
{
	asm("bsr %1,%0"
		: "=r" (word)
		: "rm" (word));
	return word;
}

It ends up slightly more expensive:

~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=10000000 -DNEW=1 -DFLS=1 -DANSHUL=1  ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt

 Performance counter stats for './sqrt' (10 runs):

       384,842,215      cycles:u                                                      ( +-  0.08% )
     1,118,579,712      instructions:u            #    2.91  insn per cycle           ( +-  0.00% )

       0.103018001 seconds time elapsed  


Still loads cheaper than pretty much any other combination.
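
For readers without sqrt.c at hand (it is not included in this message), here is a minimal sketch of how an fls-seeded integer square root might look. The names fls_sketch, int_sqrt_sketch and BITS_PER_LONG_SKETCH are hypothetical, and the helper uses the GCC/Clang builtin __builtin_clzl rather than the inline asm above, so it is an illustration of the idea and not the benchmarked code:

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Index of the most significant set bit (0-based).
 * Like bsr, the result is undefined for word == 0.
 * Assumes a GCC-compatible compiler for __builtin_clzl.
 */
static inline unsigned long fls_sketch(unsigned long word)
{
	return BITS_PER_LONG_SKETCH - 1 - __builtin_clzl(word);
}

/*
 * Digit-by-digit (base 4) integer square root.
 * Instead of starting at the top bit pair of the word and skipping
 * leading zeros in a loop, start at the highest even bit position
 * that is not above the most significant set bit of x.
 */
static unsigned long int_sqrt_sketch(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* Round the top bit index down to an even number. */
	m = 1UL << (fls_sketch(x) & ~1UL);
	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

	return y;
}
```

The point of the fls seed is that small inputs only run as many loop iterations as they have significant bit pairs, which is presumably where most of the cycle savings in the numbers above come from.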

