linux-kernel - Re: [PATCH] lib/int_sqrt.c: Optimize square root function

Message-ID: <20170720152749.k7al6xsvckczolzi@hirez.programming.kicks-ass.net>
Date:   Thu, 20 Jul 2017 17:27:49 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Anshul Garg <aksgarg1989@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        anshul.g@...sung.com, Thomas Gleixner <tglx@...utronix.de>,
        joe@...ches.com
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function

On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote:
> ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=10000000 -DNEW=1 -DFLS=1 -DANSHUL=1  ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt
> 
>  Performance counter stats for './sqrt' (10 runs):
> 
>        328,415,775      cycles:u                                                      ( +-  0.15% )
>      1,138,579,704      instructions:u            #    3.47  insn per cycle           ( +-  0.00% )
> 
>        0.088703205 seconds time elapsed  

> static __always_inline unsigned long fls(unsigned long word)
> {
> 	asm("rep; bsr %1,%0"
> 		: "=r" (word)
> 		: "rm" (word));
> 	return BITS_PER_LONG - 1 - word;
> }

That is actually "lzcnt" (the "rep" prefix turns bsr into lzcnt on CPUs that
support it). If I use the regular __fls implementation instead:


static __always_inline unsigned long __fls(unsigned long word)
{
	asm("bsr %1,%0"
		: "=r" (word)
		: "rm" (word));
	return word;
}

It ends up slightly more expensive (about 17% more cycles, despite slightly
fewer instructions):

~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=10000000 -DNEW=1 -DFLS=1 -DANSHUL=1  ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt

 Performance counter stats for './sqrt' (10 runs):

       384,842,215      cycles:u                                                      ( +-  0.08% )
     1,118,579,712      instructions:u            #    2.91  insn per cycle           ( +-  0.00% )

       0.103018001 seconds time elapsed  


Still loads cheaper than pretty much any other combination.
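
For readers following along, here is a minimal sketch of how the two
primitives relate and how an fls-seeded integer square root can use them.
The sqrt.c test program is not shown in this message, so everything below
is an assumption for illustration: the names lzcnt_sketch, fls_sketch and
int_sqrt_sketch are hypothetical, and the GCC/Clang builtin
__builtin_clzl stands in for the inline asm so the compiler picks bsr or
lzcnt depending on the target flags.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Number of leading zero bits, i.e. what lzcnt computes.
 * __builtin_clzl is undefined for 0, so reject that input. */
static inline unsigned long lzcnt_sketch(unsigned long x)
{
	assert(x != 0);
	return (unsigned long)__builtin_clzl(x);
}

/* Index of the most significant set bit, i.e. what bsr computes.
 * This is the same identity the quoted "rep; bsr" variant applies. */
static inline unsigned long fls_sketch(unsigned long x)
{
	return BITS_PER_LONG - 1 - lzcnt_sketch(x);
}

/* Digit-by-digit integer square root, seeded with the largest power
 * of four that does not exceed x (assumed use of fls, not taken from
 * the sqrt.c measured above). */
static unsigned long int_sqrt_sketch(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* Round the top bit index down to an even number. */
	m = 1UL << (fls_sketch(x) & ~1UL);
	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}
```

Seeding m from the top set bit skips the loop iterations that would
otherwise be spent shifting m down past the leading zeros of x, which is
presumably why the cost of the fls primitive itself shows up in the cycle
counts.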