Message-ID: <20160812224950.GA21040@visitor2.iram.es>
Date:	Sat, 13 Aug 2016 00:49:50 +0200
From:	Gabriel Paubert <paubert@...m.es>
To:	Segher Boessenkool <segher@...nel.crashing.org>
Cc:	Christophe Leroy <christophe.leroy@....fr>,
	linux-kernel@...r.kernel.org, Scott Wood <oss@...error.net>,
	Paul Mackerras <paulus@...ba.org>,
	linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2

On Thu, Aug 11, 2016 at 05:11:19PM -0500, Segher Boessenkool wrote:
> On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> > On the other hand gcc did at the time a very poor job (quite an
> > understatement) at bswapdi when compiling for 64 bit processors 
> > (see the example).
> > 
> > But what do modern compilers generate for bswapdi these days? Do they
> > still call the library or not?
> 
> Nope.

Great. Could these functions then be removed from misc_32.S, or are
compilers that use libcalls still supported for kernel builds?

> 
> > After all, bswapdi on 32 bit processors only takes 6 instructions if the
> > input and output registers don't overlap.
> 
> For this testcase:
> ===
> typedef unsigned long long u64;
> u64 bs(u64 x) { return __builtin_bswap64(x); }
> ===
> 
> we get with -m32:
> ===
> bs:
> 	mr 9,3
> 	rotlwi 3,4,24
> 	rlwimi 3,4,8,8,15
> 	rlwimi 3,4,8,24,31
> 	rotlwi 4,9,24
> 	rlwimi 4,9,8,8,15
> 	rlwimi 4,9,8,24,31
> 	blr

In this case the compiler is constrained by the fact that the input and
output registers are the same, which costs the extra mr. When inlined
with other things it can probably perform better scheduling and
interleaving of operations.
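
For readers who want to check the sequence by hand, here is a minimal C
sketch of what the rotlwi/rlwimi triple computes on each 32-bit half. The
helper names (rotl32, mask32, rlwimi_model, bswap32_model, bswap64_model)
are made up for illustration, and the mask helper assumes MB <= ME (no
wrap-around), which is all the sequence above needs. It is a model of the
instruction semantics, not kernel or gcc code.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (what rotlwi does). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
	n &= 31;
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Mask covering bits MB..ME in IBM numbering (bit 0 is the MSB).
 * Assumption: mb <= me, i.e. the mask does not wrap around. */
static uint32_t mask32(unsigned mb, unsigned me)
{
	return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

/* Model of "rlwimi ra,rs,sh,mb,me": rotate rs left by sh and insert
 * the bits selected by the mask into ra, keeping ra's other bits. */
static uint32_t rlwimi_model(uint32_t ra, uint32_t rs,
			     unsigned sh, unsigned mb, unsigned me)
{
	uint32_t m = mask32(mb, me);

	return (ra & ~m) | (rotl32(rs, sh) & m);
}

/* The three-instruction byte swap of one 32-bit word. */
static uint32_t bswap32_model(uint32_t x)
{
	uint32_t r = rotl32(x, 24);	/* rotlwi r,x,24       */

	r = rlwimi_model(r, x, 8, 8, 15);	/* rlwimi r,x,8,8,15   */
	r = rlwimi_model(r, x, 8, 24, 31);	/* rlwimi r,x,8,24,31  */
	return r;
}

/* With -m32 the u64 lives in r3 (high word) and r4 (low word); the
 * result's high word is the swapped low word and vice versa. */
static uint64_t bswap64_model(uint64_t x)
{
	uint32_t hi = (uint32_t)(x >> 32);
	uint32_t lo = (uint32_t)x;

	return ((uint64_t)bswap32_model(lo) << 32) | bswap32_model(hi);
}
```

Feeding 0x0102030405060708 to bswap64_model() gives 0x0807060504030201,
the same as __builtin_bswap64().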


> ===
> 
> and with -m64:
> ===
> .L.bs:
> 	srdi 10,3,32
> 	mr 9,3
> 	rotlwi 3,3,24
> 	rotlwi 8,10,24
> 	rlwimi 3,9,8,8,15
> 	rlwimi 8,10,8,8,15
> 	rlwimi 3,9,8,24,31
> 	rlwimi 8,10,8,24,31
> 	sldi 3,3,32
> 	or 3,3,8
> 	blr
> ===
> 

The -m64 output demonstrates this: the two halves of the 64-bit quantity
are byte swapped in an interleaved fashion. Not perfect (I think
that with proper ordering the last 2 instructions could be replaced
by an rldimi), but reasonable.
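
To illustrate the rldimi remark, here is a small C sketch comparing the
sldi/or merge with a single rotate-and-insert. The names (rotl64,
rldimi_model, merge_sldi_or, merge_rldimi) are hypothetical, and the mask
helper assumes MB <= 63 - SH (no wrap-around). It only models the
arithmetic; which register ends up holding the result is the "proper
ordering" part and is not captured here.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits. */
static uint64_t rotl64(uint64_t v, unsigned n)
{
	n &= 63;
	return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Model of "rldimi ra,rs,sh,mb": rotate rs left by sh and insert it
 * into ra under a mask running from bit mb to bit 63-sh in IBM
 * numbering (bit 0 is the MSB).
 * Assumption: mb <= 63 - sh, so the mask does not wrap around. */
static uint64_t rldimi_model(uint64_t ra, uint64_t rs,
			     unsigned sh, unsigned mb)
{
	uint64_t m = (~0ULL >> mb) & (~0ULL << sh);

	return (ra & ~m) | (rotl64(rs, sh) & m);
}

/* What the compiler emitted: sldi 3,3,32 ; or 3,3,8 */
static uint64_t merge_sldi_or(uint32_t new_hi, uint32_t new_lo)
{
	uint64_t r3 = new_hi;
	uint64_t r8 = new_lo;

	r3 <<= 32;		/* sldi */
	return r3 | r8;		/* or   */
}

/* Single-instruction alternative: insert new_hi into the upper half
 * of the register that already holds new_lo (rldimi rX,rY,32,0). */
static uint64_t merge_rldimi(uint32_t new_hi, uint32_t new_lo)
{
	return rldimi_model(new_lo, new_hi, 32, 0);
}
```

Both merge functions return the same value for any pair of 32-bit halves,
for example 0x0807060504030201 for (0x08070605, 0x04030201).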

> Neither as tight as possible, but neither horrible either.
> 

Indeed.

    Gabriel
