lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160811213437.GA18560@visitor2.iram.es>
Date:	Thu, 11 Aug 2016 23:34:37 +0200
From:	Gabriel Paubert <paubert@...m.es>
To:	Christophe Leroy <christophe.leroy@....fr>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Scott Wood <oss@...error.net>, linuxppc-dev@...ts.ozlabs.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2

On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote:
> 
> 
> Le 10/08/2016 à 10:56, Gabriel Paubert a écrit :
> >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> >>Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
> >>---
> >> arch/powerpc/kernel/misc_32.S | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> >>index e025230..e18055c 100644
> >>--- a/arch/powerpc/kernel/misc_32.S
> >>+++ b/arch/powerpc/kernel/misc_32.S
> >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
> >> 	rlwimi  r9,r4,24,0,7
> >> 	rlwimi  r10,r3,24,0,7
> >> 	rlwimi  r9,r4,24,16,23
> >>-	rlwimi  r10,r3,24,16,23
> >>+	rlwimi  r4,r3,24,16,23
> >> 	mr      r3,r9
> >>-	mr      r4,r10
> >> 	blr
> >>
> >
> >Hmmm, are you sure that it works? rlwimi is a bit special since the
> >first operand is both an input and an output of the instruction.
> >
> >
> 
> Oops, you are right ...

I just found this: 

http://hardwarebug.org/2010/01/14/beware-the-builtins/

the bswapdi2 suggested sequence only needs a single mr instruction, the 
other one is absorbed in a rotlwi.

The scheduling looks poor, but it seems impossible to interleave the
operations between the two halves without adding another instructions,
and the routine is 8 instructions long, which happens to be exactly a
cache line on most 32 bit processors.

On the other hand gcc did at the time a very poor job (quite an
understatement) at bswapdi when compiling for 64 bit processors 
(see the example).

But what do modern compilers generate for bswapdi these days? Do they
still call the library or not?

After all, bswapdi on 32 bit processors only takes 6 instructions if the
input and output registers don't overlap.

    Gabriel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ