Message-ID: <20160811213437.GA18560@visitor2.iram.es>
Date: Thu, 11 Aug 2016 23:34:37 +0200
From: Gabriel Paubert <paubert@...m.es>
To: Christophe Leroy <christophe.leroy@....fr>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
Scott Wood <oss@...error.net>, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote:
>
>
> On 10/08/2016 at 10:56, Gabriel Paubert wrote:
> >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> >>Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
> >>---
> >> arch/powerpc/kernel/misc_32.S | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> >>index e025230..e18055c 100644
> >>--- a/arch/powerpc/kernel/misc_32.S
> >>+++ b/arch/powerpc/kernel/misc_32.S
> >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
> >> rlwimi r9,r4,24,0,7
> >> rlwimi r10,r3,24,0,7
> >> rlwimi r9,r4,24,16,23
> >>- rlwimi r10,r3,24,16,23
> >>+ rlwimi r4,r3,24,16,23
> >> mr r3,r9
> >>- mr r4,r10
> >> blr
> >>
> >
> >Hmmm, are you sure that it works? rlwimi is a bit special since the
> >first operand is both an input and an output of the instruction.
> >
> >
>
> Oops, you are right ...
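To make the rlwimi point concrete, here is a rough C model of the two
sequences. It is only a sketch, not kernel code: the helper names (rotl32,
ppc_mask, rlwimi, struct regs, bswapdi2_before, bswapdi2_patched) are made
up for illustration, and the two leading rotlwi instructions are assumed,
since they lie above the context shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask covering bits mb..me in PowerPC numbering (bit 0 is the MSB).
 * Only the non-wrapping case mb <= me is modelled here. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb <= me && me <= 31);
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

/* rlwimi ra,rs,sh,mb,me: ra is read as well as written, because the
 * bits outside the mask keep their old value. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
                       unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

struct regs { uint32_t r3, r4; };   /* hypothetical register pair */

/* Sequence before the patch (the rotlwi lines are an assumption). */
static struct regs bswapdi2_before(uint32_t r3, uint32_t r4)
{
    uint32_t r9  = rotl32(r4, 8);          /* rotlwi r9,r4,8   (assumed) */
    uint32_t r10 = rotl32(r3, 8);          /* rotlwi r10,r3,8  (assumed) */
    r9  = rlwimi(r9,  r4, 24, 0, 7);
    r10 = rlwimi(r10, r3, 24, 0, 7);
    r9  = rlwimi(r9,  r4, 24, 16, 23);
    r10 = rlwimi(r10, r3, 24, 16, 23);
    return (struct regs){ r9, r10 };       /* mr r3,r9 ; mr r4,r10 */
}

/* Sequence after the patch: the last rlwimi targets r4 directly. */
static struct regs bswapdi2_patched(uint32_t r3, uint32_t r4)
{
    uint32_t r9  = rotl32(r4, 8);
    uint32_t r10 = rotl32(r3, 8);
    r9  = rlwimi(r9,  r4, 24, 0, 7);
    r10 = rlwimi(r10, r3, 24, 0, 7);
    r9  = rlwimi(r9,  r4, 24, 16, 23);
    r4  = rlwimi(r4,  r3, 24, 16, 23);     /* r4 still holds the old input */
    (void)r10;                             /* partial result is never used */
    return (struct regs){ r9, r4 };        /* mr r3,r9 */
}
```

In this model the patched variant changes only one byte of the original r4,
so the low word of the result is not the byte-swapped r3.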
I just found this:
http://hardwarebug.org/2010/01/14/beware-the-builtins/
The bswapdi2 sequence suggested there needs only a single mr instruction; the
other one is absorbed into a rotlwi.
The scheduling looks poor, but it seems impossible to interleave the
operations between the two halves without adding another instruction,
and the routine is 8 instructions long, which happens to be exactly one
cache line on most 32-bit processors.
On the other hand, gcc at the time did a very poor job (quite an
understatement) of bswapdi when compiling for 64-bit processors
(see the example in that post).
But what do modern compilers generate for bswapdi these days? Do they
still call the library or not?
After all, bswapdi on 32-bit processors only takes 6 instructions if the
input and output registers don't overlap.
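As a rough illustration of the 6-instruction count, the C sketch below swaps
each 32-bit half in three steps (one rotate plus two masked inserts, mirroring
rotlwi plus two rlwimi) and writes the results to fresh variables, so no copy
is needed. The names rotl32, bswap32_3op, bswapdi_halves, bswapdi_builtin and
bswapdi_selfcheck are hypothetical. __builtin_bswap64 is the GCC/Clang
builtin; what a given compiler emits for it has to be checked by compiling
with -O2 -S, and nothing is assumed about that here.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* 32-bit byte swap in three steps, input and output kept separate. */
static uint32_t bswap32_3op(uint32_t x)
{
    uint32_t t = rotl32(x, 24);
    uint32_t r = rotl32(x, 8);                   /* rotlwi r,x,8        */
    r = (r & ~0xFF000000u) | (t & 0xFF000000u);  /* rlwimi r,x,24,0,7   */
    r = (r & ~0x0000FF00u) | (t & 0x0000FF00u);  /* rlwimi r,x,24,16,23 */
    return r;
}

/* 64-bit swap built from the two halves: 3 + 3 operations, and the
 * halves trade places simply by choosing where each result is written. */
uint64_t bswapdi_halves(uint64_t v)
{
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t lo = (uint32_t)v;
    uint32_t out_hi = bswap32_3op(lo);
    uint32_t out_lo = bswap32_3op(hi);
    return ((uint64_t)out_hi << 32) | out_lo;
}

/* Thin wrapper to inspect what the compiler generates for the builtin. */
uint64_t bswapdi_builtin(uint64_t v)
{
    return __builtin_bswap64(v);
}

/* Usage example: both versions must agree. */
void bswapdi_selfcheck(void)
{
    assert(bswapdi_halves(0x0102030405060708ULL) ==
           bswapdi_builtin(0x0102030405060708ULL));
}
```

Compiling bswapdi_builtin alone for a 32-bit PowerPC target and reading the
assembly would show whether the compiler inlines the swap or still calls
__bswapdi2.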
Gabriel