Message-ID: <a668562bf3cb410e90484eb0d4a436a7@AcuMS.aculab.com>
Date: Mon, 3 Dec 2018 10:10:05 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Vineet Gupta' <vineet.gupta1@...opsys.com>,
'Arnd Bergmann' <arnd@...db.de>,
"jose.abreu@...opsys.com" <jose.abreu@...opsys.com>
CC: "open list:SYNOPSYS ARC ARCHITECTURE"
<linux-snps-arc@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"alexey.brodkin@...opsys.com" <alexey.brodkin@...opsys.com>,
Joao Pinto <joao.pinto@...opsys.com>,
"Vitor Soares" <vitor.soares@...opsys.com>
Subject: RE: [PATCH v2] ARC: io.h: Implement reads{x}()/writes{x}()
From: Vineet Gupta
...
> > It also seems to have used a different type of loop to the
> > other example, probably less efficient.
> > (Not that I'm an expert on ARC opcodes.)
>
> The difference is due to ISA and ensuing ARC gcc backends. ARCompact based cores
> don't support unaligned access and the loop there was ZOL (Zero Overhead Loop). In
> ARCv2 based cores, the gcc backend has been tweaked to generate fewer ZOLs hence
> you see the more canonical tst and branch style loop.
Is this another case of the hardware implementing 'hardware' loop
instructions that execute slower than ones made of simple instructions?
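The worst example has to be the x86 'loop' instruction (decrement cx and
jump if non-zero), which is microcoded on Intel CPUs.
That makes it very difficult to use the new adcx/adox instructions to
get two dependency chains through a loop ('loop' leaves the flags alone,
whereas 'dec' clobbers the overflow flag that adox uses as its carry).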
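
To make "two dependency chains through a loop" concrete, here is a rough
portable C sketch. The function name and signature are hypothetical, and
the carries live in ordinary variables; whether a compiler maps them onto
the ADX carry-chain instructions (adcx/adox) is not something I'd rely on.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes r = a + b and s = c + d over n 64-bit
 * limbs (least significant limb first). The two additions carry
 * independently, so each loop iteration advances two separate chains.
 */
static void add_two_chains(uint64_t *r, const uint64_t *a, const uint64_t *b,
                           uint64_t *s, const uint64_t *c, const uint64_t *d,
                           size_t n, unsigned *carry_r, unsigned *carry_s)
{
    unsigned cr = 0, cs = 0;

    for (size_t i = 0; i < n; i++) {
        /* Chain 1: in asm this carry could sit in CF (adcx). */
        uint64_t t = a[i] + b[i];
        unsigned c1 = t < a[i];
        r[i] = t + cr;
        cr = c1 | (r[i] < t);

        /* Chain 2: in asm this carry could sit in OF (adox). */
        uint64_t u = c[i] + d[i];
        unsigned c2 = u < c[i];
        s[i] = u + cs;
        cs = c2 | (s[i] < u);
    }

    *carry_r = cr;   /* carry out of the r chain */
    *carry_s = cs;   /* carry out of the s chain */
}
```

In a hand-written asm version both carries have to survive the loop
counter update and the branch, which is where a flag-preserving counted
loop instruction would have been useful.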
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)