Message-ID: <db86e9fa88754d59ac5f8d3f4fe0f9a3@AcuMS.aculab.com>
Date: Fri, 24 Apr 2020 09:41:30 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Robin Murphy' <robin.murphy@....com>,
Will Deacon <will@...nel.org>,
"Mark Rutland" <mark.rutland@....com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"kernel-team@...roid.com" <kernel-team@...roid.com>,
Michael Ellerman <mpe@...erman.id.au>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Segher Boessenkool <segher@...nel.crashing.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
Luc Van Oostenryck <luc.vanoostenryck@...il.com>,
Arnd Bergmann <arnd@...db.de>,
Peter Oberparleiter <oberpar@...ux.ibm.com>,
Masahiro Yamada <masahiroy@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>
Subject: RE: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
From: Robin Murphy
> Sent: 22 April 2020 12:02
..
> Sure - I have a nagging feeling that it could still do better WRT
> pipelining the loads anyway, so I'm happy to come back and reconsider
> the local codegen later. It certainly doesn't deserve to stand in the
> way of cross-arch rework.
How fast does that loop actually run?
To my mind it seems to do a lot of operations on each 64-bit value.
I'd have thought that a loop based on:
	sum64 += *ptr;
	sum64_high += *ptr++ >> 32;
and then fixing up the result would be faster.
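Roughly this sort of thing (just an untested userspace sketch of the
idea, not the actual patch code; the function name is made up, and it
assumes a little-endian machine, an 8-byte aligned buffer and a length
that is a whole number of 64-bit words and below 2^32 bytes - head and
tail handling omitted):

	#include <stdint.h>
	#include <stddef.h>

	static uint16_t csum64_sketch(const uint64_t *ptr, size_t n_words)
	{
		uint64_t sum64 = 0, sum64_high = 0, carries, sum;

		while (n_words--) {
			uint64_t v = *ptr++;

			sum64 += v;		/* carries out of bit 63 are lost */
			sum64_high += v >> 32;	/* exact sum of the high halves */
		}

		/*
		 * The exact total is 2^32 * sum64_high + (sum of the low
		 * halves), so the number of 2^64 wrap-arounds lost from
		 * sum64 is the top half of sum64_high, plus one if the low
		 * half of sum64_high has run ahead of the top half of sum64.
		 */
		carries = (sum64_high >> 32) +
			  ((sum64 >> 32) < (sum64_high & 0xffffffffu));

		/* 2^64 == 2^32 == 2^16 == 1 (mod 0xffff), so fold it down. */
		sum = (sum64 >> 32) + (sum64 & 0xffffffffu) + carries;
		sum = (sum >> 32) + (sum & 0xffffffffu);
		sum = (sum >> 16) + (sum & 0xffffu);
		sum = (sum >> 16) + (sum & 0xffffu);

		return (uint16_t)sum;	/* 16-bit ones-complement sum */
	}

The point being that the inner loop is two independent adds and a
shift per 64-bit word, so the iterations don't serialise on a carry
chain.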
The x86-64 code is also bad!
On Intel CPUs prior to Haswell a simple:
	sum_64 += *ptr32++;
is faster than the current code.
(Although you can do a lot better even on Ivy Bridge.)
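Something like this (again only an untested sketch with an invented
name, assuming a 4-byte aligned buffer and a length that is a multiple
of 4 bytes and small enough that the 64-bit accumulator cannot wrap):

	#include <stdint.h>
	#include <stddef.h>

	static uint16_t csum32_sketch(const uint32_t *ptr32, size_t n_words)
	{
		uint64_t sum_64 = 0;

		while (n_words--)
			sum_64 += *ptr32++;	/* zero-extended, no carries lost */

		/* Fold 64 -> 16 bits; 2^32 == 2^16 == 1 (mod 0xffff). */
		sum_64 = (sum_64 >> 32) + (sum_64 & 0xffffffffu);
		sum_64 = (sum_64 >> 32) + (sum_64 & 0xffffffffu);
		sum_64 = (sum_64 >> 16) + (sum_64 & 0xffffu);
		sum_64 = (sum_64 >> 16) + (sum_64 & 0xffffu);

		return (uint16_t)sum_64;
	}

On those older cores the adc carry chain limits you to about one word
every two clocks, whereas the plain 64-bit add of a zero-extended
32-bit load can run at one per clock.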
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)