Message-ID: <db86e9fa88754d59ac5f8d3f4fe0f9a3@AcuMS.aculab.com>
Date:   Fri, 24 Apr 2020 09:41:30 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Robin Murphy' <robin.murphy@....com>,
        Will Deacon <will@...nel.org>,
        "Mark Rutland" <mark.rutland@....com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "kernel-team@...roid.com" <kernel-team@...roid.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Segher Boessenkool <segher@...nel.crashing.org>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Luc Van Oostenryck <luc.vanoostenryck@...il.com>,
        Arnd Bergmann <arnd@...db.de>,
        Peter Oberparleiter <oberpar@...ux.ibm.com>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: RE: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()

From: Robin Murphy
> Sent: 22 April 2020 12:02
..
> Sure - I have a nagging feeling that it could still do better WRT
> pipelining the loads anyway, so I'm happy to come back and reconsider
> the local codegen later. It certainly doesn't deserve to stand in the
> way of cross-arch rework.

How fast does that loop actually run?
To my mind it seems to do a lot of operations on each 64-bit value.
I'd have thought that a loop based on:
	sum64 += *ptr;
	sum64_high += *ptr++ >> 32;
and then fixing up the result would be faster.
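
Something like this is what I have in mind (just an untested sketch to
show the fixup; the function name, signature, plain C types and the
alignment/whole-words assumptions are all invented here, this is not
the in-tree do_csum()):

	#include <stdint.h>
	#include <stddef.h>

	/*
	 * Ones' complement sum of n_words 64-bit words.
	 * The loop body is one add plus one shifted add per word; all
	 * carry handling is deferred to the fixup.  Assumes n_words is
	 * below 2^32 so neither partial sum can itself overflow 64 bits.
	 */
	static uint16_t csum64_sketch(const uint64_t *ptr, size_t n_words)
	{
		uint64_t sum64 = 0;		/* wraps silently */
		uint64_t sum64_high = 0;	/* sum of the high 32 bits */
		size_t i;

		for (i = 0; i < n_words; i++) {
			uint64_t v = ptr[i];

			sum64 += v;
			sum64_high += v >> 32;
		}

		/*
		 * Fixup: the true sum is sum64 + 2^64 * wraps, and since
		 * 2^64 == 1 (mod 0xffff) each wrap just adds 1 before the
		 * fold.  wraps is the top half of sum64_high, plus one
		 * more if the low part wrapped past 2^64 on its own.
		 */
		uint64_t wraps = sum64_high >> 32;

		if (sum64 < (sum64_high << 32))
			wraps++;

		sum64 += wraps;
		if (sum64 < wraps)		/* end-around carry */
			sum64++;

		/* fold 64 -> 32 -> 16 with end-around carries */
		sum64 = (sum64 & 0xffffffff) + (sum64 >> 32);
		sum64 = (sum64 & 0xffffffff) + (sum64 >> 32);
		sum64 = (sum64 & 0xffff) + (sum64 >> 16);
		sum64 = (sum64 & 0xffff) + (sum64 >> 16);

		return (uint16_t)sum64;	/* complementing left to the caller */
	}

The point is that the loop itself is just two adds and a shift per
64-bit word, with no dependence on the carry flag.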

The x86-64 code is also bad!
On Intel CPUs prior to Haswell a simple:
	sum_64 += *ptr32++;
is faster than the current code.
(Although you can do a lot better, even on Ivy Bridge.)
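
For reference, the trivial version I mean is just (again only a sketch
with invented names and types, assuming the buffer is a whole number of
32-bit words):

	#include <stdint.h>
	#include <stddef.h>

	/*
	 * Add 32-bit words into a 64-bit accumulator; the carries simply
	 * pile up in the top 32 bits, so the loop is one add per word.
	 * Safe for up to 2^32 words before the accumulator can wrap.
	 */
	static uint16_t csum32_sketch(const uint32_t *ptr32, size_t n_words)
	{
		uint64_t sum_64 = 0;
		size_t i;

		for (i = 0; i < n_words; i++)
			sum_64 += ptr32[i];

		/* fold 64 -> 32 -> 16 with end-around carries */
		sum_64 = (sum_64 & 0xffffffff) + (sum_64 >> 32);
		sum_64 = (sum_64 & 0xffffffff) + (sum_64 >> 32);
		sum_64 = (sum_64 & 0xffff) + (sum_64 >> 16);
		sum_64 = (sum_64 & 0xffff) + (sum_64 >> 16);

		return (uint16_t)sum_64;
	}

The whole loop body is a single 64-bit add per 32-bit word, again with
no dependence on the carry flag.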

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
