Message-ID: <04c41a96f4eb4fe782d10ae2691ad93e@AcuMS.aculab.com>
Date:   Thu, 6 Jan 2022 16:19:54 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Eric Dumazet' <edumazet@...gle.com>,
        'Peter Zijlstra' <peterz@...radead.org>
CC:     "'tglx@...utronix.de'" <tglx@...utronix.de>,
        "'mingo@...hat.com'" <mingo@...hat.com>,
        'Borislav Petkov' <bp@...en8.de>,
        "'dave.hansen@...ux.intel.com'" <dave.hansen@...ux.intel.com>,
        'X86 ML' <x86@...nel.org>, "'hpa@...or.com'" <hpa@...or.com>,
        "'alexanderduyck@...com'" <alexanderduyck@...com>,
        'open list' <linux-kernel@...r.kernel.org>,
        'netdev' <netdev@...r.kernel.org>,
        "'Noah Goldstein'" <goldstein.w.n@...il.com>
Subject: [PATCH] x86/lib: Optimise copy loop for long buffers in
 csum-partial_64.c

gcc converts the loop into one that only increments the pointer,
but makes a mess of calculating the limit, and gcc 9.1+ completely
refuses to use the final value of 'buff' from the last iteration.

Explicitly code a pointer comparison and don't bother changing len.

Signed-off-by: David Laight <david.laight@...lab.com>
---

The asm("" : "+r" (buff)); forces gcc to use the loop-updated
value of 'buff' and removes at least 6 instructions.
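
As a rough illustration of the empty asm statement described above, here is
a minimal user-space sketch. It is not the kernel code: sum_blocks() and
add_carry() are hypothetical names, the adcq chain is replaced by portable
C with an end-around carry, and the "+r" constraint is a GCC/Clang
extension. The asm emits no instructions; it only tells the compiler that
'buff' is read and modified at that point, so the value must be live in a
register there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit add with end-around carry. */
static uint64_t add_carry(uint64_t sum, uint64_t v)
{
	sum += v;
	return sum + (sum < v);	/* the add wrapped, so fold the carry back in */
}

/* Hypothetical stand-in for the 64-byte loop; only full blocks are summed. */
uint64_t sum_blocks(const unsigned char *buff, size_t len, uint64_t sum)
{
	if (len >= 64) {
		const unsigned char *lim = buff + (len & ~(size_t)63);

		do {
			for (int i = 0; i < 8; i++) {
				uint64_t w;

				memcpy(&w, buff + 8 * i, sizeof(w));
				sum = add_carry(sum, w);
			}
			/* No instructions emitted; "+r" marks buff as read and written. */
			__asm__("" : "+r" (buff));
			buff += 64;
		} while (buff < lim);
	}
	assert(len < 64 || buff > (const unsigned char *)0);
	return sum;
}
```

Whether the barrier actually shortens the generated code depends on the
compiler version and flags, so the output should be inspected before
relying on it.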

The gcc folk really ought to look at why gcc 9.1 onwards is so
much worse than gcc 8.
See https://godbolt.org/z/T39PcnvfE
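
For readers comparing the two loop shapes, the sketch below is a small
stand-alone model of the loop control only. The function names are
hypothetical and the checksum work is reduced to a block counter. It is
meant to show why 'len' can be left alone: the limit is computed once from
len & ~63u, and the code after the loop (starting with the len & 32 test)
only looks at the low bits of len, which equal what the old loop would have
left in it.

```c
#include <assert.h>
#include <stddef.h>

/* Old shape: the length is decremented alongside the pointer. */
size_t blocks_counted(const unsigned char *buff, int len, int *tail)
{
	size_t n = 0;

	assert(len >= 0);
	while (len >= 64) {
		n++;		/* stands in for the 64-byte checksum step */
		buff += 64;
		len -= 64;
	}
	*tail = len;		/* what the old loop leaves in len */
	return n;
}

/* New shape: only the pointer moves, compared against a precomputed limit. */
size_t blocks_by_limit(const unsigned char *buff, int len, int *tail)
{
	size_t n = 0;

	assert(len >= 0);
	if (len >= 64) {
		const unsigned char *lim = buff + (len & ~63u);

		do {
			n++;	/* stands in for the 64-byte checksum step */
			buff += 64;
		} while (buff < lim);
	}
	*tail = len & 63;	/* len is untouched; its low bits are the tail */
	return n;
}
```

Both functions should report the same block count and the same tail for any
non-negative length that fits in the buffer passed in.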


 arch/x86/lib/csum-partial_64.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index edd3e579c2a7..342de5f24fcb 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -27,21 +27,24 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	u64 temp64 = (__force u64)sum;
 	unsigned result;
 
-	while (unlikely(len >= 64)) {
-		asm("addq 0*8(%[src]),%[res]\n\t"
-		    "adcq 1*8(%[src]),%[res]\n\t"
-		    "adcq 2*8(%[src]),%[res]\n\t"
-		    "adcq 3*8(%[src]),%[res]\n\t"
-		    "adcq 4*8(%[src]),%[res]\n\t"
-		    "adcq 5*8(%[src]),%[res]\n\t"
-		    "adcq 6*8(%[src]),%[res]\n\t"
-		    "adcq 7*8(%[src]),%[res]\n\t"
-		    "adcq $0,%[res]"
-		    : [res] "+r" (temp64)
-		    : [src] "r" (buff)
-		    : "memory");
-		buff += 64;
-		len -= 64;
+	if (unlikely(len >= 64)) {
+		const void *lim = buff + (len & ~63u);
+		do {
+			asm("addq 0*8(%[src]),%[res]\n\t"
+			    "adcq 1*8(%[src]),%[res]\n\t"
+			    "adcq 2*8(%[src]),%[res]\n\t"
+			    "adcq 3*8(%[src]),%[res]\n\t"
+			    "adcq 4*8(%[src]),%[res]\n\t"
+			    "adcq 5*8(%[src]),%[res]\n\t"
+			    "adcq 6*8(%[src]),%[res]\n\t"
+			    "adcq 7*8(%[src]),%[res]\n\t"
+			    "adcq $0,%[res]"
+			    : [res] "+r" (temp64)
+			    : [src] "r" (buff)
+			    : "memory");
+			asm("" : "+r" (buff));
+			buff += 64;
+		} while (buff < lim);
 	}
 
 	if (len & 32) {
-- 
2.17.1
