Date: Sun, 7 Jan 2024 12:11:18 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Dumazet' <edumazet@...gle.com>, Linus Torvalds
	<torvalds@...ux-foundation.org>
CC: Noah Goldstein <goldstein.w.n@...il.com>, kernel test robot
	<lkp@...el.com>, "x86@...nel.org" <x86@...nel.org>,
	"oe-kbuild-all@...ts.linux.dev" <oe-kbuild-all@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com"
	<mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "hpa@...or.com"
	<hpa@...or.com>
Subject: RE: x86/csum: Remove unnecessary odd handling

From: Eric Dumazet
> Sent: 06 January 2024 10:26
...
> On a related note, at least with clang, I found that csum_ipv6_magic()
> is needlessly using temporary on-stack storage,
> showing a stall on Cascade Lake unless I am patching add32_with_carry() :
> 
> diff --git a/arch/x86/include/asm/checksum_64.h b/arch/x86/include/asm/checksum_64.h
> index 407beebadaf45a748f91a36b78bd1d023449b132..c3d6f47626c70d81f0c2ba401d85050b09a39922 100644
> --- a/arch/x86/include/asm/checksum_64.h
> +++ b/arch/x86/include/asm/checksum_64.h
> @@ -171,7 +171,7 @@ static inline unsigned add32_with_carry(unsigned a, unsigned b)
>         asm("addl %2,%0\n\t"
>             "adcl $0,%0"
>             : "=r" (a)
> -           : "0" (a), "rm" (b));
> +           : "0" (a), "r" (b));
>         return a;
>  }

Try replacing:
	return csum_fold(
	       (__force __wsum)add32_with_carry(sum64 & 0xffffffff, sum64>>32));
with:
	return csum_fold((__force __wsum)((sum64 + ror64(sum64, 32)) >> 32));

Should be fewer instructions as well
(rotate, add, shift vs. shift, mov, and, add, add),
although both might be 3 clocks.
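
For anyone who wants to convince themselves the two folds agree, here is a minimal userspace sketch. The helpers below are portable stand-ins written for illustration (the kernel's add32_with_carry() is inline asm, and its ror64() lives in the bitops headers); the names fold64_add32(), fold64_ror() and check_folds_agree() are hypothetical and not from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for add32_with_carry(): 32-bit add with the
 * carry-out added back in (end-around carry). */
static inline uint32_t add32_with_carry(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + b;

	return (uint32_t)((uint32_t)s + (uint32_t)(s >> 32));
}

/* Stand-in for ror64(): rotate right by n bits, valid for 0 < n < 64. */
static inline uint64_t ror64(uint64_t w, unsigned int n)
{
	return (w >> n) | (w << (64 - n));
}

/* Current form: split into halves, then add with end-around carry. */
static inline uint32_t fold64_add32(uint64_t sum64)
{
	return add32_with_carry(sum64 & 0xffffffff, sum64 >> 32);
}

/* Suggested form: adding the rotated value puts hi + lo in both halves,
 * and the carry out of the low half lands in the high half. */
static inline uint32_t fold64_ror(uint64_t sum64)
{
	return (uint32_t)((sum64 + ror64(sum64, 32)) >> 32);
}

/* Compare both folds on a few edge-case inputs. */
static void check_folds_agree(void)
{
	static const uint64_t samples[] = {
		0x0000000000000000ULL, 0x00000000ffffffffULL,
		0xffffffff00000000ULL, 0xffffffff00000001ULL,
		0xffffffffffffffffULL, 0x123456789abcdef0ULL,
		0x8000000080000000ULL, 0x0000000100000002ULL,
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(fold64_add32(samples[i]) == fold64_ror(samples[i]));
}
```

This only checks the arithmetic. It says nothing about what clang or gcc actually emit for either form, which is the part that matters for the stall.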

The best C version of csum_fold (from arc, IIRC) is also likely to be
better than the x86 asm one - certainly no worse.
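
As a rough illustration of what a C csum_fold can look like, here is a rotate-and-subtract sketch using plain uint32_t/uint16_t in place of __wsum/__sum16. Whether this matches the arc version exactly is an assumption on my part; csum_fold_c(), csum_fold_ref() and check_csum_fold() are hypothetical names, and the second function is just the textbook two-step fold used as a reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate-and-subtract fold: ~s - ror32(s, 16) leaves the complemented
 * 16-bit ones' complement sum in the top half of the word. */
static inline uint16_t csum_fold_c(uint32_t s)
{
	uint32_t r = (s << 16) | (s >> 16);	/* rotate by 16 */

	s = ~s;
	s -= r;
	return (uint16_t)(s >> 16);
}

/* Textbook reference: fold twice, then complement. */
static inline uint16_t csum_fold_ref(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)~s;
}

/* Compare both folds on a few edge-case inputs. */
static void check_csum_fold(void)
{
	static const uint32_t samples[] = {
		0x00000000, 0x0000ffff, 0xffff0000, 0xffffffff,
		0x00010002, 0xffff0001, 0x12345678, 0x80008000,
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(csum_fold_c(samples[i]) == csum_fold_ref(samples[i]));
}
```

The rotate form has no masks and no second fold step, so a compiler has a fair chance of turning it into a rotate, a not, a subtract and a shift.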

	David
