Date:   Mon, 10 Jan 2022 11:49:05 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     David Laight <David.Laight@...LAB.COM>,
        'Eric Dumazet' <edumazet@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>
CC:     "'tglx@...utronix.de'" <tglx@...utronix.de>,
        "'mingo@...hat.com'" <mingo@...hat.com>,
        'Borislav Petkov' <bp@...en8.de>,
        "'dave.hansen@...ux.intel.com'" <dave.hansen@...ux.intel.com>,
        'X86 ML' <x86@...nel.org>, "'hpa@...or.com'" <hpa@...or.com>,
        "'alexanderduyck@...com'" <alexanderduyck@...com>,
        'open list' <linux-kernel@...r.kernel.org>,
        'netdev' <netdev@...r.kernel.org>,
        "'Noah Goldstein'" <goldstein.w.n@...il.com>
Subject: RE: [PATCH v2] x86/lib: Remove the special case for odd-aligned
 buffers in csum-partial_64.c

From: David Laight
> Sent: 06 January 2022 14:46
> 
> There is no need to special case the very unusual odd-aligned buffers.
> They are no worse than 4n+2 aligned buffers.
> 
> Signed-off-by: David Laight <david.laight@...lab.com>
> Acked-by: Eric Dumazet
> ---

Ping...
This patch (and my two other patches for the same file) improves on
Eric's rewrite of this code, which is going into 5.17.
It would be nice to get these in as well.
They are likely to give measurable (if minor) performance improvements
for common cases.

	David

> 
> resend - v1 seems to have got lost :-)
> 
> v2: Also delete from32to16()
>     Add acked-by from Eric (he sent one at some point)
>     Fix possible whitespace error in the last hunk.
> 
> The penalty for any misaligned access seems to be minimal.
> On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a 512 byte
>   checksum.
> That is less than 1 clock for each cache line!
> That is just measuring the main loop with an lfence prior to rdpmc to
> read PERF_COUNT_HW_CPU_CYCLES.
> 
>  arch/x86/lib/csum-partial_64.c | 28 ++--------------------------
>  1 file changed, 2 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
> index 1f8a8f895173..061b1ed74d6a 100644
> --- a/arch/x86/lib/csum-partial_64.c
> +++ b/arch/x86/lib/csum-partial_64.c
> @@ -11,16 +11,6 @@
>  #include <asm/checksum.h>
>  #include <asm/word-at-a-time.h>
> 
> -static inline unsigned short from32to16(unsigned a)
> -{
> -	unsigned short b = a >> 16;
> -	asm("addw %w2,%w0\n\t"
> -	    "adcw $0,%w0\n"
> -	    : "=r" (b)
> -	    : "0" (b), "r" (a));
> -	return b;
> -}
> -
>  /*
>   * Do a checksum on an arbitrary memory area.
>   * Returns a 32bit checksum.
> @@ -30,22 +20,12 @@ static inline unsigned short from32to16(unsigned a)
>   *
>   * Still, with CHECKSUM_COMPLETE this is called to compute
>   * checksums on IPv6 headers (40 bytes) and other small parts.
> - * it's best to have buff aligned on a 64-bit boundary
> + * The penalty for misaligned buff is negligible.
>   */
>  __wsum csum_partial(const void *buff, int len, __wsum sum)
>  {
>  	u64 temp64 = (__force u64)sum;
> -	unsigned odd, result;
> -
> -	odd = 1 & (unsigned long) buff;
> -	if (unlikely(odd)) {
> -		if (unlikely(len == 0))
> -			return sum;
> -		temp64 = ror32((__force u32)sum, 8);
> -		temp64 += (*(unsigned char *)buff << 8);
> -		len--;
> -		buff++;
> -	}
> +	unsigned result;
> 
>  	while (unlikely(len >= 64)) {
>  		asm("addq 0*8(%[src]),%[res]\n\t"
> @@ -130,10 +110,6 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
>  #endif
>  	}
>  	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
> -	if (unlikely(odd)) {
> -		result = from32to16(result);
> -		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
> -	}
>  	return (__force __wsum)result;
>  }
>  EXPORT_SYMBOL(csum_partial);
> --
> 2.17.1
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
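
The patch description argues that odd-aligned buffers need no special
handling. A minimal user-space sketch of why, assuming a CPU that tolerates
misaligned loads: the sum depends only on the bytes read, not on the address
they are read from. The names csum_sketch(), fold16() and all_offsets_agree()
are hypothetical, chosen for illustration; they are not kernel functions, and
memcpy() stands in for the kernel's inline-asm loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones'-complement add: 64-bit add with end-around carry. */
static uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/* Hypothetical stand-in for csum_partial(): no alignment special case. */
static uint32_t csum_sketch(const void *buff, size_t len, uint32_t sum)
{
	const unsigned char *p = buff;
	uint64_t acc = sum;

	while (len >= 8) {
		uint64_t w;

		memcpy(&w, p, 8);	/* load that is safe at any alignment */
		acc = add64_with_carry(acc, w);
		p += 8;
		len -= 8;
	}
	if (len) {
		uint64_t w = 0;

		memcpy(&w, p, len);	/* trailing bytes, zero padded */
		acc = add64_with_carry(acc, w);
	}

	/* Fold 64 bits to 32 with end-around carry. */
	uint32_t lo = (uint32_t)acc;
	uint32_t hi = (uint32_t)(acc >> 32);
	uint32_t r = lo + hi;

	return r + (r < lo);
}

/* Fold a 32-bit partial sum down to 16 bits. */
static uint16_t fold16(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/*
 * Checksum a sample IPv4 header (checksum field already filled in, so the
 * folded sum must be 0xffff) at buffer offsets 0..7.
 * Returns 1 if every offset gives 0xffff, 0 otherwise.
 */
static int all_offsets_agree(void)
{
	static const unsigned char hdr[20] = {
		0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
		0x40, 0x11, 0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01,
		0xc0, 0xa8, 0x00, 0xc7,
	};
	unsigned char buf[sizeof(hdr) + 8];

	for (size_t off = 0; off < 8; off++) {
		memcpy(buf + off, hdr, sizeof(hdr));
		if (fold16(csum_sketch(buf + off, sizeof(hdr), 0)) != 0xffff)
			return 0;
	}
	return 1;
}
```

Calling all_offsets_agree() exercises both odd and 4n+2 starting addresses.
The sketch only shows that the result is independent of alignment; it says
nothing about the cycle counts quoted above, which need a real measurement.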

