Message-ID: <4dbf7f8d095b46a8a45e285d0ec8f8b0@AcuMS.aculab.com>
Date:   Fri, 26 Nov 2021 17:18:34 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Eric Dumazet' <edumazet@...gle.com>,
        Noah Goldstein <goldstein.w.n@...il.com>
CC:     Johannes Berg <johannes@...solutions.net>,
        "alexanderduyck@...com" <alexanderduyck@...com>,
        "kbuild-all@...ts.01.org" <kbuild-all@...ts.01.org>,
        open list <linux-kernel@...r.kernel.org>,
        "linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
        "lkp@...el.com" <lkp@...el.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        X86 ML <x86@...nel.org>
Subject: RE: [tip:x86/core 1/1] arch/x86/um/../lib/csum-partial_64.c:98:12:
 error: implicit declaration of function 'load_unaligned_zeropad'

From: Eric Dumazet
> Sent: 25 November 2021 04:01
...
> > The outputs seem to match if `buff` is aligned to 64-bit. Still see
> > difference with `csum_fold(csum_partial())` if `buff` is not 64-bit aligned.
> >
> > The comment at the top says it's "best" to have `buff` 64-bit aligned but
> > the code logic seems meant to support the misaligned case so not
> > sure if it's an issue.
> >
> 
> It is an issue in general, not in standard cases because network
> headers are aligned.
> 
> I think it came when I folded csum_partial() and do_csum(), I forgot
> to ror() the seed.
> 
> I suspect the following would help:
> 
> diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
> index 1eb8f2d11f7c785be624eba315fe9ca7989fd56d..ee7b0e7a6055bcbef42d22f7e1d8f52ddbd6be6d 100644
> --- a/arch/x86/lib/csum-partial_64.c
> +++ b/arch/x86/lib/csum-partial_64.c
> @@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
>         if (unlikely(odd)) {
>                 if (unlikely(len == 0))
>                         return sum;
> +               temp64 = ror32((__force u64)sum, 8);
>                 temp64 += (*(unsigned char *)buff << 8);
>                 len--;
>                 buff++;
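
FWIW the identity the fix relies on is easy to sanity check in
userspace: the 32->16 fold is arithmetic mod 0xffff, where a rotate
by 8 is exactly a byte swap of the folded value - so pre-rotating the
seed is what the final byte swap for odd buffers then undoes.
Something like this (helper names are local re-implementations, not
the kernel's):

#include <stdint.h>
#include <stdio.h>

static uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* fold a 32-bit ones' complement sum down to 16 bits */
static uint16_t fold16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (uint16_t)x;
}

static uint16_t swab16(uint16_t x)
{
	return (uint16_t)((x >> 8) | (x << 8));
}

int main(void)
{
	uint32_t x = 0x12345678;
	int i;

	for (i = 0; i < 10000000; i++) {
		x = x * 2654435761u + 12345;	/* churn through values */
		if (fold16(ror32(x, 8)) != swab16(fold16(x))) {
			printf("mismatch for %#x\n", (unsigned)x);
			return 1;
		}
	}
	printf("fold16(ror32(x, 8)) == swab16(fold16(x)) for all tested x\n");
	return 0;
}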

You can save an instruction (as if this path matters) by:
			temp64 = sum + *(unsigned char *)buff;
			temp64 <<= 8;
Although that probably falls foul of 64-bit shifts being slow.
So maybe just:
			sum += *(unsigned char *)buff;
			temp64 = bswap32(sum);
AFAICT (from a pdf) bswap32() and ror(x, 8) are likely to be
the same speed but may use different execution units.
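
FWIW all three forms should agree once everything is folded down:
mod 0xffff a rotate by 8, a shift by 8 kept in 64 bits, and a full
bswap are all just a multiply by 256. A quick standalone congruence
check (made-up userspace code; it also folds back the carry that the
32-bit adds in the snippets above would drop for a seed near 2^32):

#include <stdint.h>
#include <stdio.h>

static uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

int main(void)
{
	uint32_t sum = 0x89abcdef;
	int i;

	for (i = 0; i < 1000000; i++) {
		uint8_t byte = (uint8_t)i;
		uint64_t a, b, c;
		uint32_t s2;

		sum = sum * 2654435761u + 1;	/* churn through seeds */

		/* rotate-the-seed version from the patch above */
		a = (uint64_t)ror32(sum, 8) + ((uint32_t)byte << 8);
		/* shift version - the add must be 64-bit or a carry
		 * out of bit 31 is lost for seeds near 2^32 */
		b = ((uint64_t)sum + byte) << 8;
		/* bswap version - same caveat, so fold the carry back */
		s2 = sum + byte;
		if (s2 < sum)
			s2++;
		c = __builtin_bswap32(s2);

		/* the final checksum only depends on the value mod 0xffff */
		if (a % 0xffff != b % 0xffff || a % 0xffff != c % 0xffff) {
			printf("mismatch: sum=%#x byte=%#x\n",
			       (unsigned)sum, (unsigned)byte);
			return 1;
		}
	}
	printf("ror32/shift/bswap seeds all agree mod 0xffff\n");
	return 0;
}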

Intel seem to have managed to slow down ror(x, %cl) to 3 clocks
in Sandy Bridge - and still haven't fixed it.
Although the compiler might be making a pig's breakfast of the
register allocation when you tried setting 'odd = 8'.
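
For actually comparing variants, a trivial userspace harness along
these lines is usually enough - csum_variant() here is just a
stand-in byte loop (swap in the real candidates), timed with
clock_gettime():

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* stand-in for whichever csum variant is being timed */
static uint32_t csum_variant(const unsigned char *buf, int len, uint32_t seed)
{
	uint64_t sum = seed;
	int i;

	for (i = 0; i < len; i++)
		sum += (uint64_t)buf[i] << ((i & 1) ? 0 : 8);
	while (sum >> 32)
		sum = (sum & 0xffffffff) + (sum >> 32);
	return (uint32_t)sum;
}

int main(void)
{
	static unsigned char buf[1500];		/* MTU-ish buffer */
	struct timespec t0, t1;
	uint32_t acc = 1;
	long i, iters = 1000000;

	for (i = 0; i < (long)sizeof(buf); i++)
		buf[i] = (unsigned char)i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* feed acc back in so the loop can't be optimised away */
	for (i = 0; i < iters; i++)
		acc = csum_variant(buf + 1, sizeof(buf) - 1, acc);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("acc=%#x, %.1f ns/call\n", (unsigned)acc,
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / (double)iters);
	return 0;
}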

Weeks can be spent fiddling with this code :-(

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
