Date:   Fri, 26 Nov 2021 10:09:45 -0800
From:   Eric Dumazet <edumazet@...gle.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Noah Goldstein <goldstein.w.n@...il.com>,
        Johannes Berg <johannes@...solutions.net>,
        "alexanderduyck@...com" <alexanderduyck@...com>,
        "kbuild-all@...ts.01.org" <kbuild-all@...ts.01.org>,
        open list <linux-kernel@...r.kernel.org>,
        "linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
        "lkp@...el.com" <lkp@...el.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        X86 ML <x86@...nel.org>
Subject: Re: [tip:x86/core 1/1] arch/x86/um/../lib/csum-partial_64.c:98:12:
 error: implicit declaration of function 'load_unaligned_zeropad'

On Fri, Nov 26, 2021 at 9:18 AM David Laight <David.Laight@...lab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 25 November 2021 04:01
> ...
> > > The outputs seem to match if `buff` is aligned to 64-bit. I still
> > > see a difference with `csum_fold(csum_partial())` if `buff` is not
> > > 64-bit aligned.
> > >
> > > The comment at the top says it's "best" to have `buff` 64-bit
> > > aligned, but the code logic seems meant to support the misaligned
> > > case, so I'm not sure if it's an issue.
> > >
> >
> > It is an issue in general, though not in standard cases, because
> > network headers are aligned.
> >
> > I think it crept in when I folded csum_partial() and do_csum(); I
> > forgot to ror() the seed.
> >
> > I suspect the following would help:
> >
> > diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
> > index 1eb8f2d11f7c785be624eba315fe9ca7989fd56d..ee7b0e7a6055bcbef42d22f7e1d8f52ddbd6be6d
> > 100644
> > --- a/arch/x86/lib/csum-partial_64.c
> > +++ b/arch/x86/lib/csum-partial_64.c
> > @@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
> >         if (unlikely(odd)) {
> >                 if (unlikely(len == 0))
> >                         return sum;
> > +               temp64 = ror32((__force u64)sum, 8);
> >                 temp64 += (*(unsigned char *)buff << 8);
> >                 len--;
> >                 buff++;
>
> You can save an instruction (as if this path matters) by:
>                         temp64 = sum + *(unsigned char *)buff;

This might overflow; sum is 32-bit.

Given that we already have temp64 = (__force u64)sum; at this point,
we can instead do:

temp64 += *(u8 *)buff;
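
To make that concrete, here is a user-space model of the odd-byte
prologue being discussed (a sketch only, not the patch; it keeps
David's shift from the quoted line below, and the helper name is
made up):

#include <stdint.h>

/* Model of the odd-alignment prologue: temp64 already holds the
 * 32-bit seed zero-extended to 64 bits, so adding the leading byte
 * cannot wrap, and the shift then realigns it. */
static uint64_t odd_prologue(uint32_t sum, const unsigned char **buff,
                             int *len)
{
        uint64_t temp64 = (uint64_t)sum;   /* temp64 = (__force u64)sum */

        temp64 += **buff;                  /* 64-bit add: carry kept */
        temp64 <<= 8;                      /* David's variant of the ror */
        (*len)--;
        (*buff)++;
        return temp64;
}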

>                         temp64 <<= 8;

> Although that probably falls foul of 64bit shifts being slow.

Are they slower than ror32(sum, 8)?

> So maybe just:
>                         sum += *(unsigned char *)buff;

We might miss a carry/overflow here: the 32-bit add can wrap.
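
Concretely, with made-up values (standalone demo):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t sum = 0xffffffffu;
        uint8_t b = 1;

        uint32_t sum32 = sum + b;            /* wraps to 0: carry lost */
        uint64_t sum64 = (uint64_t)sum + b;  /* 0x100000000: carry kept */

        printf("32-bit: %#x  64-bit: %#llx\n",
               sum32, (unsigned long long)sum64);
        return 0;
}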

>                         temp64 = bswap32(sum);


> AFAICT (from a PDF) bswap32() and ror(x, 8) are likely to be
> the same speed but may use different execution units.
>
> Intel seem to have managed to slow down ror(x, %cl) to 3 clocks
> in Sandy Bridge - and still not fixed it.
> Although the compiler might be making a pig's breakfast of the
> register allocation when you tried setting 'odd = 8'.
>
> Weeks can be spent fiddling with this code :-(

Yes, and in the end it won't be able to compete with a
specialized/inlined ipv6_csum_partial().
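
For reference, a user-space sketch of what such a routine could look
like - the name ipv6_csum_partial comes from this thread, nothing like
it is in the tree, and the unrolling/carry handling here is only
illustrative:

#include <stdint.h>
#include <string.h>

/* Hypothetical ipv6_csum_partial(): ones'-complement sum of a fixed
 * 40-byte IPv6 header. The constant trip count lets the compiler
 * fully unroll; no alignment probe, no odd/length handling. */
static uint32_t ipv6_csum_partial(const void *hdr)
{
        uint64_t sum = 0, v;
        int i;

        for (i = 0; i < 5; i++) {            /* 5 x 8 bytes = 40 bytes */
                memcpy(&v, (const char *)hdr + 8 * i, 8); /* unaligned-safe */
                sum += v;
                if (sum < v)                 /* carry out: wrap it back in */
                        sum++;
        }
        /* fold 64 -> 32; two folds are enough */
        sum = (sum & 0xffffffff) + (sum >> 32);
        sum = (sum & 0xffffffff) + (sum >> 32);
        return (uint32_t)sum;
}

The point of specializing is that the fixed length removes all the
branching csum_partial() has to do, at the cost of one routine per
header layout.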
