Date:   Fri, 12 Nov 2021 06:21:38 -0800
From:   Eric Dumazet <edumazet@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Alexander Duyck <alexander.duyck@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [PATCH v1] x86/csum: rewrite csum_partial()

On Fri, Nov 12, 2021 at 1:13 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Nov 11, 2021 at 02:30:50PM -0800, Eric Dumazet wrote:
> > > For values 7 through 1 I wonder if you wouldn't be better served by
> > > just doing a single QWORD read and a pair of shifts. Something along
> > > the lines of:
> > >     if (len) {
> > >         shift = (8 - len) * 8;
> > >         temp64 = (*(unsigned long *)buff << shift) >> shift;
> > >         result += temp64;
> > >         result += result < temp64;
> > >     }
> >
> > Again, KASAN will not be happy.
>
> If you do it in asm, KASAN will not know, so who cares :-) As long as
> the load is aligned, loading beyond @len shouldn't be a problem;
> otherwise there's load_unaligned_zeropad().
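The "single QWORD read and a pair of shifts" idea quoted above can be tried out in userspace. This is a minimal sketch, not kernel code: tail_add() is a hypothetical name, it assumes a little-endian host (as on x86), and it assumes the caller guarantees that 8 bytes starting at buff are readable even though only len of them belong to the data. That over-read is exactly what KASAN objects to unless the load is aligned or done in asm. memcpy() stands in for the 8-byte load to stay within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add the first len (0..7) bytes at buff to a 64-bit ones' complement sum.
 * Assumption: 8 bytes at buff are readable; bytes past len are ignored.
 */
static uint64_t tail_add(uint64_t result, const unsigned char *buff, size_t len)
{
	assert(len < 8);
	if (len) {	/* len == 0 would give shift == 64: undefined in C */
		unsigned int shift = (8 - len) * 8;
		uint64_t temp64;

		memcpy(&temp64, buff, sizeof(temp64));	/* one 8-byte load */
		/*
		 * Little-endian: the first len bytes are the low-order bytes,
		 * so shifting left then right clears the (8 - len) high bytes.
		 */
		temp64 = (temp64 << shift) >> shift;
		result += temp64;
		result += result < temp64;	/* end-around carry */
	}
	return result;
}
```

The if (len) guard is needed for correctness, not just speed: with len == 0 the shift count would be 64, which is undefined behavior in C.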

OK, but then in this case we have to align buff on a qword boundary,
or risk crossing a page boundary.
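Why alignment removes the page-crossing risk: x86 base pages are 4096 bytes, a multiple of 8, so an 8-byte load from an 8-byte-aligned address always stays within one page, while an unaligned 8-byte load near the end of a buffer can spill into the next, possibly unmapped, page. The helpers below are hypothetical and only do the address arithmetic; SKETCH_PAGE_SIZE is an assumed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed x86 base page size */

/* True if the n-byte access starting at addr touches two different pages. */
static bool crosses_page(uintptr_t addr, size_t n)
{
	assert(n > 0);
	return addr / SKETCH_PAGE_SIZE != (addr + n - 1) / SKETCH_PAGE_SIZE;
}

/* Round addr down to the start of the 8-byte word that contains it. */
static uintptr_t qword_align_down(uintptr_t addr)
{
	return addr & ~(uintptr_t)7;
}
```

For example, an 8-byte load at 4092 straddles the first page boundary, while the aligned load at qword_align_down(4092) == 4088 does not.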

So this has to be done both at the beginning and at the end.

And with NET_IP_ALIGN == 0, this will unfortunately trigger for the 40-byte
IPv6 header:

IPv6 header: <2 bytes before qword boundary> <4 * 8 bytes> <6 bytes at trail>
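The 2 + 4 * 8 + 6 layout of the IPv6 header follows from simple arithmetic. This sketch assumes the packet buffer itself starts on an 8-byte boundary, so with no extra IP alignment padding the IPv6 header begins right after the 14-byte Ethernet header, at offset 14 (14 mod 8 == 6, leaving 2 bytes before the next qword boundary). split_on_qwords() and struct qword_split are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How an aligned checksum loop would have to walk [addr, addr + len). */
struct qword_split {
	size_t head;	/* bytes before the first 8-byte boundary */
	size_t qwords;	/* full, aligned 8-byte words */
	size_t tail;	/* bytes left after the last full word */
};

static struct qword_split split_on_qwords(uintptr_t addr, size_t len)
{
	struct qword_split s;
	size_t total = len;

	s.head = (8 - (addr & 7)) & 7;
	if (s.head > len)	/* short buffer that never reaches a boundary */
		s.head = len;
	len -= s.head;
	s.qwords = len / 8;
	s.tail = len % 8;
	assert(s.head + 8 * s.qwords + s.tail == total);
	return s;
}
```

split_on_qwords(14, 40) yields head 2, 4 qwords and tail 6, so both the head and the tail need the masked partial-qword treatment. Had the header started at offset 16 instead, it would be 5 aligned qwords with no head or tail.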

I will try, but I have some doubts that it can save even one or two cycles...
