Date:   Sun, 22 Mar 2020 12:53:50 -0700
From:   Tom Herbert <tom@...bertland.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Yadu Kishore <kyk.segfault@...il.com>,
        David Miller <davem@...emloft.net>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH v2] net: Make skb_segment not to compute checksum if
 network controller supports checksumming

On Fri, Mar 6, 2020 at 9:12 AM David Laight <David.Laight@...lab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 05 March 2020 17:20
> >
> > On 3/5/20 9:00 AM, David Laight wrote:
> > > From: Willem de Bruijn
> > >> Sent: 05 March 2020 16:07
> > > ..
> > >> It seems do_csum is called because csum_partial_copy executes the
> > >> two operations independently:
> > >>
> > >> __wsum
> > >> csum_partial_copy(const void *src, void *dst, int len, __wsum sum)
> > >> {
> > >>         memcpy(dst, src, len);
> > >>         return csum_partial(dst, len, sum);
> > >> }
> > >
> > > And do_csum() is superbly horrid.
> > > Not the least because it is 32bit on 64bit systems.
> >
> > There are many versions, which one is discussed here ?
> >
> > At least the current one seems to be 64bit optimized.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5777eaed566a1d63e344d3dd8f2b5e33be20643e
>
> I was looking at the generic one in $(SRC)/lib/checksum.c.
>
> FWIW I suspect the fastest code on pre sandy bridge 64bit intel cpus
> (where adc is 2 clocks) is to do a normal 'add', shift the carries
> into a 64bit register and do a software 'popcnt' every 512 bytes.
> That may run at 8 bytes/clock + the popcnt.

A while back I proposed an optimized x86 checksum function using
unrolled addq: https://lwn.net/Articles/679137/. It also tries to
optimize small checksums, such as the one over a header when doing
skb_postpull_rcsum.
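
For illustration only, here is a minimal portable C sketch of the general
idea of avoiding a carry chain: add 32-bit words into a 64-bit accumulator
with plain adds, unrolled four at a time, and fold the carries once at the
end. This is not the code from the LWN article and not the kernel's
do_csum(); the names csum_fold64 and csum_add64 are made up for this
example, and the result is the un-inverted ones'-complement sum in native
byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit accumulator to a 16-bit
 * ones'-complement sum using end-around carries. */
static uint16_t csum_fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* fits in 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* fits in 16 bits */
    return (uint16_t)sum;
}

/* Hypothetical sketch: ones'-complement sum of buf[0..len).
 * Each add contributes less than 2^32, so the 64-bit accumulator
 * cannot overflow as long as len stays below 2^32 bytes. */
static uint16_t csum_add64(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;

    assert(len <= UINT32_MAX);

    /* Unrolled main loop: four independent plain adds, no adc chain. */
    while (len >= 16) {
        uint32_t w[4];
        memcpy(w, p, sizeof(w));   /* alignment-safe load */
        sum += w[0];
        sum += w[1];
        sum += w[2];
        sum += w[3];
        p += 16;
        len -= 16;
    }
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, sizeof(w));
        sum += w;
        p += 4;
        len -= 4;
    }
    if (len > 0) {
        uint32_t w = 0;            /* zero-pad the 1..3 trailing bytes */
        memcpy(&w, p, len);
        sum += w;
    }
    return csum_fold64(sum);
}
```

Summing 32-bit words is equivalent to summing 16-bit words here because
2^16 is congruent to 1 modulo 0xffff, so the final fold gives the same
16-bit result. Whether this beats an adc-based loop on a given CPU would
need measuring.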

Tom

>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
