netdev - RE: [PATCH v2] net: Make skb_segment not to compute checksum if network controller supports checksumming


Message-ID: <de1012794ec54314b6fe790c01dee60b@AcuMS.aculab.com>
Date:   Tue, 3 Mar 2020 09:56:27 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Yadu Kishore' <kyk.segfault@...il.com>
CC:     David Miller <davem@...emloft.net>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Network Development <netdev@...r.kernel.org>
Subject: RE: [PATCH v2] net: Make skb_segment not to compute checksum if
 network controller supports checksumming

From: Yadu Kishore
> Sent: 03 March 2020 09:15
...
> The perf data I presented was collected on an arm64 platform (hikey960) where
> the do_csum implementation that is called is not in assembly but in C
> (lib/checksum.c)

It is a long time since I've written any arm assembler, but an
asm checksum loop ought to be faster than a C one because using
'add with carry' ought to be a gain.
(Unlike mips style instruction sets without a carry flag.)
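To illustrate what a C loop has to do when it cannot use the carry flag
directly, here is a minimal sketch. It is not the lib/checksum.c code;
csum_generic() and fold64() are hypothetical names. It assumes the usual
trick of adding 32-bit words into a 64-bit accumulator so that the
carries collect in the upper half and get folded back in at the end,
and it assumes buffers well under 16 GiB so the accumulator cannot
overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 16 bits */
	return (uint16_t)sum;
}

/*
 * Hypothetical portable checksum loop: one load and one 64-bit add per
 * 32-bit word.  The result is the ones' complement sum of the buffer
 * taken as 16-bit words in memory order, held in a host-order value.
 */
static uint16_t csum_generic(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0;

	while (len >= 4) {
		uint32_t w;

		memcpy(&w, p, 4);	/* alignment-safe load */
		sum += w;		/* carries pile up in bits 32..63 */
		p += 4;
		len -= 4;
	}
	if (len >= 2) {
		uint16_t h;

		memcpy(&h, p, 2);
		sum += h;
		p += 2;
		len -= 2;
	}
	if (len) {
		uint16_t h = 0;

		memcpy(&h, p, 1);	/* odd trailing byte, zero padded */
		sum += h;
	}
	return fold64(sum);
}
```

An asm loop using 'add with carry' can add full register-width words
and let the flag carry over into the next add, which is the gain
referred to above.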

However what is more interesting is that do_csum() is being
called at all.
It implies that a large data block is being checksummed 'in situ'
whereas the expectation is that 'linearising' the skb requires
all the data be copied - so the checksum would be done during the
copy.

Additionally, unless a 'load + store' copy loop and a
'load + store + adc' loop can be executed in the same number of
clocks (without excessive loop unrolling), doing the
checksum in the copy loop isn't 'free'.
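As a rough sketch of 'checksum during copy' (hypothetical code, not the
kernel's copy-and-checksum routines; copy_and_csum() and fold_to_16()
are made-up names), the loop below does 'load + store + add' per word.
The extra add in each iteration is the cost being discussed. It assumes
non-overlapping buffers well under 16 GiB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold_to_16(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical combined loop: copy len bytes from src to dst and
 * return the ones' complement sum of the data in the same pass.
 */
static uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint64_t sum = 0;

	while (len >= 4) {
		uint32_t w;

		memcpy(&w, s, 4);	/* load */
		memcpy(d, &w, 4);	/* store */
		sum += w;		/* the extra add per word */
		s += 4;
		d += 4;
		len -= 4;
	}
	while (len >= 2) {
		uint16_t h;

		memcpy(&h, s, 2);
		memcpy(d, &h, 2);
		sum += h;
		s += 2;
		d += 2;
		len -= 2;
	}
	if (len) {
		uint16_t h = 0;

		memcpy(&h, s, 1);	/* odd trailing byte, zero padded */
		*d = *s;
		sum += h;
	}
	return fold_to_16(sum);
}
```

Whether the add is hidden behind the load and store depends on how many
of each the cpu can issue per clock, hence the caveat about loop
unrolling.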

For x86 (including old Intel cpus where adc takes 2 clocks)
the 'checksum in copy' isn't free.

Clearly, if you have to do a copy and a software checksum
it is very likely that doing them together is faster.
(Although a fast 'rep movs' copy and an ad[co]x (or AVX2?)
checksum may be faster on very recent Intel cpu for large
enough buffers.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)