Date:   Tue, 3 Mar 2020 09:56:27 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Yadu Kishore' <kyk.segfault@...il.com>
CC:     David Miller <davem@...emloft.net>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Network Development <netdev@...r.kernel.org>
Subject: RE: [PATCH v2] net: Make skb_segment not to compute checksum if
 network controller supports checksumming

From: Yadu Kishore
> Sent: 03 March 2020 09:15
...
> The perf data I presented was collected on an arm64 platform (hikey960) where
> the do_csum implementation that is called is not in assembly but in C
> (lib/checksum.c)

It is a long time since I've written any arm assembler, but an
asm checksum loop ought to be faster than a C one because using
'add with carry' ought to be a gain.
(Unlike mips style instruction sets without a carry flag.)
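To illustrate the carry point, here is a minimal C sketch (not the code in
lib/checksum.c; the function names are made up for this example). Without
access to the carry flag, the 32-bit loop has to recover the carry with a
compare after every add, which is the work a single 'adc' does for free.
The 16-bit version is only there as a reference to check against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: ones' complement sum of host-order 16-bit words.
 * Hypothetical helper, assumes len is a multiple of 2. */
static uint16_t csum16_ref(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        uint16_t w;

        memcpy(&w, p + i, sizeof w);        /* alignment-safe load */
        sum += w;
        sum = (sum & 0xffff) + (sum >> 16); /* end-around carry */
    }
    return (uint16_t)sum;
}

/* Same sum, 32 bits at a time. Hypothetical helper, assumes len is a
 * multiple of 4. The compare-and-add is the C stand-in for 'adc'. */
static uint16_t csum32_carry(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;

    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w;

        memcpy(&w, p + i, sizeof w);
        sum += w;
        sum += (sum < w);   /* add back the carry out of the 32-bit add */
    }
    /* Fold 32 bits down to 16, twice to absorb the fold's own carry. */
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Both functions return the same value for the same buffer; whether a
compiler turns the compare into an 'adc' is up to the compiler.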

However, what is more interesting is that do_csum() is being
called at all.
It implies that a large data block is being checksummed 'in situ'
whereas the expectation is that 'linearising' the skb requires
all the data be copied - so the checksum would be done during the
copy.

Additionally, unless a 'load + store' copy loop and a
'load + store + adc' loop can be executed in the same number of
clocks (without excessive loop unrolling), doing the
checksum in the copy loop isn't 'free'.

For x86 (including old Intel cpu where adc is 2 clocks)
the 'checksum in copy' isn't free.

Clearly, if you have to do a copy and a software checksum
it is very likely that doing them together is faster.
(Although a fast 'rep movs' copy and an ad[co]x (or AVX2?)
checksum may be faster on very recent Intel cpu for large
enough buffers.)
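As a rough sketch of 'copy and checksum together' versus doing them as
two passes (this is not the kernel's code; the names are invented, and it
defers the carries into a 64-bit accumulator instead of using adc):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator of 32-bit words down to a 16-bit
 * ones' complement sum. Hypothetical helper. */
static uint16_t fold_to_16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One pass: each word is loaded once, stored to dst and added to the sum.
 * Assumes len is a multiple of 4 and the buffers do not overlap. */
static uint16_t copy_and_csum(unsigned char *dst, const unsigned char *src,
                              size_t len)
{
    uint64_t sum = 0;

    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w;

        memcpy(&w, src + i, sizeof w);  /* load */
        memcpy(dst + i, &w, sizeof w);  /* store */
        sum += w;                       /* carries collect in the top half */
    }
    return fold_to_16(sum);
}

/* Checksum only, for the two-pass variant: memcpy() then csum_only(). */
static uint16_t csum_only(const unsigned char *src, size_t len)
{
    uint64_t sum = 0;

    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w;

        memcpy(&w, src + i, sizeof w);
        sum += w;
    }
    return fold_to_16(sum);
}
```

The one-pass version reads the source once; the two-pass version reads it
twice but lets the copy use whatever the fastest block copy is. Which one
wins has to be measured on the cpu in question.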

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
