netdev - Re: [PATCH net-next] net: Implement fast csum_partial for x86

Message-ID: <568AEF81.7070404@stressinduktion.org>
Date:	Mon, 4 Jan 2016 23:17:37 +0100
From:	Hannes Frederic Sowa <hannes@...essinduktion.org>
To:	Tom Herbert <tom@...bertland.com>, davem@...emloft.net,
	netdev@...r.kernel.org
Cc:	kernel-team@...com
Subject: Re: [PATCH net-next] net: Implement fast csum_partial for x86_64

On 04.01.2016 00:22, Tom Herbert wrote:
> Implement assembly routine for csum_partial for 64 bit x86. This
> primarily speeds up checksum calculation for smaller lengths such as
> those that are present when doing skb_postpull_rcsum when getting
> CHECKSUM_COMPLETE from device or after CHECKSUM_UNNECESSARY
> conversion.
>
> This implementation is similar to csum_partial implemented in
> checksum_32.S, however since we are dealing with 8 bytes at a time
> there are more cases for alignment and small lengths-- for those we
> employ jump tables.
>
> Testing:
>
> Verified correctness by testing arbitrary-length buffers filled with
> random data. For each buffer I compared the computed checksum against
> that of the original algorithm for each possible alignment (0-7 bytes).
>
> Checksum performance:
>
> Isolating old and new implementation for some common cases:
>
>                          Old      New
> Case                    nsecs    nsecs    Improvement
> ---------------------+--------+--------+-----------------------------
> 1400 bytes (0 align)    194.4    176.7      9%    (Big packet)
> 40 bytes (0 align)      10.5     5.7       45%    (IPv6 hdr common case)
> 8 bytes (4 align)       8.6      7.4       15%    (UDP, VXLAN in IPv4)
> 14 bytes (0 align)      10.4     6.5       37%    (Eth hdr)
> 14 bytes (4 align)      10.8     7.8       27%    (Eth hdr in IPv4)
>
> Signed-off-by: Tom Herbert <tom@...bertland.com>

I verified the implementation through tests and can also see a speed-up
in almost all cases. Unfortunately, using _addcarry_u64 intrinsics and
__int128 to let the compiler emit adc instructions generated even worse
code than the current implementation.
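
For illustration only, here is a minimal sketch of what a plain-C variant
built on __int128 might look like. It is not the code that was benchmarked,
and it is not the kernel's csum_partial: the names csum_int128 and fold64
are hypothetical, the function takes no initial sum, and it folds straight
to 16 bits for simplicity. It assumes a compiler with the unsigned __int128
extension (GCC or Clang on a 64-bit target).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit one's-complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* 64 -> at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* absorb the carry      */
    sum = (sum & 0xffff) + (sum >> 16);       /* 32 -> at most 17 bits */
    sum = (sum & 0xffff) + (sum >> 16);       /* absorb the carry      */
    return (uint16_t)sum;
}

/*
 * Hypothetical sketch: one's-complement sum of a buffer, 8 bytes at a time.
 * Carries out of the low 64 bits collect in the upper half of the 128-bit
 * accumulator and are added back once at the end, which is the effect an
 * adc chain achieves in assembly.
 */
static uint16_t csum_int128(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned __int128 acc = 0;
    uint64_t w, lo, hi;

    while (len >= 8) {
        memcpy(&w, p, 8);   /* unaligned-safe load */
        acc += w;
        p += 8;
        len -= 8;
    }
    if (len) {
        w = 0;
        memcpy(&w, p, len); /* tail, zero-padded in memory order */
        acc += w;
    }

    lo = (uint64_t)acc;
    hi = (uint64_t)(acc >> 64);
    lo += hi;
    if (lo < hi)            /* end-around carry */
        lo++;
    return fold64(lo);
}
```

Calling csum_int128(data + off, len) for off in 0..7 and comparing against
a simple 16-bit-at-a-time loop is a straightforward way to check such a
variant for every alignment.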

Acked-by: Hannes Frederic Sowa <hannes@...essinduktion.org>

Thanks Tom!
