Date:   Thu, 5 Mar 2020 11:06:42 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Yadu Kishore <kyk.segfault@...il.com>
Cc:     David Laight <David.Laight@...lab.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        David Miller <davem@...emloft.net>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH v2] net: Make skb_segment not to compute checksum if
 network controller supports checksumming

On Thu, Mar 5, 2020 at 1:33 AM Yadu Kishore <kyk.segfault@...il.com> wrote:
>
> Hi all,
>
> Though there is scope to optimise the checksum code (from C to asm) for such
> architectures, it is not the intent of this patchset.
> The intent here is only to enable offloading checksum during GSO.
>
> The perf data I presented shows that ~7.4% of the CPU is spent computing the
> checksum in the GSO path on architectures where the checksum code is not
> implemented in assembly (arm64).
> If the network controller hardware supports checksumming, then I feel
> it is worthwhile to offload this even during GSO on such architectures
> and save that 7.25% of host CPU cycles.

Yes, given the discussion, I have no objections. The change to
skb_segment in v2 looks fine.
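
For anyone skimming the thread, the gist of such a change (a rough
illustration only; the helper name and placement below are mine, not
the actual v2 diff) is to leave each segment as CHECKSUM_PARTIAL when
the device can checksum it, instead of resolving the checksum in
software:

/* Illustrative sketch only, not the actual skb_segment change. */
static void segment_finish_csum(struct sk_buff *nskb,
                                netdev_features_t features)
{
        if (features & NETIF_F_HW_CSUM) {
                /* Hardware fills in the checksum at
                 * csum_start + csum_offset on transmit.
                 */
                nskb->ip_summed = CHECKSUM_PARTIAL;
                return;
        }

        /* No offload: resolve the checksum in software now.
         * skb_checksum_help() computes it, writes the folded value
         * into the packet and marks the skb CHECKSUM_NONE.
         * (Error handling elided.)
         */
        skb_checksum_help(nskb);
}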

Thanks for sharing the in-depth analysis, David. I expected that ~300
cycles per memory access would always dwarf the arithmetic cost.
Perhaps the back-of-the-envelope view is that 300/64 ~= 5 cyc/B, which
is on the same order as the ~2 cyc/B of checksum arithmetic, so the
operation is not entirely insignificant. Or, more likely, the memory
access cost is simply markedly lower here because the data is still
warm in a cache.
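
Spelling that out with rough numbers (assuming a 64 B cache line and
taking the ~300 cycle figure at face value):

        memory:     ~300 cycles / 64 B  ~= 4.7 cyc/B
        checksum:   ~2 cyc/B for the arithmetic itself

so on a cold miss the checksum would be very roughly a third of the
combined per-byte cost, and a larger share once the data is already
warm in cache.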

It seems do_csum is called because csum_partial_copy performs the copy
and the checksum as two independent passes over the data:

__wsum
csum_partial_copy(const void *src, void *dst, int len, __wsum sum)
{
        memcpy(dst, src, len);               /* first pass over the data       */
        return csum_partial(dst, len, sum);  /* second pass over the same data */
}
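
For comparison, a fused copy-and-checksum walks the data only once,
which is roughly what the arch-specific asm helpers achieve. A
simplified user-space sketch of that idea (my own illustration, not
the kernel's implementation; odd-length tail and endian folding are
glossed over):

#include <stdint.h>
#include <string.h>

/* Copy len bytes from src to dst while accumulating a 32-bit partial
 * ones' complement style sum, so the data is traversed once instead
 * of once for memcpy and again for csum_partial.
 */
static uint32_t copy_and_csum(void *dst, const void *src, size_t len,
                              uint32_t sum)
{
        const uint8_t *s = src;
        uint8_t *d = dst;
        uint64_t acc = sum;

        while (len >= 2) {
                uint16_t w;

                memcpy(&w, s, sizeof(w));   /* one load ...              */
                memcpy(d, &w, sizeof(w));   /* ... one store ...         */
                acc += w;                   /* ... checksum in same pass */
                s += 2;
                d += 2;
                len -= 2;
        }
        if (len) {                          /* trailing odd byte */
                *d = *s;
                acc += *s;
        }
        while (acc >> 32)                   /* fold deferred carries */
                acc = (acc & 0xffffffff) + (acc >> 32);
        return (uint32_t)acc;
}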
