Message-ID: <CA+FuTSc5QVF_kv8FNs03obXGbf6axrG5umCipE=LXvqQ_-hDAA@mail.gmail.com>
Date:   Thu, 5 Mar 2020 11:06:42 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Yadu Kishore <kyk.segfault@...il.com>
Cc:     David Laight <David.Laight@...lab.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        David Miller <davem@...emloft.net>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH v2] net: Make skb_segment not to compute checksum if
 network controller supports checksumming

On Thu, Mar 5, 2020 at 1:33 AM Yadu Kishore <kyk.segfault@...il.com> wrote:
>
> Hi all,
>
> Though there is scope to optimise the checksum code (from C to asm) for such
> architectures, it is not the intent of this patchset.
> The intent here is only to enable offloading checksum during GSO.
>
> The perf data I presented shows that ~7.4% of the CPU is spent doing checksum
> in the GSO path for architectures where the checksum code is not implemented in
> assembly (arm64).
> If the network controller hardware supports checksumming, then I feel
> that it is worthwhile to offload this even during GSO for such architectures
> and save the 7.25% of the host cpu cycles.

Yes, given the discussion I have no objections. The change to
skb_segment in v2 looks fine.

Thanks for sharing the in-depth analysis, David. I expected that ~300
cycles per memory access would always dwarf the arithmetic cost. A
back-of-the-envelope estimate would be 300 cycles per 64-byte
cacheline, or ~5 cyc/B, which is on the same order as the ~2 cyc/B
checksum cost, so the arithmetic is not entirely insignificant. Or,
more likely, the memory access cost is simply markedly lower here
because the data is still warm in a cache.

It seems do_csum is called because csum_partial_copy performs the
copy and the checksum as two independent passes over the data:

__wsum
csum_partial_copy(const void *src, void *dst, int len, __wsum sum)
{
        memcpy(dst, src, len);
        return csum_partial(dst, len, sum);
}
