Message-ID: <CA+FuTSc5QVF_kv8FNs03obXGbf6axrG5umCipE=LXvqQ_-hDAA@mail.gmail.com>
Date: Thu, 5 Mar 2020 11:06:42 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Yadu Kishore <kyk.segfault@...il.com>
Cc: David Laight <David.Laight@...lab.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Miller <davem@...emloft.net>,
Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH v2] net: Make skb_segment not to compute checksum if network controller supports checksumming

On Thu, Mar 5, 2020 at 1:33 AM Yadu Kishore <kyk.segfault@...il.com> wrote:
>
> Hi all,
>
> Though there is scope to optimise the checksum code (from C to asm) for such
> architectures, it is not the intent of this patchset.
> The intent here is only to enable offloading checksum during GSO.
>
> The perf data I presented shows that ~7.4% of the CPU is spent doing checksum
> in the GSO path for architectures where the checksum code is not implemented in
> assembly (arm64).
> If the network controller hardware supports checksumming, then I feel
> that it is worthwhile to offload this even during GSO for such architectures
> and save the 7.25% of the host cpu cycles.
Yes, given the discussion I have no objections. The change to
skb_segment in v2 looks fine.
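For readers outside the thread, the idea of the patch can be illustrated with a minimal user-space sketch. Everything in it is hypothetical: `struct seg`, `segment_payload` and `sw_csum16` are illustrative names, not kernel APIs, and the real skb_segment operates on sk_buffs and device feature flags rather than a plain `hw_csum` boolean. It only shows the shape of the choice: when the device can checksum, each segment is copied without a second software pass over its bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-segment record; not a kernel structure. */
struct seg {
	const uint8_t *data;  /* segment bytes inside the caller's out buffer */
	size_t len;
	bool csum_partial;    /* true: checksum left for the device to fill in */
	uint16_t sw_csum;     /* folded ones' complement sum; 0 when offloaded */
};

/* Software checksum pass: sum big-endian 16-bit words, then fold carries. */
static uint16_t sw_csum16(const uint8_t *p, size_t len)
{
	uint64_t acc = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint64_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		acc += (uint64_t)p[len - 1] << 8; /* pad odd trailing byte */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Split payload into segments of at most mss bytes, copying into out.
 * With hw_csum set, the extra read pass over each segment is skipped. */
static size_t segment_payload(const uint8_t *payload, size_t len, size_t mss,
			      bool hw_csum, uint8_t *out,
			      struct seg *segs, size_t max_segs)
{
	size_t off = 0, n = 0;

	if (mss == 0)
		return 0;
	while (off < len && n < max_segs) {
		size_t chunk = len - off < mss ? len - off : mss;

		memcpy(out + off, payload + off, chunk);
		segs[n].data = out + off;
		segs[n].len = chunk;
		segs[n].csum_partial = hw_csum;
		segs[n].sw_csum = hw_csum ? 0 : sw_csum16(out + off, chunk);
		off += chunk;
		n++;
	}
	return n;
}
```

With `hw_csum` false, every segment is read twice (copy, then sum); with it true, only the copy remains, which is the saving the patch is after.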
Thanks for sharing the in-depth analysis, David. I had expected that ~300
cycles per memory access would always dwarf the arithmetic cost.
Perhaps the back-of-the-envelope view is that 300 cycles per 64-byte
cache line is ~5 cyc/B, the same order as the 2 cyc/B figure, so the
checksum operation is not entirely insignificant. Or, more likely, the
memory access cost is simply markedly lower here because the data is
still warm in cache.
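The estimate above can be written out as a tiny C helper. The 300-cycle access cost and the 64-byte cache line are assumptions taken from the discussion, and the 2 cyc/B constant is only the figure quoted in the thread; nothing here is measured, and the helper names are hypothetical.

```c
/* Back-of-the-envelope inputs from the thread (assumed, not measured). */
#define MISS_CYCLES       300.0 /* cycles per memory access that misses cache */
#define CACHE_LINE_BYTES   64.0 /* bytes brought in per access (one line) */
#define CSUM_CYC_PER_BYTE   2.0 /* arithmetic cost figure quoted in the thread */

/* Memory cost amortized over every byte of the fetched cache line. */
static double miss_cycles_per_byte(double miss_cycles, double line_bytes)
{
	return miss_cycles / line_bytes;
}

/* How many times larger the memory cost is than the arithmetic cost.
 * A small ratio means both are the same order of magnitude. */
static double miss_to_csum_ratio(double miss_cycles, double line_bytes,
				 double csum_cyc_per_byte)
{
	return miss_cycles_per_byte(miss_cycles, line_bytes) /
	       csum_cyc_per_byte;
}
```

With the constants above, `miss_cycles_per_byte(MISS_CYCLES, CACHE_LINE_BYTES)` is 4.6875 (the "~5 cyc/B") and the ratio to the arithmetic cost is about 2.3, which is why the checksum cannot be dismissed as noise when the data misses cache.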
It seems do_csum is called because csum_partial_copy performs the
two operations as separate passes over the data:
__wsum
csum_partial_copy(const void *src, void *dst, int len, __wsum sum)
{
	memcpy(dst, src, len);
	return csum_partial(dst, len, sum);
}
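For contrast with the two-pass csum_partial_copy above, here is a minimal user-space sketch of a single-pass copy-and-checksum, where each source byte is loaded once. It is an illustration under assumptions, not the kernel's implementation: it uses a simple RFC 1071-style sum over big-endian 16-bit words, returns the folded sum without complementing it, takes no seed `sum`, and ignores the alignment and word-at-a-time tricks real do_csum implementations use. `csum_copy_two_pass`, `csum_copy_fused` and `csum_fold16` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint64_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Two passes, mirroring csum_partial_copy: copy, then re-read dst. */
static uint16_t csum_copy_two_pass(const uint8_t *src, uint8_t *dst, size_t len)
{
	uint64_t acc = 0;
	size_t i;

	memcpy(dst, src, len);
	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint64_t)dst[i] << 8) | dst[i + 1];
	if (len & 1)
		acc += (uint64_t)dst[len - 1] << 8; /* pad odd trailing byte */
	return csum_fold16(acc);
}

/* One pass: each source byte is loaded once, stored, and summed. */
static uint16_t csum_copy_fused(const uint8_t *src, uint8_t *dst, size_t len)
{
	uint64_t acc = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		acc += ((uint64_t)src[i] << 8) | src[i + 1];
	}
	if (len & 1) {
		dst[len - 1] = src[len - 1];
		acc += (uint64_t)src[len - 1] << 8; /* pad odd trailing byte */
	}
	return csum_fold16(acc);
}
```

Both functions produce the same copy and the same sum; whether fusing actually pays off on a given CPU is something only measurement, like the perf data in this thread, can show.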