Message-ID: <CA+FuTSfaTYB0p1yBuJK4226D-vjhhO_-zN3PUFKFdvyKVT5JdA@mail.gmail.com>
Date: Fri, 28 Feb 2020 09:30:56 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Yadu Kishore <kyk.segfault@...il.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>
Subject: Re: [PATCH] net: Make skb_segment not to compute checksum if network
controller supports checksumming
On Fri, Feb 28, 2020 at 12:25 AM Yadu Kishore <kyk.segfault@...il.com> wrote:
>
> > Did you measure a cycle efficiency improvement? As discussed in the
> > referenced email thread, the kernel uses checksum_and_copy because it is
> > generally not significantly more expensive than copy alone.
> >
> > skb_segment already is a very complex function. New code needs to
> > offer a tangible benefit.
>
> I ran iperf TCP Tx traffic of 1000 megabytes and captured the cpu cycle
> utilization using perf:
> "perf record -e cycles -a iperf \
> -c 192.168.2.53 -p 5002 -fm -n 1048576000 -i 2 -l 8k -w 8m"
>
> I see the following are the top consumers of cpu cycles:
>
> Function                  %cpu cycles
> ========================  ===========
> skb_mac_gso_segment              0.02
> inet_gso_segment                 0.26
> tcp4_gso_segment                 0.02
> tcp_gso_segment                  0.19
> skb_segment                      0.52
> skb_copy_and_csum_bits           0.64
> do_csum                          7.25
> memcpy                           3.71
> __alloc_skb                      0.91
> ========================  ===========
> SUM                             13.52
>
> The measurement was done on an arm64 hikey960 platform running Android with
> Linux kernel version 4.19.23.
> I see that 7.25% of the cpu cycles is spent computing the checksum, against
> the total of 13.52% of cpu cycles for the functions listed above.
> That means around 53.6% (7.25 / 13.52) of the cycles in this path is spent
> doing checksum.
> Hence the attempt to offload the checksum in the GSO case as well.
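
The share can be recomputed directly from the table. Below is a small
userspace C sketch, not kernel code: the array only transcribes the quoted
percentages, and the names gso_path_pct, sum_pct and share_of are made up
for illustration.

```c
#include <stddef.h>

/* Per-function cycle percentages, transcribed from the quoted perf table. */
static const double gso_path_pct[] = {
    0.02, /* skb_mac_gso_segment */
    0.26, /* inet_gso_segment */
    0.02, /* tcp4_gso_segment */
    0.19, /* tcp_gso_segment */
    0.52, /* skb_segment */
    0.64, /* skb_copy_and_csum_bits */
    7.25, /* do_csum */
    3.71, /* memcpy */
    0.91, /* __alloc_skb */
};

/* do_csum's own percentage, as listed in the table. */
#define DO_CSUM_PCT 7.25

/* Sum n percentages. */
static double sum_pct(const double *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Express part as a percentage of total. */
static double share_of(double part, double total)
{
    return 100.0 * part / total;
}
```

Calling share_of(DO_CSUM_PCT, sum_pct(gso_path_pct, 9)) gives do_csum's
share of the listed functions, which is a share of the segmentation path
only and not of all cycles on the system.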
Can you contrast this against a run with your changes? The thought is
that the majority of this cost is due to the memory loads and stores, not
the arithmetic ops to compute the checksum. When enabling checksum
offload, the same stalls will occur, but will simply be attributed to
memcpy instead of to do_csum. A:B comparisons of absolute (-n) cycle
counts are usually very noisy, but it's worth a shot.
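
To illustrate the point about loads and stores, here is a rough userspace
sketch in C. It is not the kernel's do_csum or skb_copy_and_csum_bits, and
the function names are hypothetical. It computes an RFC 1071 style 16-bit
ones' complement sum (folded, not inverted) either on its own or fused with
the copy. The fused loop touches each source byte exactly once, so the
checksum only adds a few arithmetic ops on data that the copy has to load
anyway.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to 16 bits with end-around carry. */
static uint16_t csum_fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Checksum only: one pass of loads over src, no stores. */
static uint16_t csum_only(const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    if (i < len)                      /* odd trailing byte, zero padded */
        sum += (uint32_t)src[i] << 8;
    return csum_fold16(sum);
}

/* Copy and checksum fused: the same loads feed both the store and the sum. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint8_t hi = src[i], lo = src[i + 1];

        dst[i] = hi;
        dst[i + 1] = lo;
        sum += ((uint32_t)hi << 8) | lo;
    }
    if (i < len) {                    /* odd trailing byte, zero padded */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return csum_fold16(sum);
}
```

If the cost is dominated by cache misses on src, a profile of a plain copy
followed by a separate checksum pass would tend to charge the misses to
whichever pass runs first, which is the attribution effect described above.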
> > Is this not already handled by __copy_skb_header above? If ip_summed
> > has to be initialized, so have csum_start and csum_offset. That call
> > should have initialized all three.
>
> Thanks, I will look into why, even though __copy_skb_header is being
> called, I am still seeing skb->ip_summed set to CHECKSUM_NONE in the
> network driver.
Thanks.