Message-ID: <CA+FuTSfaTYB0p1yBuJK4226D-vjhhO_-zN3PUFKFdvyKVT5JdA@mail.gmail.com>
Date:   Fri, 28 Feb 2020 09:30:56 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Yadu Kishore <kyk.segfault@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>
Subject: Re: [PATCH] net: Make skb_segment not to compute checksum if network
 controller supports checksumming

On Fri, Feb 28, 2020 at 12:25 AM Yadu Kishore <kyk.segfault@...il.com> wrote:
>
> > Did you measure a cycle efficiency improvement? As discussed in the
> > referred email thread, the kernel uses checksum_and_copy because it is
> > generally not significantly more expensive than copy alone.
> > skb_segment already is a very complex function. New code needs to
> > offer a tangible benefit.
>
> I ran iperf TCP Tx traffic of 1000 megabytes and captured the cpu cycle
> utilization using perf:
> "perf record -e cycles -a iperf \
> -c 192.168.2.53 -p 5002 -fm -n 1048576000 -i 2  -l 8k -w 8m"
>
> I see the following are the top consumers of cpu cycles:
>
> Function                   %cpu cycles
> ========                   ===========
> skb_mac_gso_segment               0.02
> inet_gso_segment                  0.26
> tcp4_gso_segment                  0.02
> tcp_gso_segment                   0.19
> skb_segment                       0.52
> skb_copy_and_csum_bits            0.64
> do_csum                           7.25
> memcpy                            3.71
> __alloc_skb                       0.91
> ========                   ===========
> SUM                              13.52
>
> The measurement was done on an arm64 hikey960 platform running Android with
> Linux kernel version 4.19.23.
> do_csum accounts for 7.25% of all cpu cycles, out of the 13.52% spent in
> the functions listed above.
> That means around 53.6% of the cycles in this path are spent computing the
> checksum.
> Hence the attempt to offload the checksum in the GSO case as well.
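
The share quoted above can be rechecked directly from the table. This is a minimal sketch in C; `gso_path_percent`, `sum_percent` and `share_percent` are hypothetical names introduced for illustration, and the values are copied from the perf table as posted.

```c
#include <assert.h>
#include <stddef.h>

/* %cpu cycles per function, in table order (do_csum is index 6). */
static const double gso_path_percent[] = {
    0.02, 0.26, 0.02, 0.19, 0.52, 0.64, 7.25, 3.71, 0.91
};

/* Sum n per-function percentages. */
static double sum_percent(const double *v, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Share of `total` contributed by `part`, as a percentage. */
static double share_percent(double part, double total)
{
    assert(total > 0.0);
    return 100.0 * part / total;
}
```

With the table's values, the entries sum to 13.52 and `share_percent(7.25, 13.52)` works out to roughly 53.6.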

Can you contrast this against a run with your changes? The thought is
that the majority of this cost is due to the memory loads and stores, not
the arithmetic ops to compute the checksum. When enabling checksum
offload, the same stalls will occur, but will simply be attributed to
memcpy instead of to do_csum. A:B comparisons of absolute (-n) cycle
counts are usually very noisy, but it's worth a shot.
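
To make the point about loads and stores concrete, here is a minimal userspace sketch in C. It is not the kernel's code: `csum_only` and `copy_and_csum` are hypothetical stand-ins for do_csum and the checksum-and-copy helpers, using a plain byte-wise 16-bit ones' complement sum as in RFC 1071. The fused loop touches each source byte once; the additions ride along with loads that the copy has to issue anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t csum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Separate pass: read the buffer only to sum it. */
static uint16_t csum_only(const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(src != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    if (i < len)                      /* odd trailing byte, zero-padded */
        sum += (uint32_t)src[i] << 8;
    return csum_fold(sum);
}

/* Fused pass: every byte is loaded once, stored once, and summed. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint8_t hi = src[i], lo = src[i + 1];

        dst[i] = hi;
        dst[i + 1] = lo;
        sum += ((uint32_t)hi << 8) | lo;
    }
    if (i < len) {                    /* odd trailing byte, zero-padded */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return csum_fold(sum);
}
```

Both functions return the same (non-inverted) sum for the same input. If a profile is dominated by cache misses on `src`, those stalls get charged to whichever loop reads the data first, which is why removing the checksum may just move the cost to the copy.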


> > Is this not already handled by __copy_skb_header above? If ip_summed
> > has to be initialized, so have csum_start and csum_offset. That call
> > should have initialized all three.
>
> Thanks, I will look into why I am still seeing skb->ip_summed set to
> CHECKSUM_NONE in the network driver even though __copy_skb_header is
> being called.

Thanks.
