netdev - Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable outer Tx checksum by default

Message-ID: <CALx6S351KYGFFLi08cX=DFCucZtGJLoMJGvFAW1_H-cRXZ0rQg@mail.gmail.com>
Date:	Fri, 19 Feb 2016 16:14:06 -0800
From:	Tom Herbert <tom@...bertland.com>
To:	Jesse Gross <jesse@...nel.org>
Cc:	Alex Duyck <aduyck@...antis.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable outer Tx checksum by default

On Fri, Feb 19, 2016 at 4:08 PM, Jesse Gross <jesse@...nel.org> wrote:
> On Fri, Feb 19, 2016 at 3:10 PM, Alex Duyck <aduyck@...antis.com> wrote:
>> On Fri, Feb 19, 2016 at 1:53 PM, Jesse Gross <jesse@...nel.org> wrote:
>>> On Fri, Feb 19, 2016 at 11:26 AM, Alexander Duyck <aduyck@...antis.com> wrote:
>>>> This patch series makes it so that we enable the outer Tx checksum for IPv4
>>>> tunnels by default.  This makes the behavior consistent with how we were
>>>> handling this for IPv6.  In addition I have updated the internal flags for
>>>> these tunnels so that we use a ZERO_CSUM_TX flag for IPv4 which should
>>>> match up well with the ZERO_CSUM6_TX flag which was already in use for
>>>> IPv6.
>>>>
>>>> For most network devices this should be a net gain in terms of performance
>>>> as having the outer header checksum present allows for devices to report
>>>> CHECKSUM_UNNECESSARY which we can then convert to CHECKSUM_COMPLETE in order
>>>> to determine if the inner header checksum is valid.
>>>>
>>>> Below is some data I collected with ixgbe with an X540 that demonstrates
>>>> this.  I located two PFs connected back to back in two different name
>>>> spaces and then setup a pair of tunnels on each, one with checksum enabled
>>>> and one without.
>>>>
>>>> Recv   Send    Send                          Utilization
>>>> Socket Socket  Message  Elapsed              Send
>>>> Size   Size    Size     Time     Throughput  local
>>>> bytes  bytes   bytes    secs.    10^6bits/s  % S
>>>>
>>>> noudpcsum:
>>>>  87380  16384  16384    30.00      8898.67   12.80
>>>> udpcsum:
>>>>  87380  16384  16384    30.00      9088.47   5.69
>>>>
>>>> The one spot where this may cause a performance regression is if the
>>>> environment contains devices that can parse the inner headers and a device
>>>> supports NETIF_F_GSO_UDP_TUNNEL but not NETIF_F_GSO_UDP_TUNNEL_CSUM.  In
>>>> the case of such a device we have to fall back to using GSO to segment the
>>>> tunnel instead of TSO and as a result we may take a performance hit as seen
>>>> below with i40e.
>>>
>>> Do you have any numbers from 40G links? Obviously, at 10G the links
>>> are basically saturated and while I can see a difference in the
>>> utilization rate, I suspect that the change will be much more apparent
>>> at higher speeds.
>>
>> Unfortunately I don't have any true 40G links to test with.  The
>> closest I can get is to run PF to VF on an i40e.  Running that I have
>> seen the numbers go from about 20Gb/s to 15Gb/s with almost all the
>> difference being related to the fact that we are having to
>> allocate/free more skbs and make more trips through the
>> i40e_lan_xmit_frame function resulting in more descriptors.
>
> OK, I guess that is more or less in line with what I would expect off
> the top of my head. There is a reasonably significant drop in the worst
> case.
>
>>> I'm concerned about the drop in performance for devices that currently
>>> support offloads (almost none of which expose
>>> NETIF_F_GSO_UDP_TUNNEL_CSUM as a feature). Presumably the people that
>>> care most about tunnel performance are the ones that already have
>>> these NICs and will be the most impacted by the drop.
>>
>> The problem is being able to transmit fast is kind of pointless if the
>> receiving end cannot handle it.  We hadn't gotten around to really
>> getting the Rx checksum bits working until the 3.18 kernel which I
>> don't suspect many people are running so at this point messing with
>> the TSO bits isn't really making much of a difference.  Then on top of
>> that most devices have certain limitations on how many ports they can
>> handle and such.  I know the i40e is supposed to support something
>> like 10 port numbers, but the fm10k and ixgbe are limited to one port
>> as I recall.  So this whole thing is already really brittle as it is.
>> My goal with this change is to make the behavior more consistent
>> across the board.
>
> That's true to some degree but there are certainly plenty of cases
> where TSO makes a difference - lower CPU usage, transmitting to
> multiple receivers, people will upgrade their kernels, etc. It's
> clearly good to make things more consistent but hopefully not by
> reducing existing performance. :)
>
>>> My hope is that we can continue to use TSO on devices that only
>>> support NETIF_F_GSO_UDP_TUNNEL. The main problem is that the UDP
>>> length field may vary across segments. However, in practice this is
>>> true only on the final segment and only in cases where the total length
>>> is not a multiple of the MSS. If we could detect cases where those
>>> conditions are met, we could continue to use TSO with the UDP checksum
>>> field pre-populated. A possible step even further would be to break
>>> off the final segment into a separate packet to make things conform if
>>> necessary. This would avoid a performance regression and I think make
>>> this more palatable to a lot of people.
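A rough sketch of the condition described above, in plain userspace C rather than kernel code. The names segments_uniform(), split_tail() and struct gso_split are hypothetical, made up for illustration. It assumes the only thing that matters is whether the payload length is a multiple of the MSS, which is a simplification of what a real GSO path would have to check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: how a payload divides into equal-size
 * segments plus an optional shorter tail. */
struct gso_split {
	size_t uniform_len;	/* bytes covered by full MSS-sized segments */
	size_t tail_len;	/* bytes left over for a short last segment */
};

/* True when every segment carries exactly mss bytes, so the UDP
 * length field would be identical across all segments. */
static bool segments_uniform(size_t payload_len, size_t mss)
{
	return mss != 0 && payload_len % mss == 0;
}

/* Split off the short tail so that the remainder is uniform. */
static struct gso_split split_tail(size_t payload_len, size_t mss)
{
	struct gso_split s = { payload_len, 0 };

	if (mss != 0) {
		s.tail_len = payload_len % mss;
		s.uniform_len = payload_len - s.tail_len;
	}
	return s;
}
```

With an MSS of 1400, a 2800 byte payload is uniform, while a 3000 byte payload splits into 2800 uniform bytes and a 200 byte tail that would go out as a separate packet.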
>>
>> I think Tom and I had discussed this possibility a bit at netconf.
>> The GSO logic is something I planned on looking at over the next
>> several weeks as I suspect there is probably room for improvement
>> there.
>
> That sounds great.
>
>>>> I also haven't investigated the effect this will have on OVS.  However I
>>>> suspect the impact should be minimal as the worst case scenario should be
>>>> that Tx checksumming will become enabled by default which should be
>>>> consistent with the existing behavior for IPv6.
>>>
>>> I don't think that it should cause any problems.
>>
>> Good to hear.
>>
>> Do you know if OVS has some way to control the VXLAN configuration so
>> that it could disable Tx checksums?  If so that would probably be a
>> good way to address the 40G issues assuming someone is running an
>> environment hat had nothing but NICs that can support the TSO and Rx
>> checksum on inner headers.
>
> Yes - OVS can control tx checksums on a per-endpoint basis (actually,
> rx checksum present requirements as well though it's not exposed to
> the user at the moment). If you had the information then you could
> optimize what to use in an environment of, say, hypervisors and
> hardware switches.
>
> However, it's certainly possible that you have a mixed set of NICs
> such as an encap aware NIC on the transmit side and non-aware on the
> receive side. In that case, both possible checksum settings penalize
> somebody: off (lose GRO on receiver), on (lose TSO on sender assuming
> no support for NETIF_F_GSO_UDP_TUNNEL_CSUM). That's why I think it's
> important to be able to use encap TSO with local checksum to avoid
> these bad tradeoffs, not to mention being cleaner.

By "local checksum" do you mean LCO? Seems like we should be able to
get that to work with NETIF_F_GSO_UDP_TUNNEL_CSUM.
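
For anyone not familiar with LCO, here is a minimal userspace sketch of the idea, not the kernel implementation. Once the device fills in the inner checksum, the inner packet sums to the complement of the inner pseudo-header sum, so the outer sum can be computed from the headers alone without reading the payload. All function names and the packet bytes below are made up for illustration, and the outer pseudo-header and final inversion are left out to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, to a running sum. */
static uint32_t csum_add_buf(const uint8_t *p, size_t len, uint32_t sum)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)	/* odd trailing byte is padded with zero */
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/* Fold a 32-bit running sum into 16 bits with end-around carry. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* LCO: sum from the outer UDP header to the end of the packet without
 * touching the payload. hdr/hdr_len cover the outer UDP header up to
 * the inner checksum start (hdr_len must be even). inner_partial is
 * the inner pseudo-header sum left in the inner checksum field. */
static uint16_t lco_sum(const uint8_t *hdr, size_t hdr_len,
			uint16_t inner_partial)
{
	return csum_fold16(csum_add_buf(hdr, hdr_len,
					(uint16_t)~inner_partial));
}

/* Stand-in for the device: finish the inner checksum in place over
 * [csum_start, len), where the field currently holds the partial sum. */
static void hw_fill_inner(uint8_t *pkt, size_t csum_start, size_t csum_off,
			  size_t len)
{
	uint16_t c = (uint16_t)~csum_fold16(
		csum_add_buf(pkt + csum_start, len - csum_start, 0));

	pkt[csum_start + csum_off] = (uint8_t)(c >> 8);
	pkt[csum_start + csum_off + 1] = (uint8_t)(c & 0xff);
}

/* Returns 1 if the LCO result equals a full sum over the finished packet. */
static int lco_demo(void)
{
	uint8_t pkt[29] = {
		/* outer UDP header: sport, dport 4789, length 29, csum 0 */
		0xc0, 0x00, 0x12, 0xb5, 0x00, 0x1d, 0x00, 0x00,
		/* VXLAN header: flags, reserved, VNI 42, reserved */
		0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2a, 0x00,
		/* inner L4 header, checksum field at offset 6 */
		0x04, 0xd2, 0x00, 0x50, 0x00, 0x0d, 0x00, 0x00,
		/* 5 bytes of payload */
		'h', 'e', 'l', 'l', 'o',
	};
	const size_t csum_start = 16, csum_off = 6;
	const uint16_t pseudo = 0x1a2b;	/* made-up inner pseudo-header sum */
	uint16_t lco, full;

	/* CHECKSUM_PARTIAL state: the field holds the pseudo-header sum. */
	pkt[csum_start + csum_off] = (uint8_t)(pseudo >> 8);
	pkt[csum_start + csum_off + 1] = (uint8_t)(pseudo & 0xff);

	lco = lco_sum(pkt, csum_start, pseudo);	/* before the "device" runs */
	hw_fill_inner(pkt, csum_start, csum_off, sizeof(pkt));
	full = csum_fold16(csum_add_buf(pkt, sizeof(pkt), 0));

	return lco == full;
}
```

lco_demo() compares the header-only result against a brute-force sum over the whole packet after the simulated device has written the inner checksum. The two agree, which is why the stack can fill in the outer checksum in software and still leave the inner one to the hardware.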

Tom