Date:	Sun, 28 Sep 2014 20:59:23 -0700
From:	Tom Herbert <therbert@...gle.com>
To:	Or Gerlitz <gerlitz.or@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels

On Sat, Sep 27, 2014 at 12:26 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
> On Sat, Sep 27, 2014 at 2:04 AM, Tom Herbert <therbert@...gle.com> wrote:
>> On Fri, Sep 26, 2014 at 1:16 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>> On Fri, Sep 26, 2014 at 7:22 PM, Tom Herbert <therbert@...gle.com> wrote:
>>> [...]
>>>> Notes:
>>>>   - GSO for GRE/UDP where GRE checksum is enabled does not work.
>>>>     Handling this will require some special case code.
>>>>   - Software GSO now supports many varieties of encapsulation with
>>>>     SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
>>>>     for device support of particular combinations (I intend to
>>>>     add ndo_gso_check for that).
>>>
>>> Tom,
>>>
>>> As I wrote you earlier on other threads, the fact is that there are
>>> upstream drivers that advertise SKB_GSO_UDP_TUNNEL but are not capable,
>>> at this point, of doing proper HW segmentation of anything that isn't
>>> VXLAN.
>>>
>>> Just to make sure, this series isn't expected to introduce a
>>> regression, right? We don't expect the stack to attempt to transmit a
>>> large 64KB UDP packet which isn't VXLAN through these devices.
>
>> I am planning to post ndo_gso_check shortly. These patches should not
>> cause a regression with currently deployed functionality (VXLAN).
>
> Can you please sum up in one or two lines what the trick is to avoid
> such a regression? That is, what/where is the knob that would prevent
> such a giant chunk from being sent down to a NIC driver which does
> advertise SKB_GSO_UDP_TUNNEL?
>
I posted a patch for ndo_gso_check. Please let me know if you'll be able
to work with it. I'll also post the iproute changes soon so that the
FOU results can be reproduced.
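
To make the "knob" concrete: before handing a large GSO skb to a device,
the stack can ask the driver whether it really supports that particular
GSO combination; if the answer is no, the skb falls back to software GSO.
A rough sketch of what a driver-side check could look like (illustrative
only, not the posted patch; the mydrv_* names and the hard-coded VXLAN
port are placeholders):

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/udp.h>

/* Sketch: a device whose HW can only segment VXLAN rejects any other
 * UDP-tunnel GSO skb, forcing a software GSO fallback.
 */
static bool mydrv_gso_check(struct sk_buff *skb, struct net_device *dev)
{
	if (skb_shinfo(skb)->gso_type &
	    (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM)) {
		/* Outer UDP destination port; the HW only parses the
		 * VXLAN port it was configured with.
		 */
		if (udp_hdr(skb)->dest != htons(4789))
			return false;
	}
	return true;
}

static const struct net_device_ops mydrv_netdev_ops = {
	/* ... existing ops ... */
	.ndo_gso_check	= mydrv_gso_check,
};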

>
>>>>   - MPLS seems to be the only previous user of inner_protocol. I don't
>>>>     believe these patches can affect that. For supporting GSO with
>>>>     MPLS over UDP, the inner_protocol should be set using the
>>>>     helper functions in this patch.
>>>>   - GSO for L2TP/UDP should also be straightforward now.
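
To illustrate the inner_protocol note above: an MPLS-over-UDP
implementation would record the inner protocol on the skb before the
outer UDP header is pushed, so that the UDP tunnel GSO code knows how to
segment the inner payload. A minimal sketch, assuming the
skb_set_inner_protocol() helper name (the actual helpers are the ones
added in this series):

#include <linux/skbuff.h>
#include <linux/if_ether.h>

/* Sketch only: mark an skb carrying MPLS inside a UDP tunnel so that
 * skb_udp_tunnel_segment() can segment the inner MPLS payload.
 */
static void example_mark_inner_mpls(struct sk_buff *skb)
{
	skb->encapsulation = 1;
	skb_set_inner_protocol(skb, htons(ETH_P_MPLS_UC));
}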
>>>
>>>> Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
>>>> done using 200 TCP_STREAMs in netperf.
>>> [...]
>>>>    VXLAN
>>>>       TCP_STREAM TSO enabled on tun interface
>>>>         16.42% TX CPU utilization
>>>>         23.66% RX CPU utilization
>>>>         9081 Mbps
>>>>       TCP_STREAM TSO disabled on tun interface
>>>>         30.32% TX CPU utilization
>>>>         30.55% RX CPU utilization
>>>>         9185 Mbps
>>>
>>> So TSO disabled gets better BW than TSO enabled?
>>>
>> Yes, I've noticed that on occasion, it does seem like TSO disabled
>> tends to get a little more throughput. I see this with plain GRE, so I
>> don't think it's directly related to fou or these patches. I suppose
>> there may be some subtle interactions with BQL or something like that.
>> I'd probably want to repro this on some other devices at some point to
>> dig deeper.
>>
>>>>    Baseline (no encap, TSO and LRO enabled)
>>>>       TCP_STREAM
>>>>         11.85% TX CPU utilization
>>>>         15.13% RX CPU utilization
>>>>         9452 Mbps
>>>
>>> I would strongly recommend having a far better baseline, in the form
>>> of 40Gbps NICs, when developing and testing these changes in the
>>> stack.
>>>
>> The only point of including the baseline was to show that encapsulation
>> with GSO/GRO/checksum-unnecessary conversion is in the same performance
>> ballpark as native traffic, which was a goal.
>
> Under (over...) 10Gbps, in the ballpark indeed.
>
> We know nothing about what would happen with a baseline of 38Gbps (SB
> 40Gbps NIC), 56Gbps (two bonded ports of a 40Gbps NIC on PCIe gen3), or
> 100Gbps (tomorrow's NIC HW, probably coming next year).
>
>> So I'm pretty happy
>> with this performance right now, although it probably does mean remote
>> checksum offload won't show such impressive results with this test (TX
>> csum over the data in this case isn't so expensive).
>> Out of curiosity, why do you think using 40Gbps is far better for a baseline?
>
> Oh, simply because with 40Gbps NICs, the baseline I expect for a few
> sessions (1, 2, 4, or 200 as you did) of plain TCP is four times better
> than your current one (38Gbps vs. 9.5Gbps), and this should pose a
> harder challenge for the GSO/encapsulating stack to catch up with,
> agree?
>
Sure, I agree that it would be nice to have this tested on different
devices (40G, 1G, wireless, etc.) -- but right now I don't see any
particularly obvious reason why performance shouldn't scale linearly.

> Or.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
