Date:	Sat, 27 Sep 2014 22:26:36 +0300
From:	Or Gerlitz <gerlitz.or@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	David Miller <davem@...emloft.net>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels

On Sat, Sep 27, 2014 at 2:04 AM, Tom Herbert <therbert@...gle.com> wrote:
> On Fri, Sep 26, 2014 at 1:16 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>> On Fri, Sep 26, 2014 at 7:22 PM, Tom Herbert <therbert@...gle.com> wrote:
>> [...]
>>> Notes:
>>>   - GSO for GRE/UDP where GRE checksum is enabled does not work.
>>>     Handling this will require some special case code.
>>>   - Software GSO now supports many varieties of encapsulation with
>>>     SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
>>>     for device support of particular combinations (I intend to
>>>     add ndo_gso_check for that).
>>
>> Tom,
>>
>> As I wrote you earlier on other threads, the fact is that there are
>> upstream drivers which advertise SKB_GSO_UDP_TUNNEL but at this point
>> aren't capable of doing proper HW segmentation of anything which isn't
>> VXLAN.
>>
>> Just to make sure: this series isn't expected to introduce a
>> regression, right? We don't expect the stack to attempt to xmit a
>> large 64KB UDP packet which isn't VXLAN through these devices.

> I am planning to post ndo_gso_check shortly. These patches should not
> cause a regression with currently deployed functionality (VXLAN).

Can you please sum up in one or two lines what the trick is to avoid
such a regression? That is, what/where is the knob that would prevent
such a giant chunk from being sent down to a NIC driver which does
advertise SKB_GSO_UDP_TUNNEL?
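
For the sake of discussion, below is roughly what I'd imagine for the
driver side of that knob. This is purely a sketch: the hook hasn't been
posted yet, so the signature (bool (*ndo_gso_check)(struct sk_buff *skb,
struct net_device *dev)), the use of the inner_protocol fields from this
series, and the function name are all my assumptions.

static bool vxlan_only_gso_check(struct sk_buff *skb, struct net_device *dev)
{
	/* Non-encapsulated GSO keeps using the HW path as before. */
	if (!skb->encapsulation)
		return true;

	/* Pretend this NIC can only segment VXLAN-style UDP tunnels,
	 * i.e. an inner Ethernet frame (ETH_P_TEB) behind the outer
	 * UDP header.  Returning false for anything else built with
	 * SKB_GSO_UDP_TUNNEL{_CSUM} (fou, GUE, MPLS over UDP, ...)
	 * would let the core fall back to software segmentation
	 * instead of handing the driver a ~64KB chunk it can't split.
	 */
	return skb->inner_protocol_type == ENCAP_TYPE_ETHER &&
	       skb->inner_protocol == htons(ETH_P_TEB);
}

If that's more or less the plan, I guess the "knob" is the stack calling
such a hook before queueing the skb to the driver, and doing SW GSO
whenever it returns false?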


>>>   - MPLS seems to be the only previous user of inner_protocol. I don't
>>>     believe these patches can affect that. For supporting GSO with
>>>     MPLS over UDP, the inner_protocol should be set using the
>>>     helper functions in this patch.
>>>   - GSO for L2TP/UDP should also be straightforward now.
>>
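
As a side note on the inner_protocol point above: if I read the series
right, the encapsulating xmit path is supposed to record what sits
behind the outer UDP header via the new helpers, along these lines
(illustrative only, example_mark_inner() is a made-up name):

static void example_mark_inner(struct sk_buff *skb, bool mpls)
{
	if (mpls) {
		/* MPLS over UDP: inner payload starts with a label stack */
		skb_set_inner_protocol(skb, htons(ETH_P_MPLS_UC));
	} else {
		/* e.g. IPIP over fou: inner payload is an IP protocol */
		skb_set_inner_ipproto(skb, IPPROTO_IPIP);
	}
}

so that skb_udp_tunnel_segment() knows which inner headers to rebuild
for each segment.
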
>>> Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
>>> done using 200 TCP_STREAMs in netperf.
>> [...]
>>>    VXLAN
>>>       TCP_STREAM TSO enabled on tun interface
>>>         16.42% TX CPU utilization
>>>         23.66% RX CPU utilization
>>>         9081 Mbps
>>>       TCP_STREAM TSO disabled on tun interface
>>>         30.32% TX CPU utilization
>>>         30.55% RX CPU utilization
>>>         9185 Mbps
>>
>> So TSO disabled has better BW than TSO enabled?
>>
> Yes, I've noticed that on occasion, it does seem like TSO disabled
> tends to get a little more throughput. I see this with plain GRE, so I
> don't think it's directly related to fou or these patches. I suppose
> there may be some subtle interactions with BQL or something like that.
> I'd probably want to repro this on some other devices at some point to
> dig deeper.
>
>>>    Baseline (no encap, TSO and LRO enabled)
>>>       TCP_STREAM
>>>         11.85% TX CPU utilization
>>>         15.13% RX CPU utilization
>>>         9452 Mbps
>>
>> I would strongly recommend having a far better baseline, in the form
>> of 40Gb/s NICs, when developing and testing these changes in the
>> stack.
>>
> The only point of including the baseline was to show that encapsulation
> with GSO/GRO/checksum-unnec-conversion is in the same performance
> ballpark as native traffic, which was a goal.

Under (over...) 10Gb/s, in the ballpark indeed.

We know nothing about what would happen with a baseline of 38Gb/s (SB
40Gb/s NIC), 56Gb/s (two bonded ports of a 40Gb/s NIC on PCIe gen3), or
100Gb/s (tomorrow's NIC HW, probably coming up next year).

> So I'm pretty happy
> with this performance right now, although it probably does mean remote
> checksum offload won't show such impressive results with this test (TX
> csum over the data isn't so expensive in this case).
> Out of curiosity, why do you think using 40Gb/s is far better for a baseline?

Oh, simply because with 40Gb/s NICs the baseline I expect for a few
sessions (1, 2, 4, or 200 as you did) of plain TCP is four times better
than your current one (38Gb/s vs. 9.5Gb/s), and this should pose a
harder challenge for the GSO/encapsulating stack to catch up with,
agree?

Or.
