Date:	Tue, 22 Mar 2016 13:11:21 -0700
From:	Jesse Gross <jesse@...nel.org>
To:	Edward Cree <ecree@...arflare.com>
Cc:	Alexander Duyck <alexander.duyck@...il.com>,
	Alexander Duyck <aduyck@...antis.com>,
	Netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Tom Herbert <tom@...bertland.com>
Subject: Re: [RFC PATCH 7/9] GSO: Support partial segmentation offload

On Tue, Mar 22, 2016 at 12:40 PM, Edward Cree <ecree@...arflare.com> wrote:
> On 22/03/16 17:47, Alexander Duyck wrote:
>> On Tue, Mar 22, 2016 at 10:00 AM, Edward Cree <ecree@...arflare.com> wrote:
>>> On 18/03/16 23:25, Alexander Duyck wrote:
>>>> This patch adds support for something I am referring to as GSO partial.
>>>> The basic idea is that we can support a broader range of devices for
>>>> segmentation if we use fixed outer headers and have the hardware only
>>>> really deal with segmenting the inner header.  The name reflects the
>>>> fact that everything before csum_start will be fixed headers, and
>>>> everything after will be the region handled by hardware.
>>>>
>>>> With the current implementation it allows us to add support for the
>>>> following GSO types with an inner TSO or TSO6 offload:
>>>> NETIF_F_GSO_GRE
>>>> NETIF_F_GSO_GRE_CSUM
>>>> NETIF_F_GSO_UDP_TUNNEL
>>>> NETIF_F_GSO_UDP_TUNNEL_CSUM
>>>>
>>>> Signed-off-by: Alexander Duyck <aduyck@...antis.com>
>>>> ---
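
(For illustration, a minimal userspace model of the split the patch
describes; the struct layout, field names, and fixup logic are
hypothetical stand-ins, not kernel code.  The fixed region before
csum_start is replicated verbatim onto every output segment, and only
the inner TCP header is touched per segment.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct pkt {
	uint8_t data[4096];
	size_t len;          /* total length of the super-packet */
	size_t csum_start;   /* offset of the inner TCP header */
	size_t inner_hdrlen; /* length of the inner TCP header */
};

static void gso_partial(const struct pkt *super, size_t mss, uint32_t seq)
{
	size_t hdrs = super->csum_start + super->inner_hdrlen;
	size_t off;

	for (off = hdrs; off < super->len; off += mss) {
		size_t chunk = super->len - off < mss ? super->len - off : mss;
		struct pkt seg = { .len = hdrs + chunk };

		/* Fixed region: the outer IP/UDP/GRE headers plus the
		 * inner TCP header template, copied unchanged. */
		memcpy(seg.data, super->data, hdrs);
		/* Per-segment fixup confined to the inner header: here
		 * just the sequence number at byte offset 4 of the TCP
		 * header; real hardware also redoes the inner checksum. */
		memcpy(seg.data + super->csum_start + 4, &seq, sizeof(seq));
		memcpy(seg.data + hdrs, super->data + off, chunk);

		printf("segment: %zu bytes, seq %u\n", seg.len, seq);
		seq += (uint32_t)chunk;
	}
}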
>>> If I'm correctly understanding what you're doing, you're building a large
>>> TCP segment, feeding it through the encapsulation drivers as normal, then
>>> at GSO time you're fixing up length fields, checksums etc. in the headers.
>>> I think we can do this more simply, by making it so that at the time when
>>> we _generate_ the TCP segment, we give it headers saying it's one MSS big,
>>> but have several MSS of data.  Similarly when adding the encap headers,
>>> they all need to get their lengths from what the layer below tells them,
>>> rather than the current length of data in the SKB.  Then at GSO time all
>>> the headers already have the right things in, and you don't need to call
>>> any per-protocol GSO callbacks for them.
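
(A sketch of the alternative Edward describes, under the assumption
that each header layer derives its length fields from the advertised
per-segment size; function and field names are illustrative, and a
real VXLAN-style stack has more layers in between.)

#include <stdint.h>

struct seg_hdrs {
	uint16_t inner_ip_tot_len; /* inner IPv4 total length */
	uint16_t outer_udp_len;    /* outer UDP length for the encap */
};

void build_headers(struct seg_hdrs *h, uint16_t mss,
		   uint16_t inner_hdrlen, uint16_t outer_udp_hdrlen)
{
	/* Lengths reflect one MSS of payload per segment, regardless of
	 * how many MSS of data the SKB currently carries, so at GSO time
	 * the headers are already correct and need no per-protocol
	 * callback to rewrite them. */
	h->inner_ip_tot_len = inner_hdrlen + mss;
	h->outer_udp_len = outer_udp_hdrlen + inner_hdrlen + mss;
}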
>> One issue I have to deal with here is that we have no way of knowing
>> what the underlying hardware can support at the time the segment is
>> created.  You have to keep in mind that in many cases what we have
>> access to is the tunnel dev, not the underlying dev, so we don't know
>> if things can be offloaded to hardware or not.  By pushing this logic
>> into the GSO code we can actually implement it without much overhead,
>> since we either segment it into an MSS multiple or into single-MSS
>> sized chunks.  This way we defer the decision until the very last
>> moment, when we actually know whether we can offload some portion of
>> this in hardware.
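
(A hypothetical sketch of that deferred decision: only once the real
egress device is known does the stack pick between handing hardware
one MSS-multiple block or falling back to single-MSS segments.)

#include <stddef.h>

size_t gso_chunk_size(int egress_supports_partial, size_t payload_len,
		      size_t mss)
{
	/* Decided at GSO time, when the actual egress device and its
	 * capabilities are finally known. */
	if (egress_supports_partial && payload_len > mss)
		return (payload_len / mss) * mss; /* MSS multiple for hw */
	return mss;                               /* software single-MSS */
}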
> But won't the tunnel dev have the feature flag for GSO_PARTIAL depending
> on what the underlying dev advertises?  (Or, at least, could we make it
> be that way?)

Features that have been designed this way in the past are usually
pretty fragile. Not only would you have to track changes in the
routing table, but you could have bridges, tc, vlan devices, etc., all
of which might change the path of the packet and would have to somehow
propagate this information. It's much more robust to resolve the
device capabilities just before you hand the packet to that device.
Plus, anything along the path of the packet (iptables, for example)
that looks at the headers might need to be aware of this
optimization.
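
(An illustrative model of that late resolution, not the kernel's
actual netif_skb_features() logic: capabilities are read off the real
egress device at transmit time, so bridges, vlans, tc redirects, or
route changes upstream cannot leave a stale answer behind.)

typedef unsigned long features_t;

enum tx_path { TX_HW_PARTIAL, TX_SW_SEGMENT };

enum tx_path resolve_tx_path(features_t egress_features,
			     features_t required)
{
	/* Checked only when the packet actually reaches the device. */
	return (egress_features & required) == required ?
	       TX_HW_PARTIAL : TX_SW_SEGMENT;
}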

You're also assuming that the generating TCP stack is resident on the
same machine as the device that does the offloads. That's not
necessarily true in the case of VMs or remote senders whose packets
have been GRO'ed.

Keeping the core stack consistent and just handling this at the
GRO/driver layer as Alex has here seems preferable to me.
