netdev - Re: Generic TSO (was Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable outer Tx checksum by default)

Date:	Fri, 11 Mar 2016 14:55:33 -0800
From:	Tom Herbert <tom@...bertland.com>
To:	Alexander Duyck <alexander.duyck@...il.com>
Cc:	Edward Cree <ecree@...arflare.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Generic TSO (was Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable
 outer Tx checksum by default)

On Fri, Mar 11, 2016 at 2:31 PM, Alexander Duyck
<alexander.duyck@...il.com> wrote:
> On Fri, Mar 11, 2016 at 1:29 PM, Edward Cree <ecree@...arflare.com> wrote:
>> On 11/03/16 21:09, Alexander Duyck wrote:
>>> The only real issue with the "generic" TSO is that it isn't going to
>>> be so generic.  We have different devices that will support different
>>> levels of stuff.  For example the ixgbe drivers will need to treat the
>>> outer tunnel header as one giant L2 header.  As a result we will need
>>> to populate all the fields in the outer header including the outer IP
>>> ID, checksum, udp->len, and UDP or GRE checksum if requested.  For
>>> i40e I think this gets a bit simpler as they already handle the outer
>>> IPv4 ID and checksum.  I think there we may need to only populate the
>>> checksum for it to work out correctly.  As such I may look at coming
>>> up with a number of functions so that we can mix and match based on
>>> what is needed in order to assemble a partially segmented frame.
>> AIUI, the point of the design is that we _can_ populate everything,
>> because we're keeping lengths and outer IP ID fixed, so outer checksums
>> stay the same and the outer tunnel header _is_ just one giant L2 header
>> which is bit-for-bit identical for each generated segment.  So every
>> device gets to just be dumb and treat it as opaque.
>
> This works so long as the device isn't trying to do anything like
> insert VLAN tags.  Then I think we might have an issue since we don't
> want to confuse the device and have it trying to insert the tag on the
> inner frame's Ethernet header.
>
In Edward's giant L2 header mode, couldn't VLAN tags just be part of that?
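
A minimal sketch of that idea, assuming the device treats everything ahead
of the inner IP header as one opaque blob. If the VLAN tag is already in the
blob, it is copied verbatim onto every segment and nothing has to parse it.
The function name and the placeholder byte strings are hypothetical, not
kernel code, and the inner header fixups are deliberately left out.

```python
def segment_with_opaque_header(opaque_hdr, inner_hdr, payload, mss):
    """Chop payload into MSS-sized chunks and glue the same opaque
    header in front of each one (inner header fixups omitted)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        # A real device would patch the inner TCP sequence number and
        # checksums here; this sketch copies the inner header unchanged.
        segments.append(opaque_hdr + inner_hdr + chunk)
    return segments


# Outer Ethernet header carrying an 802.1Q tag (TPID 0x8100, VID 100).
outer_eth_vlan = bytes(12) + b"\x81\x00\x00\x64" + b"\x08\x00"
# Placeholder for outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14).
outer_tunnel = b"O" * 50
opaque = outer_eth_vlan + outer_tunnel

inner = b"I" * 40  # placeholder for inner IPv4 + TCP headers
segs = segment_with_opaque_header(opaque, inner, b"x" * 3000, 1400)
print(len(segs), [len(s) for s in segs])
```

Every segment starts with the same bytes, tag included, which is what would
make the "opaque" treatment possible as long as the device does not also try
to insert a tag of its own.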

> I suspect we may have differing levels of "dumb" that we have to deal
> with.  That is all I am saying.  By default we could just populate all
> of the length and checksum fields in the outer header, we would just
> have to be consistent about what is provided then.  In addition there
> will be the matter of sorting out the IP ID bits.  For example some of
> the i40e parts support tunnel offloads, but not tunnel offloads with
> checksums enabled.  I suspect those parts will end up wanting to
> handle the outer IP header and UDP length values.  As a result,
> trying to do a "dumb" send there may result in us really messing up the
> IP ID values if we don't take steps to make it a bit smarter.
>
>>> The other issue I am working on at the moment to enable all this is to
>>> fix the differences between csum_tcpudp_magic and csum_ipv6_magic in
>>> terms of handling packet lengths greater than 65535.  Currently we are
>>> messing up the checksum in relation to IPv6 since we are using the
>>> truncated uh->len value.  I'll be submitting some patches later today
>>> that will hopefully get that fixed and that in turn should make the
>>> rest of the segmentation work easier.
>> Again, in the superpacket we want to calculate the checksum based on the
>> subsegment length, rather than the length of the superpacket.  The idea
>> is to create the header for an MSS-sized segment, then follow it with an
>> inner IP & TCP header, and n*MSS bytes of payload.  (This of course
>> produces a superpacket that's not what you'd send over a link with a 64k
>> MTU, unlike how we do it today.)
>
> The question is at what point do we do the chopping.  Should we be
> doing this in the drivers or somewhere higher in the stack like we do
> for standard GSO segmentation?  I would think we would need to add
> another bit that says we can do GSO with custom outer headers since I
> can see VLANs being a possible issue otherwise.
>
>> Then hw just chops up the payload, fixes up the inner headers, and glues
>> the "L2" header on each packet.
>
> Yea, it sounds really straightforward and easy.  It isn't until you
> start digging into the actual code that it gets a bit hairy.
>
> What this effectively amounts to is another form of TSO where each driver
> will want to do things a little differently.  A lot of it has to do with the
> fact that this is kind of a nasty hack that we are trying to add since
> many devices won't like the fact that we are lying about the size of
> our actual L2 header so things like VLAN tag insertion are going to
> end up blowing back on us.
>
Right, the point is that we're trying to get out of the model where
every driver/device implements TSO differently, supports ad hoc
protocols, etc. Do you see any other common invasive technique that we
need to deal with other than VLAN insertion and IP ID?
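
To make the IP ID point concrete, here is a small sketch, assuming a plain
20-byte outer IPv4 header with no options. With total length and ID held
fixed, the header (checksum included) is bit-for-bit identical on every
segment; as soon as something increments the ID, the checksum has to be
recomputed per segment. The helper names and addresses are made up for
illustration.

```python
import struct


def inet_checksum(data):
    """RFC 1071 Internet checksum (16-bit ones'-complement sum, inverted)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(total_len, ip_id, src, dst, proto=17, ttl=64):
    """Build a 20-byte IPv4 header (no options, DF clear) with a valid checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, ip_id,
                      0, ttl, proto, 0, src, dst)
    csum = inet_checksum(hdr)
    return hdr[:10] + struct.pack("!H", csum) + hdr[12:]


src = bytes([10, 0, 0, 1])
dst = bytes([10, 0, 0, 2])

# Fixed length and fixed ID: one header serves every segment.
fixed = [ipv4_header(1450, 0x1234, src, dst) for _ in range(3)]
# A device that bumps the ID per segment changes the checksum each time.
incrementing = [ipv4_header(1450, 0x1234 + i, src, dst) for i in range(3)]

print(len(set(fixed)), len(set(incrementing)))
```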

> Really my preference in the case of ixgbe would have been to let the
> hardware update the outer IP header and the inner TCP header and then
> do the UDP and inner IP header as the static headers.  That way we
> could still theoretically support fragmentation on the outer headers,
> which, last I knew, is a very real possibility since I believe the DF
> bit is not set on the outer headers for VXLAN.
>
> - Alex
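
On the length issue Alex mentions for packets larger than 65535 bytes, a
rough sketch of why a truncated 16-bit length changes the result: in a
ones'-complement sum the upper 16 bits of a 32-bit length still contribute,
so dropping them yields a different pseudo-header sum. This only models the
arithmetic; the function names are hypothetical and it is not how the kernel
helpers are written.

```python
def fold(total):
    """Fold a wider sum down to 16 bits with end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def length_contribution(length):
    """What a 32-bit length field adds to a ones'-complement checksum sum."""
    return fold((length >> 16) + (length & 0xFFFF))


full_len = 70000                  # superpacket length, does not fit in 16 bits
truncated_len = full_len & 0xFFFF  # what a 16-bit field such as uh->len can hold
segment_len = 1400                # a per-segment length fits without loss

print(length_contribution(full_len), length_contribution(truncated_len))
```

A per-segment length, as in Edward's proposal, never exceeds 16 bits, so the
truncation problem does not arise for it.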