netdev - Re: Generic TSO (was Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable outer Tx checksum by default)


Message-ID: <CAKgT0UeBx8NepNM2oGkYOnPV2Niy9GPd7RDZdgDMY3EcRMxNmw@mail.gmail.com>
Date:	Fri, 11 Mar 2016 14:31:29 -0800
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Edward Cree <ecree@...arflare.com>
Cc:	Tom Herbert <tom@...bertland.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Generic TSO (was Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable
 outer Tx checksum by default)

On Fri, Mar 11, 2016 at 1:29 PM, Edward Cree <ecree@...arflare.com> wrote:
> On 11/03/16 21:09, Alexander Duyck wrote:
>> The only real issue with the "generic" TSO is that it isn't going to
>> be so generic.  We have different devices that will support different
>> levels of stuff.  For example the ixgbe drivers will need to treat the
>> outer tunnel header as one giant L2 header.  As a result we will need
>> to populate all the fields in the outer header including the outer IP
>> ID, checksum, udp->len, and UDP or GRE checksum if requested.  For
>> i40e I think this gets a bit simpler as they already handle the outer
>> IPv4 ID and checksum.  I think there we may need to only populate the
>> checksum for it to work out correctly.  As such I may look at coming
>> up with a number of functions so that we can mix and match based on
>> what is needed in order to assemble a partially segmented frame.
> AIUI, the point of the design is that we _can_ populate everything,
> because we're keeping lengths and outer IP ID fixed, so outer checksums
> stay the same and the outer tunnel header _is_ just one giant L2 header
> which is bit-for-bit identical for each generated segment.  So every
> device gets to just be dumb and treat it as opaque.

This works so long as the device isn't trying to do anything like
insert VLAN tags.  Then I think we might have an issue since we don't
want to confuse the device and have it trying to insert the tag on the
inner frame's Ethernet header.

I suspect we may have differing levels of "dumb" that we have to deal
with.  That is all I am saying.  By default we could just populate all
of the length and checksum fields in the outer header, we would just
have to be consistent about what is provided then.  In addition there
will be the matter of sorting out the IP ID bits.  For example some of
the i40e parts support tunnel offloads, but not tunnel offloads with
checksums enabled.  I suspect those parts will end up wanting to
handle the outer IP header and UDP length values.  As a result, trying
to do a "dumb" send there may result in us really messing up the IP
ID values if we don't take steps to make it a bit smarter.
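As a rough illustration of the "populate everything" case: if the outer
IPv4 total length and IP ID are the same for every segment, the outer
header, checksum included, comes out byte-for-byte identical each time.
The sketch below is plain userspace C, not kernel code.  The names
csum16 and build_outer_ipv4, the addresses, and the TTL are all
assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 one's-complement checksum over an even-length buffer. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: fill a 20-byte outer IPv4 header (no options)
 * for one MSS-sized segment.  tot_len and id are the values that every
 * generated segment is assumed to share.
 */
static void build_outer_ipv4(uint8_t hdr[20], uint16_t tot_len, uint16_t id)
{
    static const uint8_t tmpl[20] = {
        0x45, 0x00, 0, 0,   /* version/IHL, TOS, total length */
        0, 0, 0x00, 0x00,   /* ID, flags/fragment offset (DF clear) */
        64, 17, 0, 0,       /* TTL, protocol = UDP, checksum */
        192, 0, 2, 1,       /* source 192.0.2.1 (documentation range) */
        192, 0, 2, 2,       /* destination 192.0.2.2 */
    };
    uint16_t c;

    memcpy(hdr, tmpl, sizeof(tmpl));
    hdr[2] = (uint8_t)(tot_len >> 8);
    hdr[3] = (uint8_t)(tot_len & 0xff);
    hdr[4] = (uint8_t)(id >> 8);
    hdr[5] = (uint8_t)(id & 0xff);

    /* Checksum is computed with the checksum field still zero. */
    c = csum16(hdr, 20);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xff);
}
```

Calling build_outer_ipv4() twice with the same tot_len and id yields
identical buffers.  Changing only the id changes the checksum bytes as
well, which is the case where a device that rewrites the outer IP ID
also has to own the outer checksum.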

>> The other issue I am working on at the moment to enable all this is to
>> fix the differences between csum_tcpudp_magic and csum_ipv6_magic in
>> terms of handling packet lengths greater than 65535.  Currently we are
>> messing up the checksum in relation to IPv6 since we are using the
>> truncated uh->len value.  I'll be submitting some patches later today
>> that will hopefully get that fixed and that in turn should make the
>> rest of the segmentation work easier.
> Again, in the superpacket we want to calculate the checksum based on the
> subsegment length, rather than the length of the superpacket.  The idea
> is to create the header for an MSS-sized segment, then follow it with an
> inner IP & TCP header, and n*MSS bytes of payload.  (This of course
> produces a superpacket that's not what you'd send over a link with a 64k
> MTU, unlike how we do it today.)
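To make the length point concrete, here is a toy userspace sketch of how
the length field feeds into a pseudo-header checksum.  It is not the
kernel's csum_tcpudp_magic or csum_ipv6_magic; fold, len_contrib32 and
len_contrib16 are hypothetical names for the example.  It only shows
that a length above 65535 contributes differently depending on whether
it is summed as a full 32-bit value or truncated to 16 bits first, while
an MSS-sized segment length gives the same result either way.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Contribution of the length field when it is summed as a full 32-bit
 * value, as in the IPv6 pseudo-header (upper and lower 16-bit words).
 */
static uint16_t len_contrib32(uint32_t len)
{
    return fold((len >> 16) + (len & 0xffff));
}

/* Contribution when the length is truncated to 16 bits beforehand. */
static uint16_t len_contrib16(uint32_t len)
{
    return fold((uint16_t)len);
}
```

For a segment-sized length such as 1400 the two helpers agree.  For a
superpacket length such as 70000 (0x11170) they return 0x1171 and 0x1170
respectively, so a checksum seeded from the truncated value is off by
one in the one's-complement sum.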

The question is at what point do we do the chopping.  Should we be
doing this in the drivers, or somewhere higher in the stack like we do
for standard GSO segmentation?  I would think we would need to add
another bit that says we can do GSO with custom outer headers, since I
can see VLANs being a possible issue otherwise.

> Then hw just chops up the payload, fixes up the inner headers, and glues
> the "L2" header on each packet.
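The "chop and glue" step described above can be sketched in a few lines
of userspace C.  This is only an illustration under simplifying
assumptions: segment_count and emit_segment are hypothetical names, the
opaque header is treated as a finished byte string, and the per-segment
fix-up of the inner IP and TCP headers is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of segments needed to carry payload_len bytes at a given MSS. */
static size_t segment_count(size_t payload_len, size_t mss)
{
    assert(mss > 0);
    return (payload_len + mss - 1) / mss;
}

/*
 * Write segment number idx into out: the unchanged opaque header
 * followed by that segment's slice of the payload.  Returns the number
 * of bytes written.  Inner header fix-up is deliberately omitted.
 */
static size_t emit_segment(uint8_t *out,
                           const uint8_t *opaque_hdr, size_t hdr_len,
                           const uint8_t *payload, size_t payload_len,
                           size_t mss, size_t idx)
{
    size_t off = idx * mss;
    size_t chunk;

    assert(off < payload_len);
    chunk = payload_len - off;
    if (chunk > mss)
        chunk = mss;

    memcpy(out, opaque_hdr, hdr_len);          /* same bytes every time */
    memcpy(out + hdr_len, payload + off, chunk);
    return hdr_len + chunk;
}
```

With a payload of exactly n*MSS bytes, as in the proposed superpacket,
every call writes the same number of bytes, which is what lets the
length fields in the opaque header be filled in once up front.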

Yeah, it sounds really straightforward and easy.  It isn't until you
start digging into the actual code that it gets a bit hairy.

What this effectively is is another form of TSO where each driver will
want to do things a little differently.  A lot of it has to do with the
fact that this is kind of a nasty hack that we are trying to add, since
many devices won't like the fact that we are lying about the size of
our actual L2 header, so things like VLAN tag insertion are going to
end up blowing back on us.

Really my preference in the case of ixgbe would have been to let the
hardware update the outer IP header and the inner TCP header and then
do the UDP and inner IP header as the static headers.  That way we
could still theoretically support fragmentation on the outer headers,
which last I knew is a very real possibility since, I believe, the DF
bit is not set on the outer headers for VXLAN.

- Alex