netdev - Re: [PATCH] net: Add ndo_gso

Message-ID: <CA+mtBx-Vz7=2zrVXNA=oZ1FaTuOzP10hbyxzzAgJWEr1GkCGjg@mail.gmail.com>
Date:	Wed, 8 Oct 2014 18:46:53 -0700
From:	Tom Herbert <therbert@...gle.com>
To:	Jesse Gross <jesse@...ira.com>
Cc:	Or Gerlitz <gerlitz.or@...il.com>,
	Alexander Duyck <alexander.h.duyck@...el.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	David Miller <davem@...emloft.net>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Thomas Graf <tgraf@...g.ch>,
	Pravin Shelar <pshelar@...ira.com>,
	Andy Zhou <azhou@...ira.com>
Subject: Re: [PATCH] net: Add ndo_gso_check

On Wed, Oct 8, 2014 at 5:30 PM, Jesse Gross <jesse@...ira.com> wrote:
> On Mon, Oct 6, 2014 at 5:17 PM, Tom Herbert <therbert@...gle.com> wrote:
>> On Mon, Oct 6, 2014 at 3:33 PM, Jesse Gross <jesse@...ira.com> wrote:
>>> On Mon, Oct 6, 2014 at 10:59 AM, Tom Herbert <therbert@...gle.com> wrote:
>>>> On Sun, Oct 5, 2014 at 12:13 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>>> On Sun, Oct 5, 2014 at 9:49 PM, Tom Herbert <therbert@...gle.com> wrote:
>>>>>> On Sun, Oct 5, 2014 at 7:04 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>>>>> On Thu, Oct 2, 2014 at 2:06 AM, Tom Herbert <therbert@...gle.com> wrote:
>>>>>>>> On Wed, Oct 1, 2014 at 1:58 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>>>>>>> On Tue, Sep 30, 2014 at 6:34 PM, Tom Herbert <therbert@...gle.com> wrote:
>>>>>>>>>> On Tue, Sep 30, 2014 at 7:30 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>>>>> [...]
>>>>>>>> Solution #4: apply this patch and implement the check functions as
>>>>>>>> needed in those 4 or 5 drivers. If a device can only do VXLAN/NVGRE
>>>>>>>> then I believe the check function is something like:
>>>>>>>>
>>>>>>>> bool mydev_gso_check(struct sk_buff *skb, struct net_device *dev)
>>>>>>>> {
>>>>>>>>         if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
>>>>>>>>             (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
>>>>>>>>              skb->protocol != htons(ETH_P_TEB) ||
>>>>>>>>              skb_inner_mac_header(skb) - skb_transport_header(skb) != 12))
>>>>>>>>                 return false;
>>>>>>>>
>>>>>>>>         return true;
>>>>>>>> }
>>>>>>>
>>>>>>> Yep, such a helper can basically be made to work and let the 4-5
>>>>>>> drivers that can
>>>>>>> do GSO offloading for vxlan but not for any FOU/GUE packets signal
>>>>>>> that to the stack.
>>>>>>>
>>>>>>> Re the 12 constant, you were referring to the udp+vxlan headers? it's 8+8
>>>>>>>
>>>>>>> Also, we need a way for drivers that can support VXLAN or NVGRE but
>>>>>>> not concurrently
>>>>>>> on the same port @ the same time to only let vxlan packets pass
>>>>>>> successfully through the helper.
>>>>>
>>>>>> Or, there should be no difference in GSO processing between VXLAN and
>>>>>> NVGRE. Can you explain why you feel you need to differentiate them for GSO?
>>>>>
>>>>>
>>>>> RX wise, Linux tells the driver that UDP port X would be used for
>>>>> VXLAN, right? and indeed, it's possible for some HW implementations
>>>>> not to support RX offloading (checksum) for both VXLAN and NVGRE @ the
>>>>> same time over the same port. But TX/GRO wise, you're probably
>>>>> correct. The thing is that from the user POV they need a solution that
>>>>> works for both RX and TX offloading.
>>>>
>>>> I think from a user POV we want a solution that supports RX and TX
>>>> offloading across the widest range of protocols. This is accomplished
>>>> by implementing protocol agnostic mechanisms like CHECKSUM_COMPLETE
>>>> and protocol agnostic UDP tunnel TSO like we've described. IMO, the
>>>> fact that we have devices that implement protocol specific mechanisms
>>>> for NVGRE and VXLAN should be considered legacy support in the stack.
>>>> For new UDP encapsulation protocols we should not expose specifics in
>>>> the stack, either by adding a GSO type for each protocol or an
>>>> ndo_add_foo_port for each protocol-- these things will not scale and
>>>> unnecessarily complicate the core stack.
>>>
>>> It's not clear to me that allowing devices to know what protocols are
>>> running on what ports actually complicates the stack. The part that is
>>> complicated is usually the types of operations that are being
>>> offloaded (checksum, TSO, etc.). In all of these tunnel cases, the
>>> operations are the same and if you have a clean registration mechanism
>>> then nothing in the core has to see this - only the protocol doing the
>>> registering and the driver that is supporting it.
>>>
>>
>> We already have an ntuple filtering interface that allows configuring
>> a device for special processing of RX packets. I don't see why that
>> shouldn't apply to protocol processing for specific ports
>> in the encapsulation use case.
>
> You mentioned this before but I guess I don't really understand it. I
> suppose it is possible to express the port number and encapsulation as
> a filter but it doesn't really seem all that natural and at the end of
> the day it won't be mapped to a filter in the NIC. Can you explain it
> some more?
>
With n-tuple filters you should be able to configure a rule to match
packets (say by port) and assign an "action" which is understood by
the driver (say assume packets are VXLAN and return
CHECKSUM_UNNECESSARY). The interface is to the driver, so how this is
actually instantiated on the device is a private matter.
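
A minimal sketch of the rule-plus-action idea described above, written as
self-contained userspace C rather than kernel code. Every name here
(rx_rule, rx_pkt, rx_action, rx_classify) is hypothetical and invented for
illustration; this is not the ethtool n-tuple interface or any driver's
API. Port 4789 in the usage note is only an example value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-private actions; not a kernel API. */
enum rx_action {
	RX_ACT_NONE = 0,	/* default RX processing */
	RX_ACT_ASSUME_VXLAN,	/* treat as VXLAN, report CHECKSUM_UNNECESSARY */
};

/* Hypothetical n-tuple rule: a zero field acts as a wildcard. */
struct rx_rule {
	uint32_t dst_ip;
	uint16_t dst_port;
	uint8_t ip_proto;
	enum rx_action action;
};

/* The fields of a received packet that the rules can match on. */
struct rx_pkt {
	uint32_t dst_ip;
	uint16_t dst_port;
	uint8_t ip_proto;
};

static bool rule_matches(const struct rx_rule *r, const struct rx_pkt *p)
{
	return (!r->dst_ip || r->dst_ip == p->dst_ip) &&
	       (!r->dst_port || r->dst_port == p->dst_port) &&
	       (!r->ip_proto || r->ip_proto == p->ip_proto);
}

/* First matching rule wins; no match means default processing. */
static enum rx_action rx_classify(const struct rx_rule *rules, size_t n,
				  const struct rx_pkt *p)
{
	for (size_t i = 0; i < n; i++)
		if (rule_matches(&rules[i], p))
			return rules[i].action;
	return RX_ACT_NONE;
}
```

With a single rule { 0, 4789, 17, RX_ACT_ASSUME_VXLAN }, a UDP packet to
port 4789 on any address gets the VXLAN action, while a UDP packet to port
53 falls through to RX_ACT_NONE. Setting dst_ip as well would narrow the
rule to one destination address, which a port-only registration cannot
express.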

This is a far more flexible and extensible model than trying to have the
stack do port->protocol registration. We can filter on much more than just
destination port (this is actually a problem in UDP offloads which
only work when the socket is bound to INADDR_ANY). Also, this model can be
applied to many different scenarios not just encapsulation or those
protocols that are implemented by the kernel. I imagine someone will
want to do QUIC acceleration/steering in a device at some point; this
would work even if the kernel doesn't implement any part of the protocol (a
design point of QUIC ;-) ). It would be interesting to see how the
super charged programmable protocol parsers Alexei described might be
integrated with RX filtering.

>>> I have no disagreement with trying to be generic across protocols. I'm
>>> just not convinced that it is a realistic plan. It's obvious that it
>>> is not doable today nor will it be in the next generation of NICs
>>> (which are guaranteed to add support for new protocols). Furthermore,
>>> there will be more advanced stuff coming in the future that I think
>>> will be difficult or impossible to make protocol agnostic. Rather than
>>> pretending that this doesn't exist or will never happen, it's better
>>> to focus on how to integrate it cleanly.
>>
>> Sorry, but I don't understand how supporting new protocols in a
>> device for the purposes of returning CHECKSUM_UNNECESSARY is better or
>> easier to implement than just returning CHECKSUM_COMPLETE. Same thing
>> for trying to use NETIF_F_IP_CSUM with encapsulation rather than
>> NETIF_F_HW_CSUM. I'm not a hardware guy, so it's possible I'm missing
>> something obvious...
>>
>> Can you be more specific about this "advanced stuff"?
>
> I think checksums are really the exception, not the rule. It's great
> that they have this nice property of being additive and we should use
> that where we can but that doesn't apply to other types of operation
> (or even other types of checksums). Encryption or CRC32 carried inside
> the tunnel header can't be accelerated without some additional
> knowledge of the protocol. I think there were also a few other things
> that came up along these lines when we talked about this in an earlier
> thread - that's what I mean by "advanced stuff".
>
> For the basic one's complement checksums, I have no objection to
> CHECKSUM_COMPLETE. However, the reality is that this is not generally
> implemented today and that is unlikely to change for a few years even
> in the best case.
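
The additive property mentioned above can be illustrated with a small
userspace sketch: given a one's complement sum over a whole packet (the
kind of value a CHECKSUM_COMPLETE device would report), the sum over the
inner part can be recovered by subtracting the sum over the outer header,
with no knowledge of the encapsulation protocol. The helper names
(oc_sum, oc_fold, oc_sub) are local to this example and are not kernel
functions; buffer lengths are assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a running sum (len must be even). */
static uint32_t oc_sum(uint32_t sum, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	return sum;
}

/* Fold the carries back in to get a 16-bit one's complement sum. */
static uint16_t oc_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's complement subtraction: a - b is a + ~b, then fold. */
static uint16_t oc_sub(uint16_t a, uint16_t b)
{
	return oc_fold((uint32_t)a + (uint16_t)~b);
}
```

For a packet laid out as outer header followed by inner data,
oc_sub(oc_fold(sum of whole packet), oc_fold(sum of outer header)) equals
oc_fold(sum of inner data), up to the usual one's complement ambiguity
between 0x0000 and 0xffff. A CRC32 has no such property, which is the
distinction drawn above.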