[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CA+mtBx9eH5Giir_eQP2Wm77b-zoZw_wyhXcAJXHuZO-G=9L8hA@mail.gmail.com>
Date: Mon, 4 Aug 2014 16:13:16 -0700
From: Tom Herbert <therbert@...gle.com>
To: Andy Zhou <azhou@...ira.com>
Cc: Jesse Gross <jesse@...ira.com>, David Miller <davem@...emloft.net>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [net-next 1/5] net: Add rx-vxlan feature bit
On Mon, Aug 4, 2014 at 3:00 PM, Andy Zhou <azhou@...ira.com> wrote:
> Hi, Tom,
>
> Do you envision those configuration can be applied dynamically, as the
> ports are added and removed under programming API control, for
> example, openFlow? If yes, what would be appropriate API/protocol?
>
I think it's more generic to use an out of band mechanism to this to
cover all use cases. For instance, if I was implementing a simple
router there might not even be a kernel socket or even any socket
opened for the port, but I might still want the advanced steering
capabilities for certain ports.
Actually, I'm wondering if this can just be part of rx-filtering, that
already allows specifying rules by l4proto and dst_port and action
could be an indication to device of which special processing to apply.
> Given that we have not reached consensus on the value and approach of
> receive side offloading, I plan to withdraw this patch from this patch
> series so that we
> can concentrate on common UDP tunnel functions, with vxlan and L2tp
> re-factoring so that we minimize code duplication when introducing
> Geneve support.
>
Thanks for considering both vxlan and l2tp in these patches!
> I will send out the next rev this series in a few days.
>
> Thanks,
>
> Andy
>
> On Sat, Aug 2, 2014 at 12:21 PM, Tom Herbert <therbert@...gle.com> wrote:
>> On Sat, Aug 2, 2014 at 11:16 AM, Jesse Gross <jesse@...ira.com> wrote:
>>> On Sat, Aug 2, 2014 at 8:53 AM, Tom Herbert <therbert@...gle.com> wrote:
>>>> On Fri, Aug 1, 2014 at 2:54 PM, Andy Zhou <azhou@...ira.com> wrote:
>>>>> Add a ndo feature bit that controls turning on and off NIC VXLAN
>>>>> receive offload
>>>>>
>>>>> Signed-off-by: Andy Zhou <azhou@...ira.com>
>>>>> ---
>>>>> Documentation/networking/netdev-features.txt | 5 +++++
>>>>> drivers/net/ethernet/emulex/benet/be_main.c | 6 ++++--
>>>>> drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
>>>>> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 11 ++++++++---
>>>>> drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 7 +++++--
>>>>> include/linux/netdev_features.h | 4 ++++
>>>>> net/core/ethtool.c | 1 +
>>>>> 7 files changed, 28 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
>>>>> index f310ede..6436768 100644
>>>>> --- a/Documentation/networking/netdev-features.txt
>>>>> +++ b/Documentation/networking/netdev-features.txt
>>>>> @@ -165,3 +165,8 @@ This requests that the NIC receive all possible frames, including errored
>>>>> frames (such as bad FCS, etc). This can be helpful when sniffing a link with
>>>>> bad packets on it. Some NICs may receive more packets if also put into normal
>>>>> PROMISC mode.
>>>>> +
>>>>> +* rx-vxlan
>>>>> +NIC can parse udp tunneled frames. The NIC may make use of the inner frame
>>>>> +information, in addition to the outer UDP frame to verify, offload(RX)
>>>>> +and steer the packet.
>>>>
>>>> Sorry Andy, I still don't like this feature! I am worried that we are
>>>> opening a can of worms in encouraging HW vendors to focus on narrow
>>>> protocol specific features as opposed to general features that can be
>>>> applied to a wider range of protocols.
>>>>
>>>> For UDP tunneling protocols where source port is set to a hash this
>>>> should sufficient for the most common purposes of RSS and ECMP. This
>>>> is precisely why we are interested in UDP encapsulation in the first
>>>> place, existing NICs and switches support it. Any extra value in HW
>>>> parsing VXLAN for steering is marginal (at least not a compelling
>>>> reason for me run out and buy new NICs).
>>>>
>>>> This functionality might have merit for UDP flows where the source
>>>> port cannot be arbitrary, like what would be needed to get the UDP
>>>> flow to work over a stateful NAT. An example of this might be in QUIC.
>>>> In the case the 5-tuple might not be sufficient the inner flow, but a
>>>> device could conceivably parse the packet and use the QUIC connection
>>>> identifier for flow identification and steering. However, if this
>>>> becomes a real HW feature we would need a different interface anyway.
>>>> QUIC is implemented in user space and to the kernel is just another
>>>> UDP socket. The correct interface would probably be to configure the
>>>> capability in ethtool, not through the UDP stack.
>>>
>>> This actually is a patch against ethtool :)
>>>
>> What I mean by using ethtool is that it should provide the generic
>> interface to enable per UDP port HW-specific functionality (or more
>> generally match to some protocol fields filter). This entails sending
>> commands to the device specifying port and key meaningful to the
>> device on how to process packets on that port. This is similar to how
>> we can program the RSS table or hashing algorithms without having the
>> stack needing to do anything or track its tunnels.
>>
>>> I think this is probably getting somewhat ideological or at least use
>>> case-specific. I understand your goal is to be generic where possible
>>> but enough examples have come up in this discussion that it seems like
>>> there shouldn't be much of an argument about whether this is useful at
>>> this point. Things like the cost/benefit of buying a new NIC or how
>>> QUIC would use it in the future seem a little beside the point,
>>> honestly.
>>>
>>> In any case, this specific patch is just about giving people the
>>> ability to control VXLAN offloads, which I think should make you happy
>>> given that it is already present in the stack and on by default. I
>>> think Andy has done his best to take your comments into account to
>>> make this as clean as possible and on the larger point we probably
>>> just need to move on.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists