Message-ID: <CA+mtBx-mJE6SW2bqf3_t6iu=o5FY1WAF1ByeZpDJnc+6fpK6nA@mail.gmail.com>
Date: Mon, 22 Sep 2014 19:16:47 -0700
From: Tom Herbert <therbert@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Thomas Graf <tgraf@...g.ch>, Jiri Pirko <jiri@...nulli.us>,
John Fastabend <john.r.fastabend@...el.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Neil Horman <nhorman@...driver.com>,
Andy Gospodarek <andy@...yhouse.net>,
Daniel Borkmann <dborkman@...hat.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Jesse Gross <jesse@...ira.com>,
Pravin Shelar <pshelar@...ira.com>,
Andy Zhou <azhou@...ira.com>,
Ben Hutchings <ben@...adent.org.uk>,
Stephen Hemminger <stephen@...workplumber.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Vladislav Yasevich <vyasevic@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Scott Feldman <sfeldma@...ulusnetworks.com>,
Florian Fainelli <f.fainelli@...il.com>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
John Linville <linville@...driver.com>,
"dev@...nvswitch.org" <dev@...nvswitch.org>,
Jason Wang <jasowang@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Nicolas Dichtel <nicolas.dichtel@...nd.com>,
ryazanov.s.a@...il.com, Lennert Buytenhek <buytenh@...tstofly.org>,
aviadr@...lanox.com, Felix Fietkau <nbd@...nwrt.org>,
Neil Jerram <Neil.Jerram@...aswitch.com>, ronye@...lanox.com,
simon.horman@...ronome.com,
Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [patch net-next v2 8/9] switchdev: introduce Netlink API
On Mon, Sep 22, 2014 at 6:54 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Mon, Sep 22, 2014 at 8:10 AM, Tom Herbert <therbert@...gle.com> wrote:
>> On Mon, Sep 22, 2014 at 1:13 AM, Thomas Graf <tgraf@...g.ch> wrote:
>>> On 09/20/14 at 03:50pm, Alexei Starovoitov wrote:
>>>> I think HW should not be limited by SW abstractions whether
>>>> these abstractions are called flows, n-tuples, bridge or else.
>>>> Really looking forward to see "device reporting the headers as
>>>> header fields (len, offset) and the associated parse graph"
>>>> as the first step.
>>>>
>>>> Another topic that this discussion didn't cover yet is how this
>>>> all connects to tunnels and what is 'tunnel offloading'.
>
>> encapsulation (stuffing a few bytes of header into a packet) is in
>> itself not nearly an expensive enough operation to warrant offloading
>> to the NIC. Personally, I wish if NIC vendors are going to focus on
>
> On the contrary, generic tunneling is the most important one to get right
> when we're talking offloads.
> Adding encap header is easy to do in hw, but it breaks all other
> offloads if hw is not generic. Consider gso packet coming from vm.
> Generic tunnel allows sw to add inner headers, outer headers and
> set up offload offsets, so that HW does segmentation, checksumming
> of inner packet, adjusts inner headers and adds final outer encap.

As I pointed out on a previous thread, we already have a sufficiently
generic interface to allow HW to do encapsulated TSO
(SKB_GSO_UDP_TUNNEL and SKB_GSO_UDP_TUNNEL_CSUM with the inner
headers). If properly implemented, HW can implement a whole bunch of
UDP encap protocols without knowing how to parse them. I don't see how
a switch on the NIC helps this...
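
To make the point about encapsulated TSO concrete, here is a minimal
user-space sketch in C of segmentation driven only by offsets, with no
knowledge of the encapsulation protocol itself. struct encap_tso_desc
and both function names are hypothetical and are not kernel or driver
API. A real device would also fix up IP lengths, IP IDs and checksums,
which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: only offsets, no protocol knowledge. */
struct encap_tso_desc {
    size_t outer_udp_off; /* start of the outer UDP header */
    size_t inner_tcp_off; /* start of the inner TCP header */
    size_t hdr_len;       /* end of all headers = start of inner payload */
    size_t mss;           /* inner payload bytes per segment */
};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Number of segments needed for payload_len bytes (0 if mss is 0). */
static size_t encap_tso_nsegs(size_t payload_len, size_t mss)
{
    return mss ? (payload_len + mss - 1) / mss : 0;
}

/*
 * Build segment number idx of the oversized packet pkt into out.
 * Every segment gets a copy of all headers; the bytes between the outer
 * UDP header and the inner TCP header are copied without being parsed.
 * Only two fields are fixed up here: the outer UDP length and the inner
 * TCP sequence number. Returns the segment length, or 0 on error.
 */
static size_t encap_tso_segment(const uint8_t *pkt, size_t pkt_len,
                                const struct encap_tso_desc *d, size_t idx,
                                uint8_t *out, size_t out_cap)
{
    /* Outer UDP header is 8 bytes; inner TCP header is at least 20. */
    if (d->mss == 0 || d->hdr_len > pkt_len ||
        d->outer_udp_off + 8 > d->inner_tcp_off ||
        d->inner_tcp_off + 20 > d->hdr_len)
        return 0;

    size_t payload_len = pkt_len - d->hdr_len;
    if (idx >= encap_tso_nsegs(payload_len, d->mss))
        return 0;

    size_t off = idx * d->mss;
    size_t chunk = payload_len - off;
    if (chunk > d->mss)
        chunk = d->mss;

    size_t seg_len = d->hdr_len + chunk;
    if (seg_len > out_cap)
        return 0;

    memcpy(out, pkt, d->hdr_len);
    memcpy(out + d->hdr_len, pkt + d->hdr_len + off, chunk);

    /* Outer UDP length covers everything from the UDP header onward. */
    put_be16(out + d->outer_udp_off + 4,
             (uint16_t)(seg_len - d->outer_udp_off));
    /* Inner TCP sequence number advances by the payload offset. */
    put_be32(out + d->inner_tcp_off + 4,
             get_be32(pkt + d->inner_tcp_off + 4) + (uint32_t)off);
    return seg_len;
}
```

Nothing in the sketch depends on which UDP encapsulation is in use; the
same code path serves any protocol for which the stack can supply the
three offsets.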
> And this is just tx offload. On rx smart tunnel offload in HW parses
> encap and goes all the way to inner headers to verify checksums,
> it also steers based on inner headers.
> Try mellanox nics with and without vxlan offload to see
> the difference.

Turn on UDP RSS on the device and I bet you'll see those differences
go away! Once we moved to UDP encapsulation, there's really little
value in looking at inner headers for RSS or ECMP; hashing on the
outer UDP headers should be sufficient. Sure, someone might want to
parse the inner headers for some sort of advanced RX steering, but
again this implies rx-filtering and not switch functionality.
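
The UDP RSS argument can be illustrated with a small C sketch. It
assumes, as VXLAN recommends, that the encapsulating side folds a hash
of the inner flow into the outer UDP source port, so a receiver hashing
only the outer 4-tuple still spreads inner flows across queues.
Everything named here (struct flow4, hash_flow, encap_src_port,
rss_queue) is hypothetical, and FNV-1a is only a stand-in for the
Toeplitz hash that NICs typically use.

```c
#include <stdint.h>

/* Hypothetical 4-tuple; addresses and ports in host byte order. */
struct flow4 {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Mix one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Stand-in flow hash (a NIC would typically use Toeplitz instead). */
static uint32_t hash_flow(const struct flow4 *f)
{
    uint32_t h = 2166136261u;

    h = fnv1a_u32(h, f->saddr);
    h = fnv1a_u32(h, f->daddr);
    h = fnv1a_u32(h, ((uint32_t)f->sport << 16) | f->dport);
    return h;
}

/* TX side: derive the outer UDP source port from the inner flow,
 * mapped into the dynamic port range 49152..65535. */
static uint16_t encap_src_port(const struct flow4 *inner)
{
    uint32_t h = hash_flow(inner);

    return (uint16_t)(49152u + ((h ^ (h >> 16)) & 0x3fffu));
}

/* RX side: pick a queue from the outer 4-tuple only. */
static unsigned rss_queue(const struct flow4 *outer, unsigned nqueues)
{
    return nqueues ? hash_flow(outer) % nqueues : 0;
}
```

Because the source port already carries the inner flow's entropy, the
receive side never has to look past the outer UDP header to keep
packets of one inner flow on one queue.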

Alexei, I believe you previously said that SW should not dictate
HW models. I agree with this, but also believe the converse is true--
HW shouldn't dictate SW model. This is really why I'm raising the
question of what it means to integrate a switch into the host stack.
If this is something that doesn't require any model change to the
stack and is just a clever backend for rx-filters or tc, then I'm fine
with that!
Thanks,
Tom
> It looks like fm10k will be just as good, but existing encaps are
> not going to last forever, so RX should be improved the way John
> is saying. There has to be a 'parse graph' for HW to see past
> variable length encap and into inner headers.
> checksum_complete style of offloading checksum verification
> is not efficient. The cost of adjusting it over and over while
> parsing encaps is too high. Plus cpu steering based on outer
> headers is just too slow when speeds are in 40G range.
>
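
The 'parse graph' idea quoted above (headers described by length and
field offsets, plus edges to the next header) might look roughly like
the following C sketch. All type and function names and the example
graph are hypothetical. Headers are assumed to be fixed-length here, so
IPv4 options and other variable-length headers are not handled.

```c
#include <stddef.h>
#include <stdint.h>

enum { PG_END = -1, PG_MAX_EDGES = 4 };
enum { N_ETH, N_IPV4, N_UDP, N_VXLAN, N_COUNT };

/* Edge: if the selector field equals value, continue at node next. */
struct pg_edge {
    uint32_t value;
    int next;
};

/* Node: one header type, described only by lengths and offsets. */
struct pg_node {
    size_t hdr_len;   /* header length in bytes (assumed fixed) */
    size_t sel_off;   /* offset of the next-protocol selector field */
    size_t sel_len;   /* selector length in bytes, 0 = no selector */
    size_t nedges;
    struct pg_edge edges[PG_MAX_EDGES];
    int default_next; /* used only when sel_len is 0 */
};

/* Example graph: Ethernet -> IPv4 -> UDP -> VXLAN -> inner Ethernet. */
static const struct pg_node example_graph[N_COUNT] = {
    [N_ETH]   = { 14, 12, 2, 1, { { 0x0800, N_IPV4 } }, PG_END },
    [N_IPV4]  = { 20,  9, 1, 1, { { 17, N_UDP } },      PG_END },
    [N_UDP]   = {  8,  2, 2, 1, { { 4789, N_VXLAN } },  PG_END },
    [N_VXLAN] = {  8,  0, 0, 0, { { 0, 0 } },           N_ETH  },
};

/*
 * Walk pkt through the graph from node start. Records the node id and
 * byte offset of each header found (up to max) and returns the count.
 */
static size_t pg_walk(const struct pg_node *g, size_t nnodes, int start,
                      const uint8_t *pkt, size_t len,
                      int *ids, size_t *offs, size_t max)
{
    size_t off = 0, count = 0;
    int cur = start;

    while (cur >= 0 && (size_t)cur < nnodes && count < max) {
        const struct pg_node *nd = &g[cur];

        /* Stop on a truncated packet or a malformed node. */
        if (nd->hdr_len > len - off ||
            nd->sel_off + nd->sel_len > nd->hdr_len)
            break;

        ids[count] = cur;
        offs[count] = off;
        count++;

        int next = nd->default_next;
        if (nd->sel_len > 0) {
            /* Read the big-endian selector and look for a matching edge. */
            uint32_t v = 0;
            for (size_t i = 0; i < nd->sel_len; i++)
                v = (v << 8) | pkt[off + nd->sel_off + i];
            next = PG_END;
            for (size_t i = 0; i < nd->nedges; i++) {
                if (nd->edges[i].value == v) {
                    next = nd->edges[i].next;
                    break;
                }
            }
        }
        off += nd->hdr_len;
        cur = next;
    }
    return count;
}
```

Walking a VXLAN-in-UDP packet with example_graph visits outer Ethernet,
IPv4, UDP and VXLAN, then inner Ethernet and IPv4, recording the offset
of each. Supporting a new encapsulation then means adding a node and an
edge rather than teaching the parser a new protocol.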
>> stateful offload I'd rather see it be for encryption, which I believe
>> currently does warrant offload at 40G and higher speeds.
>
> encryption offload is badly needed as well. Unfortunately it's
> not seen as nic feature yet.