Message-ID: <CAADnVQ+=vKEjzUTcEW+yzA4bTL4TKabrzzk=gb=OyfuVGwOyPA@mail.gmail.com>
Date: Mon, 22 Sep 2014 20:43:15 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Tom Herbert <therbert@...gle.com>
Cc: Thomas Graf <tgraf@...g.ch>, Jiri Pirko <jiri@...nulli.us>,
John Fastabend <john.r.fastabend@...el.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Neil Horman <nhorman@...driver.com>,
Andy Gospodarek <andy@...yhouse.net>,
Daniel Borkmann <dborkman@...hat.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Jesse Gross <jesse@...ira.com>,
Pravin Shelar <pshelar@...ira.com>,
Andy Zhou <azhou@...ira.com>,
Ben Hutchings <ben@...adent.org.uk>,
Stephen Hemminger <stephen@...workplumber.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Vladislav Yasevich <vyasevic@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Scott Feldman <sfeldma@...ulusnetworks.com>,
Florian Fainelli <f.fainelli@...il.com>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
John Linville <linville@...driver.com>,
"dev@...nvswitch.org" <dev@...nvswitch.org>,
Jason Wang <jasowang@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Nicolas Dichtel <nicolas.dichtel@...nd.com>,
ryazanov.s.a@...il.com, Lennert Buytenhek <buytenh@...tstofly.org>,
aviadr@...lanox.com, Felix Fietkau <nbd@...nwrt.org>,
Neil Jerram <Neil.Jerram@...aswitch.com>, ronye@...lanox.com,
simon.horman@...ronome.com,
Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [patch net-next v2 8/9] switchdev: introduce Netlink API
On Mon, Sep 22, 2014 at 7:16 PM, Tom Herbert <therbert@...gle.com> wrote:
> On Mon, Sep 22, 2014 at 6:54 PM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
>> On Mon, Sep 22, 2014 at 8:10 AM, Tom Herbert <therbert@...gle.com> wrote:
>>> On Mon, Sep 22, 2014 at 1:13 AM, Thomas Graf <tgraf@...g.ch> wrote:
>>>> On 09/20/14 at 03:50pm, Alexei Starovoitov wrote:
>>>>> I think HW should not be limited by SW abstractions whether
>>>>> these abstractions are called flows, n-tuples, bridge or else.
>>>>> Really looking forward to see "device reporting the headers as
>>>>> header fields (len, offset) and the associated parse graph"
>>>>> as the first step.
>>>>>
>>>>> Another topic that this discussion didn't cover yet is how this
>>>>> all connects to tunnels and what is 'tunnel offloading'.
>>
>>> encapsulation (stuffing a few bytes of header into a packet) is in
>>> itself not nearly an expensive enough operation to warrant offloading
>>> to the NIC. Personally, I wish if NIC vendors are going to focus on
>>
>> On the contrary, generic tunneling is the most important one to get
>> right when we're talking offloads.
>> Adding encap header is easy to do in hw, but it breaks all other
>> offloads if hw is not generic. Consider gso packet coming from vm.
>> Generic tunnel allows sw to add inner headers, outer headers and
>> setup offload offsets, so that HW does segmentation, checksumming
>> of inner packet, adjusts inner headers and adds final outer encap.
>
> As I pointed out on a previous thread, we already have a sufficiently
> generic interface to allow HW to do encapsulated TSO
> (SKB_GSO_UDP_TUNNEL and SKB_GSO_UDP_TUNNEL_CSUM with the inner
> headers).
SKB_GSO_UDP_TUNNEL_CSUM was the right way
to start splitting the overloaded and messy semantics of
UDP_TUNNEL. I'm still not sure whether you intended
it for both rx and tx: supporting tunnel_csum on rx
requires parsing the encap, whereas tx is much simpler.
Unless you're assuming a checksum_complete model for rx...
> If properly implemented, HW can implement a whole bunch of
> UDP encap protocols without knowing how to parse them.
On the tx side, yes, but I cannot see how you can do rx
with inner csum verification without parsing the encap.
What do you have in mind?
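To illustrate the rx concern: before anything on the receive side can verify an inner checksum, it has to locate the inner headers, and that means understanding the encap format. Below is a minimal sketch in C, assuming VXLAN over IPv4 with no VLAN tags and the IANA-assigned UDP port 4789. The function name and constants are hypothetical and are not taken from the kernel or from any driver.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_HLEN    14    /* Ethernet header, no VLAN tag (assumption) */
#define SK_UDP_HLEN     8
#define SK_VXLAN_HLEN   8
#define SK_VXLAN_PORT 4789   /* IANA-assigned port (assumption) */

/*
 * Hypothetical helper: return the byte offset of the inner L3 header
 * of a VXLAN-over-IPv4 frame, or -1 if the frame is not recognized.
 * Every step depends on knowing the encap layout; this is the parsing
 * that a tx-only offset interface does not need.
 */
static long vxlan_inner_l3_offset(const uint8_t *pkt, size_t len)
{
    size_t off, ihl;
    uint16_t dport;

    if (len < SK_ETH_HLEN + 20)
        return -1;
    if (pkt[12] != 0x08 || pkt[13] != 0x00)   /* outer ethertype: IPv4 */
        return -1;

    off = SK_ETH_HLEN;
    if ((pkt[off] >> 4) != 4)                 /* IP version */
        return -1;
    ihl = (size_t)(pkt[off] & 0x0f) * 4;      /* outer IP header length */
    if (ihl < 20 || pkt[off + 9] != 17)       /* protocol must be UDP */
        return -1;

    off += ihl;
    if (len < off + SK_UDP_HLEN + SK_VXLAN_HLEN + SK_ETH_HLEN)
        return -1;
    dport = (uint16_t)((pkt[off + 2] << 8) | pkt[off + 3]);
    if (dport != SK_VXLAN_PORT)
        return -1;

    off += SK_UDP_HLEN;
    if (!(pkt[off] & 0x08))                   /* VXLAN "VNI valid" flag */
        return -1;

    /* Skip the VXLAN header and the inner Ethernet header. */
    return (long)(off + SK_VXLAN_HLEN + SK_ETH_HLEN);
}
```

A different UDP encap would need a different walk from the UDP header onward, which is the point of the question above.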
> I don't see how
> a switch on the NIC helps this...
Correct, just a switch on a NIC isn't very useful.
If the immediate consumer of the packet is a VM,
then switching in the NIC after decap doesn't
add much speed, since bridge+router+nat+policy in sw
is fast enough once hw has done decap and csum verify.
But switching in HW becomes useful when a VF
is the destination device, since it avoids the hw->sw->hw
round trip, as Thomas was saying.
Also, there are x86 network gateways where tunneled
traffic from a virtual network is terminated and sent
over the internet or to another datacenter. Performance
demands are high, so it would be great if
tunnel+switch+nat+policy could be done in off-the-shelf HW.
>> And this is just tx offload. On rx smart tunnel offload in HW parses
>> encap and goes all the way to inner headers to verify checksums,
>> it also steers based on inner headers.
>> Try mellanox nics with and without vxlan offload to see
>> the difference.
>
> Turn on UDP RSS on the device and I bet you'll see those differences
> go away!
Logically it should, since inner flows should
hash to different outer src_port values, but somehow
that didn't work. I need to re-investigate with your
l4_hash stuff.
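For reference, the reasoning that UDP RSS on the outer header should spread tunneled traffic relies on the encapsulator deriving the outer UDP source port from a hash of the inner flow. Here is a minimal sketch in C of that idea. The struct, the FNV-1a hash, and the function names are assumptions chosen for illustration and are not the kernel's flow hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inner 5-tuple; field names are illustrative only. */
struct inner_flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Fold the low nbytes of v into an FNV-1a hash state. */
static uint32_t fnv1a_fold(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Stand-in flow hash (FNV-1a); any well-mixed hash would do here. */
static uint32_t inner_flow_hash(const struct inner_flow *f)
{
    uint32_t h = 2166136261u;

    h = fnv1a_fold(h, f->saddr, 4);
    h = fnv1a_fold(h, f->daddr, 4);
    h = fnv1a_fold(h, f->sport, 2);
    h = fnv1a_fold(h, f->dport, 2);
    h = fnv1a_fold(h, f->proto, 1);
    return h;
}

/* Scale a 32-bit hash into the port range [min, max], inclusive. */
static uint16_t outer_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
    assert(max >= min);
    uint32_t range = (uint32_t)(max - min) + 1;

    return (uint16_t)(min + (((uint64_t)hash * range) >> 32));
}
```

With this scheme, packets of one inner flow always carry the same outer source port, so a NIC hashing on the outer UDP 4-tuple keeps the flow on one queue while distinct inner flows tend to land on different queues. Two flows can still collide on the same port.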
> Alexei, I believe you previously said that SW should not dictate
> HW models. I agree with this, but also believe the converse is true--
> HW shouldn't dictate SW model.
Completely agree!
> This is really why I'm raising the
> question of what it means to integrate a switch into the host stack.
> If this is something that doesn't require any model change to the
> stack and is just a clever backend for rx-filters or tc, then I'm fine
> with that!
Agree as well. I'm not excited about the switchdev
abstraction in this patch, since it looks
oversimplified and not applicable to real silicon, but
a discussion about exposing programmable
NICs/switches to sw in a generic way is worth having :)