Message-ID: <CA+mtBx9kKATy7LUSaD1nXrY=h0LiEVciDGzWnyjqCXt_LVQMyQ@mail.gmail.com>
Date: Tue, 23 Sep 2014 13:57:08 -0700
From: Tom Herbert <therbert@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Thomas Graf <tgraf@...g.ch>, Jiri Pirko <jiri@...nulli.us>,
John Fastabend <john.r.fastabend@...el.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Neil Horman <nhorman@...driver.com>,
Andy Gospodarek <andy@...yhouse.net>,
Daniel Borkmann <dborkman@...hat.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Jesse Gross <jesse@...ira.com>,
Pravin Shelar <pshelar@...ira.com>,
Andy Zhou <azhou@...ira.com>,
Ben Hutchings <ben@...adent.org.uk>,
Stephen Hemminger <stephen@...workplumber.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Vladislav Yasevich <vyasevic@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Scott Feldman <sfeldma@...ulusnetworks.com>,
Florian Fainelli <f.fainelli@...il.com>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
John Linville <linville@...driver.com>,
"dev@...nvswitch.org" <dev@...nvswitch.org>,
Jason Wang <jasowang@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Nicolas Dichtel <nicolas.dichtel@...nd.com>,
ryazanov.s.a@...il.com, Lennert Buytenhek <buytenh@...tstofly.org>,
aviadr@...lanox.com, Felix Fietkau <nbd@...nwrt.org>,
Neil Jerram <Neil.Jerram@...aswitch.com>, ronye@...lanox.com,
simon.horman@...ronome.com,
Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [patch net-next v2 8/9] switchdev: introduce Netlink API
> SKB_GSO_UDP_TUNNEL_CSUM was the right way
> to start splitting the overloaded and messy semantics of
> UDP_TUNNEL. I'm still not sure whether you intended
> it for both rx and tx, since to support tunnel_csum on rx,
> parsing of encap is needed, whereas tx is so much simpler.
> Unless you're assuming the checksum_complete model for rx...
>
>> If properly implemented, HW can implement a whole bunch of
>> UDP encap protocols without knowing how to parse them.
>
> on a tx side... yes, but I cannot see how you can do rx
> with inner csum verify without parsing encap.
> What do you have in mind ?
>
Implement checksum-complete. It does not require a device to parse the
encap, should work with probably all of the encapsulation formats being
discussed, and easily supports multiple checksums in a packet. This
will even work with something like L2TP, where a device can't do
stateless parsing (pseudowire encapsulation).
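To make the checksum-complete idea concrete, here is a minimal Python model (not kernel code). It assumes an IPv4 inner UDP packet and uses 36 arbitrary bytes as a stand-in for outer IPv4+UDP+VXLAN headers. The helper names (`ones_sum`, `csum_sub`, `device_rx`, `stack_verify_inner`) are hypothetical and only loosely mirror the roles of the kernel's CHECKSUM_COMPLETE, `csum_sub()` and `skb_postpull_rcsum()`. The "device" just sums the whole packet; the stack, which already knows the encapsulation, subtracts the outer headers' contribution and then checks the inner checksum.

```python
import struct


def ones_sum(data: bytes, start: int = 0) -> int:
    """16-bit ones' complement sum (RFC 1071), folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    s = start
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)  # end-around carry
    return s


def csum_sub(total: int, part: int) -> int:
    """Remove part's contribution; in ones' complement, a - b == a + ~b."""
    return ones_sum(b"", total + (~part & 0xFFFF))


def build_inner_udp(src: bytes, dst: bytes, sport: int, dport: int, payload: bytes):
    """Return (pseudo_header, udp_segment) carrying a valid UDP/IPv4 checksum."""
    length = 8 + len(payload)
    pseudo = src + dst + bytes([0, 17]) + length.to_bytes(2, "big")
    hdr = struct.pack("!HHHH", sport, dport, length, 0)
    csum = ~ones_sum(pseudo + hdr + payload) & 0xFFFF
    csum = csum or 0xFFFF  # 0 means "no checksum" for UDP over IPv4
    return pseudo, struct.pack("!HHHH", sport, dport, length, csum) + payload


def device_rx(packet: bytes) -> int:
    """Encap-agnostic 'NIC': one sum over the whole packet, no header parsing."""
    return ones_sum(packet)


def stack_verify_inner(packet: bytes, outer_len: int, pseudo: bytes, complete: int) -> bool:
    """Stack side: subtract the outer headers it has parsed, then check the
    inner UDP checksum. Assumes outer_len is even (16-bit aligned)."""
    inner_sum = csum_sub(complete, ones_sum(packet[:outer_len]))
    return ones_sum(b"", inner_sum + ones_sum(pseudo)) == 0xFFFF


outer = bytes(range(36))  # stand-in for outer IPv4 + UDP + VXLAN headers
pseudo, inner = build_inner_udp(bytes([10, 1, 0, 1]), bytes([10, 1, 0, 2]),
                                1234, 80, b"hello, world!")
packet = outer + inner
complete = device_rx(packet)
print(stack_verify_inner(packet, len(outer), pseudo, complete))  # True
```

Because the device never looks inside the packet, the same single sum can be adjusted again by the stack for each further layer it pulls, which is why multiple checksums per packet come essentially for free in this model.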
Of the five basic NIC offloads (RX-csum, TX-csum, TSO, LRO, and RSS),
LRO is probably the one that cannot be generalized so that NICs don't
need to parse specific encapsulation protocols. Fortunately, GRO
performance is now very comparable anyway, so I tend to think LRO
support is not crucial (I suppose the same argument might be made for
GSO/TSO, but TSO we can mostly generalize). HW support for checksum
offloads and RSS is definitely still very relevant!
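On the transmit side, a generic checksum offload needs only two offsets rather than knowledge of the encapsulation. The following is a minimal Python model, assuming the Linux CHECKSUM_PARTIAL convention: the stack seeds the checksum field with the (uncomplemented) pseudo-header sum, and the device sums from `csum_start` to the end of the frame and writes the complemented result at `csum_start + csum_offset`. The function names `device_tx_csum` and `stack_build` are hypothetical, and the 36 zero bytes stand in for outer encap headers.

```python
import struct


def ones_sum(data: bytes, start: int = 0) -> int:
    """16-bit ones' complement sum (RFC 1071), folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    s = start
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)  # end-around carry
    return s


def device_tx_csum(frame: bytearray, csum_start: int, csum_offset: int) -> None:
    """Encap-agnostic 'NIC': sum from csum_start to the end of the frame and
    store the complemented result at csum_start + csum_offset."""
    s = ones_sum(bytes(frame[csum_start:]))
    pos = csum_start + csum_offset
    frame[pos:pos + 2] = (~s & 0xFFFF).to_bytes(2, "big")
    # UDP/IPv4 would need a result of 0 sent as 0xFFFF; ignored in this model.


def stack_build(outer: bytes, src: bytes, dst: bytes, sport: int, dport: int,
                payload: bytes):
    """Stack side: seed the inner UDP check field with the pseudo-header sum
    and tell the device where to checksum (start, offset)."""
    length = 8 + len(payload)
    pseudo = src + dst + bytes([0, 17]) + length.to_bytes(2, "big")
    hdr = struct.pack("!HHHH", sport, dport, length, ones_sum(pseudo))
    frame = bytearray(outer + hdr + payload)
    return frame, pseudo, len(outer), 6  # UDP check field is 6 bytes in


frame, pseudo, start, off = stack_build(bytes(36), bytes([10, 1, 0, 1]),
                                        bytes([10, 1, 0, 2]), 1234, 80,
                                        b"hello, world!")
device_tx_csum(frame, start, off)
print(ones_sum(pseudo + bytes(frame[start:])) == 0xFFFF)  # True
```

The device code never inspects the outer headers, so the same logic serves VXLAN, GUE, or any other UDP encap, as long as the stack computes the offsets.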
>> I don't see how
>> a switch on the NIC helps this...
>
> correct, just a switch on a nic isn't very useful.
>
> If the immediate consumer of the packet is a VM,
> then doing switching in the NIC after decap doesn't
> add much speed, since bridge+router+nat+policy in sw
> after decap and csum verify done by hw are fast enough.
> But switching in HW becomes useful when a VF
> is the destination device, since it avoids the hw->sw->hw
> roundtrip, as Thomas was saying.
>
> Also there are x86 network gateways where tunneled
> traffic from a virtual network is terminated and sent
> over the internet or to another datacenter. Performance
> demands are high, so if tunnel+switch+nat+policy
> can be done in off-the-shelf HW it would be great.
>
>>> And this is just tx offload. On rx, a smart tunnel offload in HW parses
>>> the encap and goes all the way to the inner headers to verify checksums;
>>> it also steers based on inner headers.
>>> Try Mellanox NICs with and without VXLAN offload to see
>>> the difference.
>>
>> Turn on UDP RSS on the device and I bet you'll see those differences
>> go away!
>
> Logically it should, since all inner flows should get
> hashed into different outer src_ports, but somehow
> that didn't work. Need to re-investigate with your
> l4_hash stuff.
>
You may need to enable RSS for UDP, for example:
    ethtool -N eth0 rx-flow-hash udp4 sdfn
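The hashing argument above can be illustrated with a small Python simulation. Assumptions: the encapsulator picks the outer UDP source port from a hash of the inner flow (the behavior Alexei describes), CRC32 stands in for the NIC's real RSS hash, and there are 8 receive queues; `flow_hash`, `outer_src_port` and `rss_queue` are hypothetical helpers. In ethtool's rx-flow-hash syntax, `s` and `d` select the IP source and destination addresses, and `f` and `n` select bytes 0-1 and 2-3 of the L4 header (for UDP, the source and destination ports). With IP-only hashing, all tunneled flows between two endpoints share one outer address pair and therefore one queue; adding the ports lets the per-flow outer source port spread them out.

```python
import zlib
from collections import Counter

NUM_QUEUES = 8
VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port


def flow_hash(*fields) -> int:
    # Stand-in hash (CRC32 of the fields); NICs typically use a Toeplitz hash.
    return zlib.crc32(repr(fields).encode())


def outer_src_port(inner_flow: tuple) -> int:
    # Hypothetical encapsulator: derive the outer UDP source port from the
    # inner flow hash, mapped into the dynamic port range 49152..65535.
    return 49152 + flow_hash(*inner_flow) % 16384


def rss_queue(src_ip: str, dst_ip: str, sport: int, dport: int, use_ports: bool) -> int:
    # use_ports=False ~ hashing on IP addresses only ("sd");
    # use_ports=True  ~ also hashing on the L4 ports ("sdfn").
    fields = (src_ip, dst_ip, sport, dport) if use_ports else (src_ip, dst_ip)
    return flow_hash(*fields) % NUM_QUEUES


# 64 inner TCP flows, all carried between the same two tunnel endpoints.
inner_flows = [("192.168.0.1", "192.168.0.2", 6, 10000 + i, 80) for i in range(64)]
outer_sports = [outer_src_port(f) for f in inner_flows]

ip_only = Counter(rss_queue("10.0.0.1", "10.0.0.2", sp, VXLAN_PORT, False)
                  for sp in outer_sports)
with_ports = Counter(rss_queue("10.0.0.1", "10.0.0.2", sp, VXLAN_PORT, True)
                     for sp in outer_sports)

print(len(ip_only))     # 1: every tunneled flow lands on the same queue
print(len(with_ports))  # typically several queues
```

The design point is that the receiver needs no encap parsing at all: hashing the outer 4-tuple is enough, provided the sender puts per-flow entropy into the outer source port.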
>> Alexei, I believe you previously said that SW should not dictate
>> HW models. I agree with this, but also believe the converse is true:
>> HW shouldn't dictate the SW model.
>
> completely agree!
>
>> This is really why I'm raising the
>> question of what it means to integrate a switch into the host stack.
>> If this is something that doesn't require any model change to the
>> stack and is just a clever backend for rx-filters or tc, then I'm fine
>> with that!
>
> agree as well. I'm not excited about the switchdev
> abstraction in this patch, since it looks overly
> simplified and not applicable to real silicon, but
> discussion about exposing programmable
> NICs/switches to sw in a generic way is worth having :)