Date:	Tue, 23 Sep 2014 13:57:08 -0700
From:	Tom Herbert <therbert@...gle.com>
To:	Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:	Thomas Graf <tgraf@...g.ch>, Jiri Pirko <jiri@...nulli.us>,
	John Fastabend <john.r.fastabend@...el.com>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Neil Horman <nhorman@...driver.com>,
	Andy Gospodarek <andy@...yhouse.net>,
	Daniel Borkmann <dborkman@...hat.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Jesse Gross <jesse@...ira.com>,
	Pravin Shelar <pshelar@...ira.com>,
	Andy Zhou <azhou@...ira.com>,
	Ben Hutchings <ben@...adent.org.uk>,
	Stephen Hemminger <stephen@...workplumber.org>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Vladislav Yasevich <vyasevic@...hat.com>,
	Cong Wang <xiyou.wangcong@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Scott Feldman <sfeldma@...ulusnetworks.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	Roopa Prabhu <roopa@...ulusnetworks.com>,
	John Linville <linville@...driver.com>,
	"dev@...nvswitch.org" <dev@...nvswitch.org>,
	Jason Wang <jasowang@...hat.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Nicolas Dichtel <nicolas.dichtel@...nd.com>,
	ryazanov.s.a@...il.com, Lennert Buytenhek <buytenh@...tstofly.org>,
	aviadr@...lanox.com, Felix Fietkau <nbd@...nwrt.org>,
	Neil Jerram <Neil.Jerram@...aswitch.com>, ronye@...lanox.com,
	simon.horman@...ronome.com,
	Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [patch net-next v2 8/9] switchdev: introduce Netlink API

> SKB_GSO_UDP_TUNNEL_CSUM was the right way
> to start splitting overloaded and messy semantics of
> UDP_TUNNEL. I'm still not sure whether you've intended
> it for both rx and tx, since to support tunnel_csum on rx,
> parsing of encap is needed, whereas tx is so much simpler.
> Unless you're assuming checksum_complete model for rx...
>
>> If properly implemented, HW can implement a whole bunch of
>> UDP encap protocols without knowing how to parse them.
>
> on a tx side... yes, but I cannot see how you can do rx
> with inner csum verify without parsing encap.
> What do you have in mind ?
>
Implement checksum-complete. It does not require the device to parse the
encap, works with probably all of the encapsulation formats being
discussed, and easily supports multiple checksums in a packet. It will
even work with something like L2TP (pseudowire encapsulation), where a
device can't do stateless parsing.
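To make the checksum-complete point concrete, here is a minimal sketch in Python. It assumes the device reports one 16-bit ones-complement sum over all the bytes it received after the link-layer header, and that the stack (not the device) knows where the inner L4 header starts. The helper names (`fold`, `ones_sum`, `inner_csum_ok`, `build_toy_packet`) and the toy packet layout are hypothetical, and the prefix is assumed to have even length so no byte swap is needed:

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """16-bit ones-complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    return fold(sum(word for (word,) in struct.iter_unpack("!H", data)))


def inner_csum_ok(device_csum: int, prefix: bytes, pseudo: bytes) -> bool:
    """Verify the inner L4 checksum from the device's whole-packet sum.

    prefix is everything before the inner L4 header (outer headers, encap
    header, inner L2/L3). Software subtracts its sum, adds the inner
    pseudo-header, and expects 0xFFFF. Assumes len(prefix) is even.
    """
    inner = fold(device_csum + (~ones_sum(prefix) & 0xFFFF))
    return fold(inner + ones_sum(pseudo)) == 0xFFFF


def build_toy_packet():
    """Build (prefix, pseudo, inner_udp) for an illustrative packet."""
    payload = b"hello tunnel"
    udp_len = 8 + len(payload)
    # Inner IPv4 pseudo-header: src, dst, zero, protocol 17 (UDP), length.
    pseudo = bytes([10, 0, 0, 1]) + bytes([10, 0, 0, 2])
    pseudo += struct.pack("!BBH", 0, 17, udp_len)
    header = struct.pack("!HHHH", 1234, 80, udp_len, 0)
    csum = ~ones_sum(pseudo + header + payload) & 0xFFFF
    inner_udp = struct.pack("!HHHH", 1234, 80, udp_len, csum) + payload
    # Opaque stand-in for outer IP/UDP + encap + inner Ethernet/IP bytes.
    prefix = bytes(range(50))
    return prefix, pseudo, inner_udp


if __name__ == "__main__":
    prefix, pseudo, inner = build_toy_packet()
    # The "device" only sums bytes; it never looks inside the encap.
    print(inner_csum_ok(ones_sum(prefix + inner), prefix, pseudo))  # True
    bad = inner[:-1] + b"X"
    print(inner_csum_ok(ones_sum(prefix + bad), prefix, pseudo))    # False
```

The same device-supplied sum can be reused for each checksum in the packet by subtracting a different prefix each time, which is why no protocol-specific parsing is needed in HW.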

Of the five basic NIC offloads (RX-csum, TX-csum, TSO, LRO, and RSS),
LRO is the one that probably cannot be generalized so that NICs don't
need to parse specific encapsulation protocols. Fortunately, GRO
performance is now very comparable anyway, so I tend to think LRO
support is not crucial (the same argument might be made for GSO/TSO, I
suppose, but TSO we can mostly generalize). HW support for checksum
offloads and RSS is definitely still very relevant!

>> I don't see how
>> a switch on the NIC helps this...
>
> correct, just a switch on a nic isn't very useful.
>
> If immediate consumer of the packet is a VM,
> then doing switching in the nic after decap doesn't
> add much speed, since bridge+router+nat+policy in sw
> after decap and csum verify done by hw are fast enough.
> But switching in HW becomes useful when VF
> is a destination device, since it avoids hw->sw->hw
> roundtrip as Thomas was saying.
>
> Also there are x86 network gateways where tunneled
> traffic from virtual network is terminated and sent
> over internet or to other datacenter. Performance
> demands are high, so if tunnel+switch+nat+policy
> can be done in off-the-shelf HW it would be great.
>
>>> And this is just tx offload. On rx smart tunnel offload in HW parses
>>> encap and goes all the way to inner headers to verify checksums,
>>> it also steers based on inner headers.
>>> Try mellanox nics with and without vxlan offload to see
>>> the difference.
>>
>> Turn on UDP RSS on the device and I bet you'll see those differences
>> go away!
>
> Logically it should, since all inner flows should get
> hashed into different outer src_port, but somehow
> that didn't work. Need to re-investigate with your
> l4_hash stuff.
>
You may need to enable RSS for UDP, for example:

    ethtool -N eth0 rx-flow-hash udp4 sdfn
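In that ethtool field string, s and d select the source and destination IP addresses, and f and n select bytes 0-1 and 2-3 of the L4 header (for UDP, the source and destination ports). A rough Python sketch of why this matters for tunneled traffic follows. It assumes SHA-256 as a stand-in for the NIC's real hash function, four RX queues, and made-up addresses and ports; `rx_queue` is a hypothetical helper, not a kernel or ethtool API:

```python
import hashlib
import struct

N_QUEUES = 4  # assumed number of RX queues


def rx_queue(src_ip: bytes, dst_ip: bytes, sport: int, dport: int,
             fields: str = "sdfn") -> int:
    """Pick an RX queue by hashing only the selected header fields.

    SHA-256 is an illustrative stand-in for the hardware hash.
    """
    key = b""
    if "s" in fields:
        key += src_ip
    if "d" in fields:
        key += dst_ip
    if "f" in fields:
        key += struct.pack("!H", sport)  # L4 header bytes 0-1
    if "n" in fields:
        key += struct.pack("!H", dport)  # L4 header bytes 2-3
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % N_QUEUES


# 64 inner flows between the same two tunnel endpoints: only the outer
# UDP source port differs, as it is derived from the inner flow hash.
outer_src = bytes([192, 0, 2, 1])
outer_dst = bytes([192, 0, 2, 2])
flows = [(outer_src, outer_dst, 49152 + i, 4789) for i in range(64)]

queues_ip_only = {rx_queue(*flow, fields="sd") for flow in flows}
queues_with_ports = {rx_queue(*flow, fields="sdfn") for flow in flows}

print(len(queues_ip_only), len(queues_with_ports))
```

With only the IP addresses hashed, every flow between the two endpoints lands on a single queue; including the ports lets the outer source port spread the inner flows across queues.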

>> Alexei, I believe you previously said that SW should not dictate
>> HW models. I agree with this, but also believe the converse is true--
>> HW shouldn't dictate SW model.
>
> completely agree!
>
>> This is really why I'm raising the
>> question of what it means to integrate a switch into the host stack.
>> If this is something that doesn't require any model change to the
>> stack and is just a clever backend for rx-filters or tc, then I'm fine
>> with that!
>
> agree as well. I'm not excited about switchdev
> abstraction from this given patch, since it looks overly
> simplified and not applicable to real silicon, but
> discussion about exposing programmable
> nics/switches to sw in a generic way is worth having :)