netdev - Re: [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

Message-ID: <53FC909D.8090000@cumulusnetworks.com>
Date:	Tue, 26 Aug 2014 06:50:21 -0700
From:	Roopa Prabhu <roopa@...ulusnetworks.com>
To:	Thomas Graf <tgraf@...g.ch>
CC:	Jamal Hadi Salim <jhs@...atatu.com>,
	John Fastabend <john.fastabend@...il.com>,
	Scott Feldman <sfeldma@...ulusnetworks.com>,
	Jiri Pirko <jiri@...nulli.us>, netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Neil Horman <nhorman@...driver.com>,
	Andy Gospodarek <andy@...yhouse.net>,
	dborkman <dborkman@...hat.com>, ogerlitz <ogerlitz@...lanox.com>,
	jesse@...ira.com, pshelar@...ira.com, azhou@...ira.com,
	ben@...adent.org.uk, stephen@...workplumber.org,
	jeffrey.t.kirsher@...el.com, vyasevic@...hat.com,
	xiyou.wangcong@...il.com, john.r.fastabend@...el.com,
	edumazet@...gle.com, f.fainelli@...il.com, linville@...driver.com,
	dev@...nvswitch.org, jasowang@...hat.com, ebiederm@...ssion.com,
	nicolas.dichtel@...nd.com, ryazanov.s.a@...il.com,
	buytenh@...tstofly.org, aviadr@...lanox.com, nbd@...nwrt.org,
	alexei.starovoitov@...il.com, Neil.Jerram@...aswitch.com,
	ronye@...lanox.com, Shrijeet Mukherjee <shm@...ulusnetworks.com>
Subject: Re: [patch net-next RFC 10/12] openvswitch: add support for datapath
 hardware offload

On 8/25/14, 3:50 PM, Thomas Graf wrote:
> On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
>> On 08/25/14 10:17, Thomas Graf wrote:
>>> On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>>> fdb_add() *is* flow based. At least in my understanding, the whole
>>> point here is to extend the idea of fdb_add() and make it understand
>>> L2-L4 in a more generic way for the most common protocols.
>>>
>>> The reason fdb_add() is not reused is because it is Netlink specific
>>> and only suitable for User -> HW offload. Kernel -> HW offload is
>>> technically possible but not clean.
>>>
>> I don't think we have a problem handling any of this today.
> Yes we do. It's restricted to L2 and we can't extend it easily
> because it is based on NDA_*. The use of Netlink makes in-kernel
> usage a pain. To me this is the sole reason for not using fdb_add()
> in the first place. It seems absolutely clear though that fdb_add()
> should be removed after the more generic ndo is in place providing
> a superset of what fdb_add() can do today.
>
>> This is where our (shall I say strong) disagreement is.
>> I think you will find it non-trivial to show me how you can
>> actually take the simple L2 bridge and map it to a "flow".
>> Since your starting point is "everything can be represented via a flow
>> and some table", we are at a crossroads.
> OK, let me do the conversion for you:
>
> NDA_DST		unused
> NDA_LLADDR	sw_flow_key.eth.dst
> NDA_CACHEINFO	unused
> NDA_PROBES	unused
> NDA_VLAN	sw_flow_key.eth.tci
> NDA_PORT	unused
> NDA_VNI		sw_flow_key.tun_key.tun_id
> NDA_IFINDEX	sw_flow_key.phys.in_port
> NDA_MASTER	unused
>
>> The tc filter API seems to be doing just that.
>> You have different types of classifiers - the h/w may not be able
>> to support some classifier types - but that is a capability discovery
>> challenge.
> Agreed but tc is only one out of many possible existing interfaces
> we have. macvtap (given we want to extend beyond L2), routing,
> OVS, bridge and eventually even things like a team device can and
> should make use of offloads.
>
>> I am saying two things:
>> 1) There are a few "fundamental" interfaces, L2 and L3 among them.
>> Add crypto offload and a few I mentioned in my presentation. We
> Can you share that preso? I was not present.
>
>> know how to do those. Example: there is nothing I can't do with
>> rtmsg for L3, or with the fdb/port/vlan filter for L2.
>> This flow thing should stay out of those.
> Let me remind you about the name of the structure behind all L3
> forwarding decisions:
>
>          struct flowi4 {
> 		[...]
> 	};
>
> Adding a route means adding a flow. Can we please stop the flow
> bashing? The concept of a flow is very generic, well known and already
> very present in the kernel.
>
> The sw_flow_key proposed comes close to flowi4. Some fields are
> different. They can eventually get merged. The strict IPv4/IPv6
> separation is what makes it non-obvious and probably why Jiri chose
> the OVS representation. If you say rtmsg is complete then that clearly
> is not the case. In particular VTEP fields, ARP, and TCP flags are
> clearly missing for many uses.
>
> Again, I'm not saying flow is the ultimate answer to everything. It
> is not. But a lot of hardware out there is aware of flows in combination
> with some form of action execution. Non-flow-based hardware can have
> its own classifier.
>
>> 2) The flow thing should allow a variety of classifiers to be
>> handled. Again capability discovery would take care of differences.
> So you want the flow to represent something that is not a flow. Again,
> this comes back to the conversation in the other email. If this is
> all about having a single ndo I'm sure we can find common grounds on
> that.
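
The NDA_* to sw_flow_key conversion listed in the quoted text can be made concrete with a small C sketch. It assumes heavily simplified, hypothetical stand-ins (struct fdb_entry, struct flow_key, fdb_to_flow_key) rather than the real kernel structures: fields are kept in host byte order, and the real sw_flow_key details (big-endian tci with a VLAN-present bit, 64-bit big-endian tun_id) are ignored. It only illustrates that the L2 fdb attributes map onto a subset of flow key fields, with the remaining NDA_* attributes dropped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for what an fdb_add() request carries.
 * Only the NDA_* attributes that have a flow key counterpart are modelled. */
struct fdb_entry {
	uint8_t  lladdr[6];	/* NDA_LLADDR */
	uint16_t vlan;		/* NDA_VLAN */
	uint32_t vni;		/* NDA_VNI */
	int      ifindex;	/* NDA_IFINDEX */
};

/* Hypothetical subset of sw_flow_key (host byte order, no tag-present bit). */
struct flow_key {
	struct {
		uint8_t  dst[6];
		uint16_t tci;
	} eth;
	struct {
		uint64_t tun_id;
	} tun_key;
	struct {
		uint32_t in_port;
	} phys;
};

/* Translate an L2 fdb entry into a flow key following the quoted mapping.
 * NDA_DST, NDA_CACHEINFO, NDA_PROBES, NDA_PORT and NDA_MASTER are unused. */
static void fdb_to_flow_key(const struct fdb_entry *fdb, struct flow_key *key)
{
	memset(key, 0, sizeof(*key));			/* unmatched fields = 0 */
	memcpy(key->eth.dst, fdb->lladdr, sizeof(key->eth.dst));
	key->eth.tci = fdb->vlan;			/* NDA_VLAN -> eth.tci */
	key->tun_key.tun_id = fdb->vni;			/* NDA_VNI -> tun_id */
	key->phys.in_port = (uint32_t)fdb->ifindex;	/* NDA_IFINDEX -> in_port */
}
```

Zeroing the key first reflects the idea that any field the fdb entry does not set would be wildcarded by a mask in a real flow table.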

From what I understood (trying to summarize here for my own benefit):
the switchdev API currently under review proposes modeling every switch
ASIC offload abstraction as a flow. It does not mandate this in code;
however, there seems to be some discussion along those lines.

The switchdev API flow ndos need to stay for switch ASIC drivers that
support flows directly, or that want all of their hardware offload
abstractions represented as flows (openvswitch, the rocker device). The
details of how a flow is mapped to hardware lie in the corresponding
switch driver code.

We think rtnetlink is the right API for modeling switch ASIC hardware
tables. We have a working model (Cumulus) that maps rtnetlink to switch
ASIC hardware tables by snooping rtnetlink messages. This can be done by
extending the switchdev API with new ndos for L2 and L3.

For example:
   new switchdev ndos for fdb_add/fdb_del
   new switchdev ndos for L3
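
A rough C sketch of what such non-flow L2/L3 hooks might look like follows. All names here (struct swdev_l2l3_ops, fdb_add, fdb_del, fib_add, the toy driver and offload_fib_add) are hypothetical and are not the ndos in the patches under review. The "ASIC" is a toy fixed-size L2 table with no L3 support, so its fib_add hook is left NULL; a caller that snoops an rtnetlink route add then gets -EOPNOTSUPP and the route simply stays in the software FIB. This is one way capability differences could surface without forcing L2/L3 through the flow abstraction.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define HW_FDB_SIZE 16

struct sw_port;

/* Hypothetical non-flow L2/L3 offload hooks (names are illustrative only). */
struct swdev_l2l3_ops {
	int (*fdb_add)(struct sw_port *port, const uint8_t mac[6], uint16_t vid);
	int (*fdb_del)(struct sw_port *port, const uint8_t mac[6], uint16_t vid);
	int (*fib_add)(struct sw_port *port, uint32_t dst, uint8_t dst_len,
		       uint32_t gw);
};

/* Toy model of one hardware L2 table entry. */
struct hw_fdb_entry {
	uint8_t  mac[6];
	uint16_t vid;
	int      valid;
};

struct sw_port {
	const struct swdev_l2l3_ops *ops;
	struct hw_fdb_entry fdb[HW_FDB_SIZE];
};

static struct hw_fdb_entry *toy_fdb_find(struct sw_port *port,
					 const uint8_t mac[6], uint16_t vid)
{
	for (int i = 0; i < HW_FDB_SIZE; i++) {
		struct hw_fdb_entry *e = &port->fdb[i];

		if (e->valid && e->vid == vid && !memcmp(e->mac, mac, 6))
			return e;
	}
	return NULL;
}

static int toy_fdb_add(struct sw_port *port, const uint8_t mac[6], uint16_t vid)
{
	if (toy_fdb_find(port, mac, vid))
		return -EEXIST;
	for (int i = 0; i < HW_FDB_SIZE; i++) {
		struct hw_fdb_entry *e = &port->fdb[i];

		if (!e->valid) {
			memcpy(e->mac, mac, 6);
			e->vid = vid;
			e->valid = 1;
			return 0;
		}
	}
	return -ENOSPC;	/* hardware table full */
}

static int toy_fdb_del(struct sw_port *port, const uint8_t mac[6], uint16_t vid)
{
	struct hw_fdb_entry *e = toy_fdb_find(port, mac, vid);

	if (!e)
		return -ENOENT;
	e->valid = 0;
	return 0;
}

/* This toy ASIC has no L3 table, so fib_add stays NULL. */
static const struct swdev_l2l3_ops toy_ops = {
	.fdb_add = toy_fdb_add,
	.fdb_del = toy_fdb_del,
};

/* Snooped route add: offload if the driver can, otherwise report
 * -EOPNOTSUPP and leave the route in the software FIB only. */
static int offload_fib_add(struct sw_port *port, uint32_t dst,
			   uint8_t dst_len, uint32_t gw)
{
	if (!port->ops || !port->ops->fib_add)
		return -EOPNOTSUPP;
	return port->ops->fib_add(port, dst, dst_len, gw);
}
```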

What remains is working patches that implement the switchdev API ndo
ops for L2/L3 (this is in the works).

As long as the patches currently under review allow the API to be
extended to cover non-flow-based L2/L3 switch ASIC offloads, we might
be good (?).

Thanks,
Roopa