Message-ID: <20140825225057.GD30140@casper.infradead.org>
Date: Mon, 25 Aug 2014 23:50:57 +0100
From: Thomas Graf <tgraf@...g.ch>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: John Fastabend <john.fastabend@...il.com>,
Scott Feldman <sfeldma@...ulusnetworks.com>,
Jiri Pirko <jiri@...nulli.us>, netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Neil Horman <nhorman@...driver.com>,
Andy Gospodarek <andy@...yhouse.net>,
dborkman <dborkman@...hat.com>, ogerlitz <ogerlitz@...lanox.com>,
jesse@...ira.com, pshelar@...ira.com, azhou@...ira.com,
ben@...adent.org.uk, stephen@...workplumber.org,
jeffrey.t.kirsher@...el.com, vyasevic@...hat.com,
xiyou.wangcong@...il.com, john.r.fastabend@...el.com,
edumazet@...gle.com, f.fainelli@...il.com,
roopa@...ulusnetworks.com, linville@...driver.com,
dev@...nvswitch.org, jasowang@...hat.com, ebiederm@...ssion.com,
nicolas.dichtel@...nd.com, ryazanov.s.a@...il.com,
buytenh@...tstofly.org, aviadr@...lanox.com, nbd@...nwrt.org,
alexei.starovoitov@...il.com, Neil.Jerram@...aswitch.com,
ronye@...lanox.com
Subject: Re: [patch net-next RFC 10/12] openvswitch: add support for datapath
hardware offload
On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
> On 08/25/14 10:17, Thomas Graf wrote:
> >On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>
> >fdb_add() *is* flow based. At least in my understanding, the whole
> >point here is to extend the idea of fdb_add() and make it understand
> >L2-L4 in a more generic way for the most common protocols.
> >
> >The reason fdb_add() is not reused is because it is Netlink specific
> >and only suitable for User -> HW offload. Kernel -> HW offload is
> >technically possible but not clean.
> >
>
> I dont think we have a problem handling any of this today.
Yes, we do. It's restricted to L2 and we can't extend it easily
because it is based on NDA_*. The use of Netlink makes in-kernel
usage a pain. To me this is the sole reason for not using fdb_add()
in the first place. It seems absolutely clear though that fdb_add()
should be removed after the more generic ndo is in place providing
a superset of what fdb_add() can do today.
> This is where our (shall i say strong) disagreement is.
> I think you will find it non-trivial to show me how you can
> actually take the simple L2 bridge and map it to a "flow".
> Since your starting point is "everything can be represented via a flow
> and some table" - we are at a crosspath.
OK, let me do the conversion for you:
  NDA_DST         unused
  NDA_LLADDR      sw_flow_key.eth.dst
  NDA_CACHEINFO   unused
  NDA_PROBES      unused
  NDA_VLAN        sw_flow_key.eth.tci
  NDA_PORT        unused
  NDA_VNI         sw_flow_key.tun_key.tun_id
  NDA_IFINDEX     sw_flow_key.phys.in_port
  NDA_MASTER      unused
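
The mapping above can be sketched in C. This is only an illustration, not code from the patch set: `fdb_attrs_sketch`, `flow_key_sketch` and `fdb_to_flow_sketch()` are hypothetical names, the flow key is reduced to the four fields named in the mapping, and the use of an all-ones mask for "exact match" is an assumption. Byte order and the VLAN tag-present bit are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical container for the fdb attributes that carry match data. */
struct fdb_attrs_sketch {
	uint8_t  lladdr[SKETCH_ETH_ALEN];	/* NDA_LLADDR  */
	uint16_t vlan;				/* NDA_VLAN    */
	uint32_t vni;				/* NDA_VNI     */
	uint32_t ifindex;			/* NDA_IFINDEX */
};

/* Hypothetical subset of a flow key: only the fields used by the mapping. */
struct flow_key_sketch {
	struct { uint32_t in_port; } phys;
	struct { uint64_t tun_id; } tun_key;
	struct { uint8_t dst[SKETCH_ETH_ALEN]; uint16_t tci; } eth;
};

/*
 * Copy the fdb attributes into 'key' and set 'mask' to all-ones for each
 * field the fdb entry supplies. Everything else stays zero in both, which
 * this sketch treats as a wildcard (assumption).
 */
static void fdb_to_flow_sketch(const struct fdb_attrs_sketch *fdb,
			       struct flow_key_sketch *key,
			       struct flow_key_sketch *mask)
{
	assert(fdb && key && mask);

	memset(key, 0, sizeof(*key));
	memset(mask, 0, sizeof(*mask));

	/* NDA_LLADDR -> eth.dst */
	memcpy(key->eth.dst, fdb->lladdr, SKETCH_ETH_ALEN);
	memset(mask->eth.dst, 0xff, SKETCH_ETH_ALEN);

	/* NDA_VLAN -> eth.tci */
	key->eth.tci = fdb->vlan;
	mask->eth.tci = UINT16_MAX;

	/* NDA_VNI -> tun_key.tun_id */
	key->tun_key.tun_id = fdb->vni;
	mask->tun_key.tun_id = UINT64_MAX;

	/* NDA_IFINDEX -> phys.in_port */
	key->phys.in_port = fdb->ifindex;
	mask->phys.in_port = UINT32_MAX;
}
```

The attributes marked "unused" carry no match information, so they have no counterpart in the sketch.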
> The tc filter API seems to be doing just that.
> You have different types of classifiers - the h/w may not be able
> to support some classifier types - but that is a capability discovery
> challenge.
Agreed, but tc is only one of the many existing interfaces
we have. macvtap (given we want to extend beyond L2), routing,
OVS, bridge and eventually even things like a team device can and
should make use of offloads.
> I am saying two things:
> 1) There are a few "fundamental" interfaces; L2 and L3 being some.
> Add crypto offload and a few i mentioned in my presentation. We
Can you share that preso? I was not present.
> know how to do those. example; there is nothing i cant do with
> the rtmsg that is L3. or the fdb/port/vlan filter for L2.
> This flow thing should stay out of those.
Let me remind you about the name of the structure behind all L3
forwarding decisions:
struct flowi4 {
[...]
}
Adding a route means adding a flow. Can we please stop the flow
bashing? The concept of a flow is very generic, well known and already
very present in the kernel.
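
To illustrate "adding a route means adding a flow", here is a minimal sketch in which an IPv4 route is expressed as a masked match on the destination address. It does not use the real struct flowi4; `l3_flow_sketch` and the helper functions are hypothetical, and addresses are kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flow: match on destination address under a mask. */
struct l3_flow_sketch {
	uint32_t daddr;		/* destination prefix, host byte order */
	uint32_t daddr_mask;	/* 1 bits = must match, 0 bits = wildcard */
};

/* Convert a prefix length (0..32) into a netmask. */
static uint32_t prefix_to_mask_sketch(unsigned int plen)
{
	assert(plen <= 32);
	return plen == 0 ? 0 : UINT32_MAX << (32 - plen);
}

/* A route "prefix/plen" becomes a flow with a partially wildcarded daddr. */
static struct l3_flow_sketch route_to_flow_sketch(uint32_t prefix,
						  unsigned int plen)
{
	struct l3_flow_sketch f;

	f.daddr_mask = prefix_to_mask_sketch(plen);
	f.daddr = prefix & f.daddr_mask;
	return f;
}

/* A packet hits the flow if its destination equals the key under the mask. */
static bool flow_matches_sketch(const struct l3_flow_sketch *f, uint32_t daddr)
{
	return (daddr & f->daddr_mask) == f->daddr;
}
```

A default route is simply the flow with an all-zero mask. Longest-prefix selection between overlapping routes is outside the scope of this sketch.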
The sw_flow_key proposed comes close to flowi4. Some fields are
different. They can eventually get merged. The strict IPv4/IPv6
separation is what makes it non-obvious and probably why Jiri chose
the OVS representation. If you say rtmsg is complete then that clearly
is not the case. In particular VTEP fields, ARP, and TCP flags are
clearly missing for many uses.
Again, I'm not saying flow is the ultimate answer to everything. It
is not. But a lot of hardware out there is aware of flows in combination
with some form of action execution. Non-flow-based hardware can have
its own classifier.
> 2) The flow thing should allow a variety of classifiers to be
> handled. Again capability discovery would take care of differences.
So you want the flow to represent something that is not a flow. Again,
this comes back to the conversation in the other email. If this is
all about having a single ndo, I'm sure we can find common ground on
that.