[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181219115703.51253f51@cakuba.netronome.com>
Date: Wed, 19 Dec 2018 11:57:03 -0800
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Pablo Neira Ayuso <pablo@...filter.org>
Cc: netdev@...r.kernel.org, davem@...emloft.net,
thomas.lendacky@....com, f.fainelli@...il.com,
ariel.elior@...ium.com, michael.chan@...adcom.com,
santosh@...lsio.com, madalin.bucur@....com,
yisen.zhuang@...wei.com, salil.mehta@...wei.com,
jeffrey.t.kirsher@...el.com, tariqt@...lanox.com,
saeedm@...lanox.com, jiri@...lanox.com, idosch@...lanox.com,
peppe.cavallaro@...com, grygorii.strashko@...com, andrew@...n.ch,
vivien.didelot@...oirfairelinux.com, alexandre.torgue@...com,
joabreu@...opsys.com, linux-net-drivers@...arflare.com,
ganeshgr@...lsio.com, ogerlitz@...lanox.com,
Manish.Chopra@...ium.com, marcelo.leitner@...il.com,
mkubecek@...e.cz, venkatkumar.duvvuru@...adcom.com,
julia.lawall@...6.fr, john.fastabend@...il.com, jhs@...atatu.com,
gerlitz.or@...il.com
Subject: Re: [PATCH net-next,v6 00/12] add flow_rule infrastructure
On Tue, 18 Dec 2018 20:57:05 +0100, Pablo Neira Ayuso wrote:
> On Mon, Dec 17, 2018 at 05:39:29PM -0800, Jakub Kicinski wrote:
> > On Fri, 14 Dec 2018 19:11:53 +0100, Pablo Neira Ayuso wrote:
> > > * Do not expose TC software frontend details to the drivers anymore,
> > > such as structure layouts. Hence, if TC needs to be updated to support
> > > a new software feature, no changes to the drivers will be required.
> >
> > This one you may need to clarify. TC already transforms the rules and
> > for the most part provides abstract enough accessor helpers to protect
> > from changes in internal implementation of the SW path.
>
> Drivers can still modify the TC structure content, they can modify
> frontend configurations, make tc dump non-sense values to userspace
> via netlink or they may crash software plane at a later stage. Better
> if we tend to expose configuration plane data through read-only
> infrastructure that is consumed by drivers.
Sounds like inventing a problem to fit the solution :) I don't think
we've even had such a bug in TC offloads.
> > You could make an argument that if one subsystem already offloads
> > something and the IR can express it another subsystem can add the same
> > functionality and drivers don't have to change. But why do we have
> > multiple subsystem offloading the same thing in the first place?..
>
> Having multiple subsystems that allow you to do the same thing is a
> decision that was made long time ago. This is allowing multiple
> approaches to compete inside the kernel and this also allowing users
> to select the subsystem that fits better their requirements.
Perhaps we learnt something new from the few years we spent working on
this stuff? I'm not sure where we stand. I believe I asked you
something like "why do we need this if we already have flower" when you
posted your first nf_flow patches, no? So my position haven't
changed :)
> Anyway, the problem that this patchset addresses _already exists_ with
> ethtool_rxnfc and cls_flower in place, it would be good to fix it now.
That's not the point of contention.
> > > In handcrafted ascii art, the idea is the following:
> > >
> > > . ethtool_rxnfc tc netfilter
> > > | (ioctl) (netlink) (netlink)
> > > | | | | translate native
> > > Frontend | | | | interface representation
> > > | | | | to flow_rule IR
> > > | | | |
> > > . \/ \/ \/
> > > . ----- flow_rule IR ------
> > > | |
> > > Drivers | | parsing of flow_rule IR
> > > | | to populate hardware IR
> > > | \/
> > > . hardware IR (driver)
> > >
> > > This patchset only converts tc and ethtool_rxnfc to use this
> > > infrastructure. Therefore. this patchset comes with no netfilter
> > > changes at this stage.
> >
> > Let's try to differentiate between cls_flower and TC as a subsystem.
>
> Right. cls_flower is the one more capable subset of tc at this stage.
cls_flower fits fixed function HW and ACL use cases, but that's
slightly beside the point, TC has its pipeline, chains, blocks,
templates, qdiscs...
> > I don't really see how you could rename ndo_setup_tc to flow_offload.
>
> We can start by using a single .ndo and use the enum parameter to
> route the offload request to the corresponding driver handler to deal
> with tc details at this stage. So we use the single .ndo as a
> multiplexor. By consolidating parsers for ethtool_rxnfc and
> cls_flower, we are already saving redundant code in this patchset, and
> I can see more codebase in the tree that drivers could simplify.
ndo_setup_tc does not carry flows. It carries information about
changes to the TC pipeline.
> > What level of abstraction are we really going to achieve here? The
> > devices still need to know netfilter is a separate subsystem with its
> > quirks and rules. Say someone adds a PASS rule to flower, and a DROP
> > rule to netfilter. Packet has to be dropped, but the device will most
> > likely PASS because flower's rule will hit first. n-tuple filters
> > don't have pass so the problem isn't obvious from this patch set.
>
> In software, tc ingress comes before netfilter, then ethtool_rxnfc
> comes before tc ingress and netfilter. Once we have a single .ndo,
> performing unified resource management for offloads will be easier,
> ie. store ownership of rules in the driver ACL database, then when
> rebuilding ACL offload, we make sure we place ethtool rules before tc,
> then tc before netfilter. So we achieve the same semantics as in
> software. We cannot solve this problem until we have unified
> infrastructure to address this.
I'm sorry to say your vision of unified resource management is
extremely hazy. These patches are a prep for something for which
there isn't even a clear plan.
There is no way having drivers learn semantics of another subsystem
would make things easier. Expressing the rules in terms of flow
dissector is not the hard part.
Did you consider things like tunnels and bonds? There are a lot of
problems which have been solved for TC offloads, if you create a
separate subsystem offload you'll have to repeat all that work.
> > I don't mean pretending its flower rules just for offload, but rather
> > have a TC classifier module that can take nft rules (in SW path) and
> > which would semantically fit very clearly into the existing TC
> > framework. Mirroring SW behaviour in HW becomes easier, we don't have
> > to teach drivers about another subsystem.
>
> Then, we'll have to rewrite userspace netfilter to run on top of TC.
> Moreover, add support for conntrack, NAT, logging... And after all
> that is done. For netlink users, we'll have to translate netfilter
> netlink message to TC from the kernel, then route this to TC. It will
> take ages to rewrite netfilter on top of tc and there will be semantic
> differences. This patchset already deals with the existing diversity
> in the ecosystem by providing a single unified representation for the
> hardware, no matter what frontend is being used for this goal.
I'm not suggesting we replace netfilter with TC. I'm suggesting we
replace nf_flow_offload table with something that fits into TC.
You're not going to offload entire netfilter. You want to offload
simplistic flows through the nf_flow_table. What I'm saying, is add a
equivalent of that table into TC. Make user space "link" netfilter to
that.
Powered by blists - more mailing lists