[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181218195705.r6v7n4gaponpmeak@salvia>
Date: Tue, 18 Dec 2018 20:57:05 +0100
From: Pablo Neira Ayuso <pablo@...filter.org>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net,
thomas.lendacky@....com, f.fainelli@...il.com,
ariel.elior@...ium.com, michael.chan@...adcom.com,
santosh@...lsio.com, madalin.bucur@....com,
yisen.zhuang@...wei.com, salil.mehta@...wei.com,
jeffrey.t.kirsher@...el.com, tariqt@...lanox.com,
saeedm@...lanox.com, jiri@...lanox.com, idosch@...lanox.com,
peppe.cavallaro@...com, grygorii.strashko@...com, andrew@...n.ch,
vivien.didelot@...oirfairelinux.com, alexandre.torgue@...com,
joabreu@...opsys.com, linux-net-drivers@...arflare.com,
ganeshgr@...lsio.com, ogerlitz@...lanox.com,
Manish.Chopra@...ium.com, marcelo.leitner@...il.com,
mkubecek@...e.cz, venkatkumar.duvvuru@...adcom.com,
julia.lawall@...6.fr, john.fastabend@...il.com, jhs@...atatu.com,
gerlitz.or@...il.com
Subject: Re: [PATCH net-next,v6 00/12] add flow_rule infrastructure
Hi Jakub,
On Mon, Dec 17, 2018 at 05:39:29PM -0800, Jakub Kicinski wrote:
> On Fri, 14 Dec 2018 19:11:53 +0100, Pablo Neira Ayuso wrote:
> > Hi,
> >
> > This patchset introduces a kernel intermediate representation (IR) to
> > express ACL hardware offloads, this is heavily based on the existing
> > flow dissector infrastructure and the TC actions. This IR can be used by
> > different frontend ACL interfaces such as ethtool_rxnfc, tc and
> > netfilter to represent ACL hardware offloads.
> >
> > The main goals of this patchset are:
> >
> > * Provide an unified representation that can be passed to the driver
> > for translation to HW IR. This consolidates the code to be maintained
> > in the driver and it also simplifies the development of ACL hardware
> > offloads. Driver developers do not need to add one specific parser for
> > each supported ACL frontend, instead each frontend just generates
> > this flow_rule IR and pass it to drivers to populate the hardware IR.
>
> Ack.
Thanks.
> > * Do not expose TC software frontend details to the drivers anymore,
> > such as structure layouts. Hence, if TC needs to be updated to support
> > a new software feature, no changes to the drivers will be required.
>
> This one you may need to clarify. TC already transforms the rules and
> for the most part provides abstract enough accessor helpers to protect
> from changes in internal implementation of the SW path.
Drivers can still modify the TC structure content, they can modify
frontend configurations, make tc dump non-sense values to userspace
via netlink or they may crash software plane at a later stage. Better
if we tend to expose configuration plane data through read-only
infrastructure that is consumed by drivers.
> You could make an argument that if one subsystem already offloads
> something and the IR can express it another subsystem can add the same
> functionality and drivers don't have to change. But why do we have
> multiple subsystem offloading the same thing in the first place?..
Having multiple subsystems that allow you to do the same thing is a
decision that was made long time ago. This is allowing multiple
approaches to compete inside the kernel and this also allowing users
to select the subsystem that fits better their requirements.
Anyway, the problem that this patchset addresses _already exists_ with
ethtool_rxnfc and cls_flower in place, it would be good to fix it now.
> > In handcrafted ascii art, the idea is the following:
> >
> > . ethtool_rxnfc tc netfilter
> > | (ioctl) (netlink) (netlink)
> > | | | | translate native
> > Frontend | | | | interface representation
> > | | | | to flow_rule IR
> > | | | |
> > . \/ \/ \/
> > . ----- flow_rule IR ------
> > | |
> > Drivers | | parsing of flow_rule IR
> > | | to populate hardware IR
> > | \/
> > . hardware IR (driver)
> >
> > This patchset only converts tc and ethtool_rxnfc to use this
> > infrastructure. Therefore. this patchset comes with no netfilter
> > changes at this stage.
>
> Let's try to differentiate between cls_flower and TC as a subsystem.
Right. cls_flower is the one more capable subset of tc at this stage.
> [...]
>
> > P.S: Moving forward, my rough plan is to propose and discuss the
> > following changes to use this infrastructure from netfilter for ACL
> > hardware offload:
> >
> > * Rename tc_setup ndo to flow_offload (or similar), so it can be used
> > from netfilter too. Otherwise, I'm open for alternatives.
>
> That is the part which really makes me uneasy. Ordering rules, tables
> and offloads in HW is already a hard question, which keeps coming back
> (see Jesse's talk from LPC for example). (And if you think you know
> the answer you probably forgot about legacy SR-IOV :))
>
> The TC offloads themselves today are still quite flaky. [...]
I'm very optimistic in what I'm seeing, patch 1/12 is rather large
because we already have a _fair good amount of drivers_ using the
existing upstream infrastructure.
I don't see anything unfixable from the existing approach that we can
incrementally extend what is already upstream to solve the existing
limitations.
[...]
> I don't really see how you could rename ndo_setup_tc to flow_offload.
We can start by using a single .ndo and use the enum parameter to
route the offload request to the corresponding driver handler to deal
with tc details at this stage. So we use the single .ndo as a
multiplexor. By consolidating parsers for ethtool_rxnfc and
cls_flower, we are already saving redundant code in this patchset, and
I can see more codebase in the tree that drivers could simplify.
> What level of abstraction are we really going to achieve here? The
> devices still need to know netfilter is a separate subsystem with its
> quirks and rules. Say someone adds a PASS rule to flower, and a DROP
> rule to netfilter. Packet has to be dropped, but the device will most
> likely PASS because flower's rule will hit first. n-tuple filters
> don't have pass so the problem isn't obvious from this patch set.
In software, tc ingress comes before netfilter, then ethtool_rxnfc
comes before tc ingress and netfilter. Once we have a single .ndo,
performing unified resource management for offloads will be easier,
ie. store ownership of rules in the driver ACL database, then when
rebuilding ACL offload, we make sure we place ethtool rules before tc,
then tc before netfilter. So we achieve the same semantics as in
software. We cannot solve this problem until we have unified
infrastructure to address this.
[...]
> I don't mean pretending its flower rules just for offload, but rather
> have a TC classifier module that can take nft rules (in SW path) and
> which would semantically fit very clearly into the existing TC
> framework. Mirroring SW behaviour in HW becomes easier, we don't have
> to teach drivers about another subsystem.
Then, we'll have to rewrite userspace netfilter to run on top of TC.
Moreover, add support for conntrack, NAT, logging... And after all
that is done. For netlink users, we'll have to translate netfilter
netlink message to TC from the kernel, then route this to TC. It will
take ages to rewrite netfilter on top of tc and there will be semantic
differences. This patchset already deals with the existing diversity
in the ecosystem by providing a single unified representation for the
hardware, no matter what frontend is being used for this goal.
Cheers,
Pablo
Powered by blists - more mailing lists