Message-ID: <CAM0EoM=NEB25naGtz=YaOt6BDoiv4RpDw27Y=btMZAMGeYB5bg@mail.gmail.com>
Date: Fri, 1 Mar 2024 21:59:20 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Tom Herbert <tom@...anda.io>, John Fastabend <john.fastabend@...il.com>,
"Singhai, Anjali" <anjali.singhai@...el.com>, Paolo Abeni <pabeni@...hat.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>, "Chatterjee, Deb" <deb.chatterjee@...el.com>,
"Limaye, Namrata" <namrata.limaye@...el.com>, Marcelo Ricardo Leitner <mleitner@...hat.com>,
"Shirshyad, Mahesh" <Mahesh.Shirshyad@....com>, "Jain, Vipin" <Vipin.Jain@....com>,
"Osinski, Tomasz" <tomasz.osinski@...el.com>, Jiri Pirko <jiri@...nulli.us>,
Cong Wang <xiyou.wangcong@...il.com>, "David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Vlad Buslov <vladbu@...dia.com>, Simon Horman <horms@...nel.org>,
Khalid Manaa <khalidm@...dia.com>, Toke Høiland-Jørgensen <toke@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>, Victor Nogueira <victor@...atatu.com>,
"Tammela, Pedro" <pctammela@...atatu.com>, "Daly, Dan" <dan.daly@...el.com>,
Andy Fingerhut <andy.fingerhut@...il.com>, "Sommers, Chris" <chris.sommers@...sight.com>,
Matty Kadosh <mattyk@...dia.com>, bpf <bpf@...r.kernel.org>
Subject: Hardware Offload discussion WAS(Re: [PATCH net-next v12 00/15]
Introducing P4TC (series 1)
On Fri, Mar 1, 2024 at 8:32 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Fri, 1 Mar 2024 12:39:56 -0500 Jamal Hadi Salim wrote:
> > On Fri, Mar 1, 2024 at 12:00 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > Pardon my ignorance, but doesn't P4 want to be compiled to a backend
> > > > target? How does going through TC make this seamless?
> > >
> > > +1
> >
> > I should clarify what i meant by "seamless". It means the same control
> > API is used for s/w or h/w. This is a feature of tc, and is not being
> > introduced by P4TC. P4 control only deals with Match-action tables -
> > just as TC does.
>
> Right, and the compiled P4 pipeline is tacked onto that API.
> Loading that presumably implies a pipeline reset. There's
> no precedent for loading things into TC resulting a device
> datapath reset.
I've changed the subject to reflect that this discussion is about h/w
offload, so we don't drift too much from the intent of the patches.
AFAIK, all these devices have some HA built in for program
replacement, i.e. no device reset.
I believe earlier generations of the Tofino switch may have needed
resets, which caused a few packet drops during a live update.
Granted, there may be devices (none that I am aware of) that cannot
do HA. All of this needs to be considered for offloads.
> > > My intuition is that for offload the device would be programmed at
> > > start-of-day / probe. By loading the compiled P4 from /lib/firmware.
> > > Then the _device_ tells the kernel what tables and parser graph it's
> > > got.
> >
> > BTW: I just want to say that these patches are about s/w - not
> > offload. Someone asked about offload so as in normal discussions we
> > steered in that direction. The hardware piece will require additional
> > patchsets which still require discussions. I hope we dont steer off
> > too much, otherwise i can start a new thread just to discuss current
> > view of the h/w.
> >
> > It's not the device telling the kernel what it has. It's the other way around.
>
> Yes, I'm describing how I'd have designed it :) If it was the same
> as what you've already implemented - why would I be typing it into
> an email.. ? :)
>
I think I misunderstood you and thought I needed to provide context.
The P4 pipelines are meant to be re-programmable multiple times in a
live environment. IOW, I should be able to delete/create a pipeline
while another is running. Some hardware may require that the parser
be shared, etc., but you can certainly replace the match-action tables
or add entirely new logic. In any case, this is all still under
discussion and can be further refined.
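A toy sketch of the "replace a pipeline while another is running" idea, assuming the device can stage a complete new program before switching over (the HA case above). Everything here (`Pipeline`, `Datapath`, the action strings) is hypothetical and is not the P4TC, tc or any driver API; it only shows make-before-break replacement, where traffic moves from one fully built pipeline to the next without a reset.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical toy model, not the P4TC, tc or any driver API: a "pipeline"
# is one exact-match table mapping a destination key to an action string.
@dataclass(frozen=True)
class Pipeline:
    name: str
    table: Dict[str, str]
    default_action: str = "drop"

    def process(self, dst: str) -> str:
        # A table miss falls through to the default action.
        return self.table.get(dst, self.default_action)


class Datapath:
    """Holds exactly one active pipeline and replaces it make-before-break."""

    def __init__(self, initial: Pipeline) -> None:
        self._active = initial

    @property
    def active(self) -> str:
        return self._active.name

    def process(self, dst: str) -> str:
        # Each packet is handled by whichever pipeline is active on arrival.
        return self._active.process(dst)

    def replace(self, new: Pipeline) -> Pipeline:
        # `new` is fully built before this call (the analogue of staging the
        # program on the device). The switch-over is one reference update,
        # so no packet sees a half-loaded pipeline and nothing is reset.
        old = self._active
        self._active = new
        return old  # the caller can drain and free the old pipeline


if __name__ == "__main__":
    dp = Datapath(Pipeline("simple_l3_v1", {"10.0.0.1": "fwd:port1"}))
    print(dp.process("10.0.0.1"))  # fwd:port1
    dp.replace(Pipeline("simple_l3_v2", {"10.0.0.1": "fwd:port2"}))
    print(dp.process("10.0.0.1"))  # fwd:port2
    print(dp.process("10.0.0.9"))  # drop
```

Devices that cannot stage a second program would instead have to stop traffic between the two states, which is the reset case Jakub raises.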
> > From the P4 program you generate the s/w (the ebpf code and other
> > auxiliary stuff) and h/w pieces using a compiler.
> > You compile ebpf, etc, then load.
>
> That part is fine.
>
> > The current point of discussion is the hw binary is to be "activated"
> > through the same tc filter that does the s/w. So one could say:
> >
> > tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
> > prog type hw filename "simple_l3.o" ... \
> > action bpf obj $PARSER.o section p4tc/parser \
> > action bpf obj $PROGNAME.o section p4tc/main
> >
> > And that would through tc driver callbacks signal to the driver to
> > find the binary possibly via /lib/firmware
> > Some of the original discussion was to use devlink for loading the
> > binary - but that went nowhere.
>
> Back to the device reset, unless the load has no impact on inflight
> traffic the loading doesn't belong in TC, IMO. Plus you're going to
> run into (what IIRC was Jiri's complaint) that you're loading arbitrary
> binary blobs, opaque to the kernel.
>
And you said at the time that binary blobs are already a way of life.
Let's take DDP as a use case: they load the firmware (via ethtool),
and we were recently discussing whether they should use flower or u32,
etc. I would say this is in the same spirit. Using ethtool may be a
bit disconnected, but that is up for discussion as well.
Some of the discussions have raised the concern that we need some form
of authentication. Is that what you mean?
> > Once you have this in place then netlink with tc skip_sw/hw. This is
> > what i meant by "seamless"
> >
> > > Plus, if we're talking about offloads, aren't we getting back into
> > > the same controversies we had when merging OvS (not that I was around).
> > > The "standalone stack to the side" problem. Some of the tables in the
> > > pipeline may be for routing, not ACLs. Should they be fed from the
> > > routing stack? How is that integration going to work? The parsing
> > > graph feels a bit like global device configuration, not a piece of
> > > functionality that should sit under sub-sub-system in the corner.
> >
> > The current (maybe i should say initial) thought is the P4 program
> > does not touch the existing kernel infra such as fdb etc.
>
> It's off to the side thing. Ignoring the fact that *all*, networking
> devices already have parsers which would benefit from being accurately
> described.
>
I am not following this point.
> > Of course we can model the kernel datapath using P4 but you wont be
> > using "ip route add..." or "bridge fdb...".
> > In the future, P4 extern could be used to model existing infra and we
> > should be able to use the same tooling. That is a discussion that
> > comes on/off (i think it did in the last meeting).
>
> Maybe, IDK. I thought prevailing wisdom, at least for offloads,
> is to offload the existing networking stack, and fill in the gaps.
> Not build a completely new implementation from scratch, and "integrate
> later". Or at least "fill in the gaps" is how I like to think.
>
> I can't quite fit together in my head how this is okay, but OvS
> was not allowed to add their offload API. And what's supposed to
> be part of TC and what isn't, where you only expect to have one
> filter here, and create a whole new object universe inside TC.
>
I was there.
OvS functionally matched what tc already had, 10 years after tc
existed, and they were busy rewriting what tc offered. So naturally we
pushed for them to use what tc had. You still need to write whatever
extensions are needed into the kernel, etc., in order to support what
the hardware can offer.
I hope I am not stating the obvious: P4 provides a more malleable
approach. Assume a blank template in h/w and s/w: you specify what you
need, and then both the s/w and the hardware support it. Flower is
analogous to a "fixed pipeline", meaning extending flower requires
changing the kernel and datapath. Often it does not cover all
potential h/w match-action engines, and we often see patches to do one
more thing that require more kernel changes. If you replace flower
with P4, you remove the need to update the kernel, user space, etc.
for the same features that flower needs to be extended for today. You
just tell the compiler what you need (within hardware capacity, of
course). So I don't see P4 as "offload the existing kernel infra, aka
flower", but rather as removing the limitations that flower constrains
us with today. As for other kernel infra (fdb, etc.), that can be
added as I stated; it is just not a starting point.
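A toy contrast between the "fixed pipeline" and "blank template" models above, assuming plain exact-match tables. `FIXED_KEYS`, `fixed_match` and `ProgrammableTable` are hypothetical names, not flower or P4TC code. In the fixed model the matchable fields are baked into the code; in the programmable model the key fields come from the loaded program, so a new field needs no code change.

```python
from typing import Dict, List, Tuple

# Hypothetical toy contrast, not real flower or P4TC code.

# "Fixed pipeline": the matchable fields are baked into the code. Matching on
# a new header field means editing FIXED_KEYS, i.e. changing the kernel,
# the uAPI and user space tooling.
FIXED_KEYS = ("dst_ip", "dst_port")


def fixed_match(rules: List[Tuple[tuple, str]], pkt: Dict[str, object]) -> str:
    # First rule whose values equal the packet's FIXED_KEYS fields wins.
    for key, action in rules:
        if all(pkt.get(field) == value for field, value in zip(FIXED_KEYS, key)):
            return action
    return "drop"


class ProgrammableTable:
    """Exact-match table whose key fields come from the loaded program."""

    def __init__(self, name: str, key_fields: Tuple[str, ...],
                 default_action: str = "drop") -> None:
        self.name = name
        self.key_fields = key_fields  # declared by the program, not the kernel
        self.default_action = default_action
        self.entries: Dict[tuple, str] = {}

    def add_entry(self, key: tuple, action: str) -> None:
        if len(key) != len(self.key_fields):
            raise ValueError(
                f"{self.name} expects {len(self.key_fields)} key fields")
        self.entries[tuple(key)] = action

    def lookup(self, pkt: Dict[str, object]) -> str:
        # Build the key from whatever fields this table was declared with.
        key = tuple(pkt.get(field) for field in self.key_fields)
        return self.entries.get(key, self.default_action)


if __name__ == "__main__":
    pkt = {"dst_ip": "10.0.0.1", "dst_port": 80, "vni": 42}
    print(fixed_match([(("10.0.0.1", 80), "fwd:port1")], pkt))  # fwd:port1
    # A table keyed on a field FIXED_KEYS knows nothing about; no code change.
    tunnel = ProgrammableTable("tunnel_tbl", ("vni", "dst_ip"))
    tunnel.add_entry((42, "10.0.0.1"), "fwd:port7")
    print(tunnel.lookup(pkt))  # fwd:port7
```

The sketch only illustrates where the key schema comes from; real P4 tables also support LPM and ternary matching and are bounded by hardware capacity, none of which is modelled here.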
cheers,
jamal
> But that's just my opinions. The way things work we may wake up one
> day and find out that Dave has applied this :)