Message-ID: <20240301173214.3d95e22b@kernel.org>
Date: Fri, 1 Mar 2024 17:32:14 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Tom Herbert <tom@...anda.io>, John Fastabend <john.fastabend@...il.com>,
"Singhai, Anjali" <anjali.singhai@...el.com>, Paolo Abeni
<pabeni@...hat.com>, Linux Kernel Network Developers
<netdev@...r.kernel.org>, "Chatterjee, Deb" <deb.chatterjee@...el.com>,
"Limaye, Namrata" <namrata.limaye@...el.com>, mleitner@...hat.com,
Mahesh.Shirshyad@....com, Vipin.Jain@....com, "Osinski, Tomasz"
<tomasz.osinski@...el.com>, Jiri Pirko <jiri@...nulli.us>, Cong Wang
<xiyou.wangcong@...il.com>, "David S . Miller" <davem@...emloft.net>,
edumazet@...gle.com, Vlad Buslov <vladbu@...dia.com>, horms@...nel.org,
khalidm@...dia.com, Toke Høiland-Jørgensen
<toke@...hat.com>, Daniel Borkmann <daniel@...earbox.net>, Victor Nogueira
<victor@...atatu.com>, "Tammela, Pedro" <pctammela@...atatu.com>, "Daly,
Dan" <dan.daly@...el.com>, andy.fingerhut@...il.com, "Sommers, Chris"
<chris.sommers@...sight.com>, mattyk@...dia.com, bpf@...r.kernel.org
Subject: Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)
On Fri, 1 Mar 2024 12:39:56 -0500 Jamal Hadi Salim wrote:
> On Fri, Mar 1, 2024 at 12:00 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > Pardon my ignorance, but doesn't P4 want to be compiled to a backend
> > > target? How does going through TC make this seamless?
> >
> > +1
>
> I should clarify what i meant by "seamless". It means the same control
> API is used for s/w or h/w. This is a feature of tc, and is not being
> introduced by P4TC. P4 control only deals with Match-action tables -
> just as TC does.
Right, and the compiled P4 pipeline is tacked onto that API.
Loading that presumably implies a pipeline reset. There's
no precedent for loading things into TC resulting in a device
datapath reset.
> > My intuition is that for offload the device would be programmed at
> > start-of-day / probe. By loading the compiled P4 from /lib/firmware.
> > Then the _device_ tells the kernel what tables and parser graph it's
> > got.
>
> BTW: I just want to say that these patches are about s/w - not
> offload. Someone asked about offload so as in normal discussions we
> steered in that direction. The hardware piece will require additional
> patchsets which still require discussions. I hope we don't steer off
> too much, otherwise i can start a new thread just to discuss current
> view of the h/w.
>
> It's not the device telling the kernel what it has. It's the other way around.
Yes, I'm describing how I'd have designed it :) If it was the same
as what you've already implemented - why would I be typing it into
an email.. ? :)
> From the P4 program you generate the s/w (the ebpf code and other
> auxiliary stuff) and h/w pieces using a compiler.
> You compile ebpf, etc, then load.
That part is fine.
> The current point of discussion is the hw binary is to be "activated"
> through the same tc filter that does the s/w. So one could say:
>
> tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
> prog type hw filename "simple_l3.o" ... \
> action bpf obj $PARSER.o section p4tc/parser \
> action bpf obj $PROGNAME.o section p4tc/main
>
> And that would, through tc driver callbacks, signal to the driver to
> find the binary, possibly via /lib/firmware.
> Some of the original discussion was to use devlink for loading the
> binary - but that went nowhere.
Back to the device reset: unless the load has no impact on inflight
traffic, the loading doesn't belong in TC, IMO. Plus you're going to
run into (what IIRC was Jiri's complaint) that you're loading arbitrary
binary blobs, opaque to the kernel.
> Once you have this in place then netlink with tc skip_sw/hw. This is
> what i meant by "seamless"
>
> > Plus, if we're talking about offloads, aren't we getting back into
> > the same controversies we had when merging OvS (not that I was around).
> > The "standalone stack to the side" problem. Some of the tables in the
> > pipeline may be for routing, not ACLs. Should they be fed from the
> > routing stack? How is that integration going to work? The parsing
> > graph feels a bit like global device configuration, not a piece of
> > functionality that should sit under sub-sub-system in the corner.
>
> The current (maybe i should say initial) thought is the P4 program
> does not touch the existing kernel infra such as fdb etc.
It's an off-to-the-side thing. Ignoring the fact that *all* networking
devices already have parsers which would benefit from being accurately
described.
> Of course we can model the kernel datapath using P4 but you won't be
> using "ip route add..." or "bridge fdb...".
> In the future, P4 extern could be used to model existing infra and we
> should be able to use the same tooling. That is a discussion that
> comes on/off (i think it did in the last meeting).
Maybe, IDK. I thought prevailing wisdom, at least for offloads,
is to offload the existing networking stack, and fill in the gaps.
Not build a completely new implementation from scratch, and "integrate
later". Or at least "fill in the gaps" is how I like to think.
I can't quite fit together in my head how this is okay when OvS
was not allowed to add their offload API. Nor what's supposed to
be part of TC and what isn't, if you only expect to have one
filter here, yet create a whole new object universe inside TC.
But those are just my opinions. The way things work we may wake up one
day and find out that Dave has applied this :)