Message-ID: <CAOuuhY8WqGu7WeBEXTKS4DSekEua5ivpsz4wTG4CNbZ4FJ3ZNg@mail.gmail.com>
Date: Fri, 27 Jan 2023 17:32:17 -0800
From: Tom Herbert <tom@...anda.io>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Jamal Hadi Salim <hadi@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>,
Jakub Kicinski <kuba@...nel.org>,
Jamal Hadi Salim <jhs@...atatu.com>, netdev@...r.kernel.org,
kernel@...atatu.com, deb.chatterjee@...el.com,
anjali.singhai@...el.com, namrata.limaye@...el.com,
khalidm@...dia.com, pratyush@...anda.io, xiyou.wangcong@...il.com,
davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
vladbu@...dia.com, simon.horman@...igine.com, stefanc@...vell.com,
seong.kim@....com, mattyk@...dia.com, dan.daly@...el.com,
john.andy.fingerhut@...el.com
Subject: Re: [PATCH net-next RFC 00/20] Introducing P4TC
On Fri, Jan 27, 2023 at 4:47 PM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> On Fri, Jan 27, 2023 at 3:06 PM Tom Herbert <tom@...anda.io> wrote:
> >
> > On Fri, Jan 27, 2023 at 2:26 PM <sdf@...gle.com> wrote:
> > >
> > > On 01/27, Jamal Hadi Salim wrote:
> > > > On Fri, Jan 27, 2023 at 1:26 PM Jiri Pirko <jiri@...nulli.us> wrote:
> > > > >
> > > > > Fri, Jan 27, 2023 at 12:30:22AM CET, kuba@...nel.org wrote:
> > > > > >On Tue, 24 Jan 2023 12:03:46 -0500 Jamal Hadi Salim wrote:
> > > > > >> There have been many discussions and meetings since about 2015 in
> > > > regards to
> > > > > >> P4 over TC and now that the market has chosen P4 as the datapath
> > > > specification
> > > > > >> lingua franca
> > > > > >
> > > > > >Which market?
> > > > > >
> > > > > >Barely anyone understands the existing TC offloads. We'd need strong,
> > > > > >and practical reasons to merge this. Speaking with my "have suffered
> > > > > >thru the TC offloads working for a vendor" hat on, not the "junior
> > > > > >maintainer" hat.
> > > > >
> > > > > You talk about offload, yet I don't see any offload code in this RFC.
> > > > > It's pure sw implementation.
> > > > >
> > > > > But speaking about offload, how exactly do you plan to offload this
> > > > > Jamal? AFAIK there is some HW-specific compiler magic needed to generate
> > > > > HW acceptable blob. How exactly do you plan to deliver it to the driver?
> > > > > > If HW offload is the motivation for this RFC work and we cannot
> > > > > pass the TC in kernel objects to drivers, I fail to see why exactly do
> > > > > you need the SW implementation...
> > >
> > > > Our rule in TC is: _if you want to offload using TC you must have a
> > > > s/w equivalent_.
> > > > We enforced this rule multiple times (as you know).
> > > > P4TC has a sw equivalent to whatever the hardware would do. We are
> > > > pushing that
> > > > first. Regardless, it has value on its own merit:
> > > > I can run P4 equivalent in s/w in a scriptable (as in no compilation
> > > > in the same spirit as u32 and pedit),
> > > > by programming the kernel datapath without changing any kernel code.
> > >
> > > Not to derail too much, but maybe you can clarify the following for me:
> > > In my (in)experience, P4 is usually constrained by the vendor
> > > specific extensions. So how real is that goal where we can have a generic
> > > P4@TC with an option to offload? In my view, the reality (at least
> > > currently) is that there are NIC-specific P4 programs which won't have
> > > a chance of running generically at TC (unless we implement those vendor
> > > extensions).
> > >
> > > And regarding custom parser, someone has to ask that 'what about bpf
> > > question': let's say we have a P4 frontend at TC, can we use bpfilter-like
> > > usermode helper to transparently compile it to bpf (for SW path) instead
> > > of inventing yet another packet parser? Wrestling with the verifier won't be
> > > easy here, but I trust it more than this new kParser.
> >
> > Yes, wrestling with the verifier is tricky, however we do have a
> > solution to compile arbitrarily complex parsers into eBPF. We
> > presented this work at Netdev 0x15
> > https://netdevconf.info/0x15/session.html?Replacing-Flow-Dissector-with-PANDA-Parser.
>
> Thanks Tom, I'll check it out. I've yet to go through the netdev recordings :-(
>
> > Of course this has the obvious advantage that we don't have to change
> > the kernel (however, as we talk about in the presentation, this method
> > actually produces a faster more extensible parser than flow dissector,
> > so it's still on my radar to replace flow dissector itself with an
> > eBPF parser :-) )
>
> Since there is already a bpf flow dissector, I'm assuming you're
> talking about replacing the existing C flow dissector with a
> PANDA-based one?
Yes
> I was hoping that at some point, we can have a BPF flow dissector
> program that supports everything the existing C-one does, and maybe we
> can ship this program with the kernel and load it by default.
Yes, we have that. Actually, we can provide a superset to include
things like TCP options, which flow dissector doesn't support.
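
For readers less familiar with what parsing TCP options involves, here is a
minimal sketch in Python. It is illustrative only and is not PANDA or kernel
code; the function name parse_tcp_options is hypothetical. It walks the
kind/length/value options that follow the fixed 20-byte TCP header, bounded
by the header's data-offset field.

```python
# Illustrative sketch only; not PANDA or kernel code.
# Option kinds below are the standard TCP values.
TCPOPT_EOL = 0  # end of option list (single byte, no length)
TCPOPT_NOP = 1  # padding (single byte, no length)


def parse_tcp_options(tcp_header):
    """Return a list of (kind, data) pairs for the options in one TCP header.

    `tcp_header` is a bytes object starting at the first byte of the TCP
    header. The name and return shape are assumptions made for this example.
    """
    if len(tcp_header) < 20:
        raise ValueError("truncated TCP header")
    # Data offset is the upper 4 bits of byte 12, counted in 32-bit words.
    header_len = (tcp_header[12] >> 4) * 4
    if header_len < 20 or header_len > len(tcp_header):
        raise ValueError("bad data offset")

    options = []
    pos = 20  # options start right after the fixed header
    while pos < header_len:
        kind = tcp_header[pos]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            pos += 1
            continue
        # Every other option carries a length byte that covers kind+length+data.
        if pos + 2 > header_len:
            raise ValueError("truncated option")
        length = tcp_header[pos + 1]
        if length < 2 or pos + length > header_len:
            raise ValueError("bad option length")
        options.append((kind, tcp_header[pos + 2:pos + length]))
        pos += length
    return options
```

The per-option bounds checks are the kind of bookkeeping referred to later in
this thread; a verifier-checked eBPF parser has to prove the same bounds.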
> We can
> keep the C-based one for some minimal non-bpf configurations. But idk,
> the benefit is not 100% clear to me; except maybe bpf-based flow
> dissector can be treated as more "secure" due to all verifier
> constraints...
Not just more secure, more robust and extensible. I call flow
dissector the "function we love to hate". On one hand it has proven to
be incredibly useful, on the other hand it's been a major pain to
maintain and isn't remotely extensible. We have seen many problems
over the years, particularly when people have added support for less
common protocols. Collapsing all the protocol layers, ensuring that
the bookkeeping is correct, and trying to maintain some reasonable
level of performance has led to it being spaghetti code (I wrote the
first instantiation of flow dissector for RPS, so I accept my fair
share of blame for the predicament of flow dissector :-) ). The
optimized eBPF code we're generating also qualifies as spaghetti code
(i.e. a whole bunch of loop unrolling, inlining tables, and so on).
The difference is that the front end code in PANDA-C is well
organized and abstracts out all the bookkeeping so that the programmer
doesn't have to worry about it.
>
> > The value of kParser is that it is not compiled code, but dynamically
> > scriptable. It's much easier to change on the fly and depends on a CLI
> > interface which works well with P4TC. The front end is the same as
> > what we are using for PANDA parser, that is the same parser frontend
> > (in C code or other) can be compiled into XDP/eBPF, kParser CLI, or
> > other targets (this is based on establishing an IR which we talked
> > about in https://myfoobar2022.sched.com/event/1BhCX/high-performance-programmable-parsers)
>
> That seems like a technicality? A BPF-based parser can also be driven
> by maps/tables; or, worst case, can be recompiled and replaced on the
> fly without any downtime.
Perhaps. Also, in the spirit of full transparency, kParser is in its
nature interpreted, so we have to expect that it will have lower
performance than an optimized compiled parser.
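
To make the interpreted-versus-compiled distinction concrete, here is a
minimal sketch in Python of a table-driven parser. It is not kParser and does
not reflect its CLI or data structures; PARSE_TABLE, walk, and the node
fields are all hypothetical names. For simplicity it assumes fixed-length
headers (no IPv4 options, no VLAN tags).

```python
# Illustrative sketch only; not kParser. All names here are hypothetical.
# Each node gives a fixed header length, where the "next protocol" field
# sits inside that header, and which node each field value leads to.
PARSE_TABLE = {
    "ether": {"hdr_len": 14, "proto_off": 12, "proto_size": 2,
              "next": {0x0800: "ipv4"}},
    "ipv4": {"hdr_len": 20, "proto_off": 9, "proto_size": 1,
             "next": {6: "tcp", 17: "udp"}},
    "tcp": {"hdr_len": 20, "next": {}},
    "udp": {"hdr_len": 8, "next": {}},
}


def walk(packet, table, root="ether"):
    """Interpret `table` against `packet`; return [(node_name, offset), ...]."""
    path = []
    offset = 0
    node_name = root
    while node_name is not None:
        node = table[node_name]
        if offset + node["hdr_len"] > len(packet):
            break  # header does not fit in the packet; stop here
        path.append((node_name, offset))
        if not node["next"]:
            break  # leaf node, nothing further to parse
        start = offset + node["proto_off"]
        value = int.from_bytes(packet[start:start + node["proto_size"]], "big")
        offset += node["hdr_len"]
        # An unknown protocol value yields None and ends the walk.
        node_name = node["next"].get(value)
    return path


def add_ipv6(table):
    """Extend the table at run time; walk() itself is left unchanged."""
    table["ipv6"] = {"hdr_len": 40, "proto_off": 6, "proto_size": 1,
                     "next": {6: "tcp", 17: "udp"}}
    table["ether"]["next"][0x86DD] = "ipv6"
```

Adding a protocol is a data change rather than a code change, which is the
scriptability argument. The cost is that every packet pays for the table
lookups and the generic loop, where a compiled parser would have them
unrolled and inlined.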
Tom
>
>
> > Tom
> >
> > >
> > >
> > > > To answer your question in regards to what the interfaces "P4
> > > > speaking" hardware or drivers
> > > > are going to be programmed, there are discussions going on right now:
> > > > There is a strong
> > > > leaning towards devlink for the hardware side loading.... The idea
> > > > from the driver side is to
> > > > reuse the tc ndos.
> > > > We have biweekly meetings which are open. We do have Nvidia folks, but
> > > > would be great if
> > > > we can have you there. Let me find the link and send it to you.
> > > > Do note however, our goal is to get s/w first as per tradition of
> > > > other offloads with TC .
> > >
> > > > cheers,
> > > > jamal