Message-ID: <CA+FuTScBO-h6iM47-NbYSDDt6LX7pUXD82_KANDcjp7Y=99jzg@mail.gmail.com>
Date: Sat, 28 Jan 2023 10:33:00 -0500
From: Willem de Bruijn <willemb@...gle.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Stanislav Fomichev <sdf@...gle.com>,
Jamal Hadi Salim <hadi@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
kernel@...atatu.com, deb.chatterjee@...el.com,
anjali.singhai@...el.com, namrata.limaye@...el.com,
khalidm@...dia.com, tom@...anda.io, pratyush@...anda.io,
xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, vladbu@...dia.com, simon.horman@...igine.com,
stefanc@...vell.com, seong.kim@....com, mattyk@...dia.com,
dan.daly@...el.com, john.andy.fingerhut@...el.com
Subject: Re: [PATCH net-next RFC 00/20] Introducing P4TC

On Sat, Jan 28, 2023 at 10:10 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Sat, Jan 28, 2023 at 8:37 AM Willem de Bruijn <willemb@...gle.com> wrote:
> >
> > On Fri, Jan 27, 2023 at 7:48 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 3:27 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > > >
> > > > On Fri, Jan 27, 2023 at 5:26 PM <sdf@...gle.com> wrote:
> > > > >
> > > > > On 01/27, Jamal Hadi Salim wrote:
> > > > > > On Fri, Jan 27, 2023 at 1:26 PM Jiri Pirko <jiri@...nulli.us> wrote:
> > > > > > >
> > > > > > > Fri, Jan 27, 2023 at 12:30:22AM CET, kuba@...nel.org wrote:
> > > > > > > >On Tue, 24 Jan 2023 12:03:46 -0500 Jamal Hadi Salim wrote:
> > > > > > > >> There have been many discussions and meetings since about 2015 in
> > > > > > regards to
> > > > > > > >> P4 over TC and now that the market has chosen P4 as the datapath
> > > > > > specification
> > > > > > > >> lingua franca
> > > > > > > >
> > > > > > > >Which market?
> > > > > > > >
> > > > > > > >Barely anyone understands the existing TC offloads. We'd need strong,
> > > > > > > >and practical reasons to merge this. Speaking with my "have suffered
> > > > > > > >thru the TC offloads working for a vendor" hat on, not the "junior
> > > > > > > >maintainer" hat.
> > > > > > >
> > > > > > > You talk about offload, yet I don't see any offload code in this RFC.
> > > > > > > It's pure sw implementation.
> > > > > > >
> > > > > > > But speaking about offload, how exactly do you plan to offload this
> > > > > > > Jamal? AFAIK there is some HW-specific compiler magic needed to generate
> > > > > > > HW acceptable blob. How exactly do you plan to deliver it to the driver?
> > > > > > > > If HW offload is the motivation for this RFC work and we cannot
> > > > > > > pass the TC in kernel objects to drivers, I fail to see why exactly do
> > > > > > > you need the SW implementation...
> > > > >
> > > > > > Our rule in TC is: _if you want to offload using TC you must have a
> > > > > > s/w equivalent_.
> > > > > > We enforced this rule multiple times (as you know).
> > > > > > P4TC has a sw equivalent to whatever the hardware would do. We are
> > > > > > pushing that
> > > > > > first. Regardless, it has value on its own merit:
> > > > > > I can run P4 equivalent in s/w in a scriptable (as in no compilation
> > > > > > in the same spirit as u32 and pedit),
> > > > > > by programming the kernel datapath without changing any kernel code.
> > > > >
> > > > > Not to derail too much, but maybe you can clarify the following for me:
> > > > > In my (in)experience, P4 is usually constrained by the vendor
> > > > > specific extensions. So how real is that goal where we can have a generic
> > > > > P4@TC with an option to offload? In my view, the reality (at least
> > > > > currently) is that there are NIC-specific P4 programs which won't have
> > > > > a chance of running generically at TC (unless we implement those vendor
> > > > > extensions).
> > > >
> > > > We are going to implement all the PSA/PNA externs. Most of these
> > > > programs tend to
> > > > be set or ALU operations on headers or metadata which we can handle.
> > > > Do you have
> > > > any examples of NIC-vendor-specific features that cant be generalized?
> > >
> > > I don't think I can share more without giving away something that I
> > > shouldn't give away :-)
> > > But IIUC, and I might be missing something, it's totally within the
> > > standard for vendors to differentiate and provide non-standard
> > > 'extern' extensions.
> > > I'm mostly wondering what are your thoughts on this. If I have a p4
> > > program depending on one of these externs, we can't sw-emulate it
> > > unless we also implement the extension. Are we gonna ask NICs that
> > > have those custom extensions to provide a SW implementation as well?
> > > Or are we going to prohibit vendors to differentiate that way?
> > >
> > > > > And regarding custom parser, someone has to ask that 'what about bpf
> > > > > question': let's say we have a P4 frontend at TC, can we use bpfilter-like
> > > > > usermode helper to transparently compile it to bpf (for SW path) instead
> > > > > inventing yet another packet parser? Wrestling with the verifier won't be
> > > > > easy here, but I trust it more than this new kParser.
> > > > >
> > > >
> > > > We dont compile anything, the parser (and rest of infra) is scriptable.
> > >
> > > As I've replied to Tom, that seems like a technicality. BPF programs
> > > can also be scriptable with some maps/tables. Or it can be made to
> > > look like "scriptable" by recompiling it on every configuration change
> > > and updating it on the fly. Or am I missing something?
> > >
> > > Can we have a P4TC frontend and whenever configuration is updated, we
> > > upcall into userspace to compile this whatever p4 representation into
> > > whatever bpf bytecode that we then run. No new/custom/scriptable
> > > parsers needed.
> >
> > I would also think that if we need another programmable component in
> > the kernel, that this would be based on BPF, and compiled outside the
> > kernel.
> >
> > Is the argument for an explicit TC objects API purely that this API
> > can be passed through to hardware, as well as implemented in the
> > kernel directly? Something that would be lost if the datapath is
> > implemented as a single BPF program at the TC hook.
> >
>
> We use the skip_sw and skip_hw knobs in tc to indicate whether a
> policy is targeting hw or sw. Not sure if you are familiar with it but its
> been around (and deployed) for a few years now. So a P4 program
> policy can target either.

I know. So the only reason the kernel ABI needs to be extended with P4
objects is to be able to pass the same commands to hardware. The whole
kernel dataplane could be implemented as a BPF program, correct?

> In regards to the parser - we need a scriptable parser which is offered
> by kparser in kernel. P4 doesnt describe how to offload the parser
> just the matches and actions; however, as Tom alluded there's nothing
> that obstructs us offer the same tc controls to offload the parser or pieces
> of it.

And this is the only reason that the parser needs to be in the kernel:
the API is at the kernel ABI level. If the P4 program is compiled
to BPF in userspace, then the parser would be compiled in userspace
too. That is the preferable option, as it would not require adding yet
another parser in C in the kernel.

I understand the value of PANDA as a high-level declarative language
to describe network protocols. I'm just trying to get a more explicit
answer on why compilation from PANDA to BPF is not sufficient for your
use-case.

> cheers,
> jamal
>
> > Can you elaborate some more why this needs yet another in-kernel
> > parser separate from BPF? The flow dissection case is solved fine by
> > the BPF flow dissector. (I also hope one day the kernel can load a BPF
> > dissector by default and we avoid the majority of the unsafe C code
> > entirely.)