Message-ID: <CAOuuhY-MmuN6N9qp_TuyFoOEsxFz5oimtkzY5xHt_nxpoiFguQ@mail.gmail.com>
Date: Thu, 23 Nov 2023 11:42:26 -0800
From: Tom Herbert <tom@...anda.io>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Edward Cree <ecree.xilinx@...il.com>, Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>, Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, netdev@...r.kernel.org, deb.chatterjee@...el.com,
anjali.singhai@...el.com, Vipin.Jain@....com, namrata.limaye@...el.com,
mleitner@...hat.com, Mahesh.Shirshyad@....com, tomasz.osinski@...el.com,
xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, vladbu@...dia.com, horms@...nel.org, bpf@...r.kernel.org,
khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com, dan.daly@...el.com,
chris.sommers@...sight.com, john.andy.fingerhut@...el.com
Subject: Re: [PATCH net-next v8 00/15] Introducing P4TC
On Thu, Nov 23, 2023 at 10:53 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 23 Nov 2023 17:53:42 +0000 Edward Cree wrote:
> > The kernel doesn't like to trust offload blobs from a userspace compiler,
> > because it has no way to be sure that what comes out of the compiler
> > matches the rules/tables/whatever it has in the SW datapath.
> > It's also a support nightmare because it's basically like each user
> > compiling their own device firmware.
>
Hi Jakub,
> Practically speaking every high speed NIC runs a huge binary blob of FW.
> First, let's acknowledge that as reality.
>
Yes. But we're also seeing a trend toward programmable NICs. It's an
interesting question how the kernel can leverage that
programmability for the benefit of the user.
> Second, there is no equivalent for arbitrary packet parsing in the
> kernel proper. Offload means take something from the host and put it
> on the device. If there's nothing in the kernel, we can't consider
> the new functionality an offload.
That's completely true; however, I believe that eBPF has expanded our
definition of "what's in the kernel". For instance, we can do
arbitrary parsing in an XDP/eBPF program (in fact, it's still on my
list of things to do to rip out the flow dissector C code and replace
it with eBPF).
(https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf,
https://www.youtube.com/watch?v=zVnmVDSEoXc&list=PLrninrcyMo3L-hsJv23hFyDGRaeBY1EJO)
>
> I understand that "we offload SW functionality" is our general policy,
> but we should remember why this policy is in place, and not
> automatically jump to the conclusion.
>
> > At least normally with device firmware the driver side is talking to
> > something with narrow/fixed semantics and went through upstream
> > review, even if the firmware side is still a black box.
>
> We should be building things which are useful and open (as in
> extensible by people "from the street"). With that in mind, to me,
> a more practical approach would be to try to figure out a common
> and rigid FW interface for expressing the parsing graph.
Parse graphs are best represented by a declarative representation, not
an imperative one. This is a main reason why I want to replace flow
dissector: a parser written in imperative C code is difficult to
maintain, as evidenced by the myriad of bugs in that code (particularly
when people added support for uncommon protocols). P4 got this part
right; however, I don't believe we need to boil the ocean by
programming the kernel in a new language. A better alternative is to
define an IR for this purpose. We do that in Common
Parser Language (CPL), which is a .json schema to describe parse
graphs. With an IR we can compile into arbitrary backends including
P4, eBPF, C, and even custom assembly instructions for parsing
(arbitrary front-end languages are facilitated as well).
(https://netdevconf.info/0x16/papers/11/High%20Performance%20Programmable%20Parsers.pdf)
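To illustrate the declarative-versus-imperative point, here is a
minimal sketch in Python. The graph layout and every field name in it
(min_len, hdr_len, next_proto, table) are made up for this example and
are not the CPL schema; the point is only that protocol knowledge lives
in data while one small generic engine walks it.

```python
# Hypothetical parse graph expressed as plain data (JSON-like).
# This is NOT the CPL schema; all keys are illustrative assumptions.
PARSE_GRAPH = {
    "ether": {
        "min_len": 14,
        "next_proto": {"offset": 12, "size": 2},
        "table": {0x0800: "ipv4", 0x86DD: "ipv6"},
    },
    "ipv4": {
        "min_len": 20,
        # Header length comes from the IHL nibble, in 4-byte units.
        "hdr_len": {"offset": 0, "mask": 0x0F, "multiplier": 4},
        "next_proto": {"offset": 9, "size": 1},
        "table": {6: "tcp", 17: "udp"},
    },
    "ipv6": {
        "min_len": 40,
        "next_proto": {"offset": 6, "size": 1},
        "table": {6: "tcp", 17: "udp"},
    },
    "tcp": {"min_len": 20},
    "udp": {"min_len": 8},
}


def walk(graph, packet, root="ether"):
    """Generic engine: return the list of (node, offset) pairs visited."""
    path = []
    node, off = root, 0
    while node is not None:
        spec = graph[node]
        remaining = len(packet) - off
        if remaining < spec["min_len"]:
            break  # truncated header: stop parsing
        hdr_len = spec["min_len"]
        if "hdr_len" in spec:
            f = spec["hdr_len"]
            hdr_len = (packet[off + f["offset"]] & f["mask"]) * f["multiplier"]
            if hdr_len < spec["min_len"] or remaining < hdr_len:
                break  # malformed or truncated variable-length header
        path.append((node, off))
        nxt = spec.get("next_proto")
        if nxt is None:
            break  # leaf node
        start = off + nxt["offset"]
        key = int.from_bytes(packet[start:start + nxt["size"]], "big")
        node = spec["table"].get(key)  # unknown protocol ends the walk
        off += hdr_len
    return path


# Ethernet (EtherType 0x0800) + IPv4 (IHL 5, protocol 17) + UDP header.
pkt = (bytes(12) + b"\x08\x00"
       + bytes([0x45]) + bytes(8) + bytes([17]) + bytes(10)
       + bytes(8))
print(walk(PARSE_GRAPH, pkt))
```

Adding a protocol here means adding a table entry and a node, not
touching the engine, and the same data could in principle be fed to a
code generator for a different backend instead of being interpreted.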
>
> But that's an interface going from the binary blob to the kernel.
>
> > Just to prove I'm not playing favourites: this is *also* a problem with
> > eBPF offloads like Nanotubes, and I'm not convinced we have a viable
> > solution yet.
>
> BPF offloads are actual offloads. Config/state is in the kernel,
> you need to pop it out to user space, then prove that it's what
> user intended.
Seems like offloading eBPF byte code and running a VM in the offload
device is pretty much considered a non-starter. But, what if we could
offload the _functionality_ of an eBPF program with confidence that
the functionality _exactly_ matches that of the eBPF program running
in the kernel? I believe that could be beneficial.
For instance, we all know that LRO never gained traction. The reason
is that each vendor does it however they want, and no one can match
the exact functionality that SW GRO provides. It's not an offload of
kernel SW, so it's not viable. But, suppose we wrote GRO in some
program that could be compiled into eBPF and a device binary. Using
something like that hash technique I described, it seems like we could
properly do a kernel offload of GRO where the offload functionality
matches the software in the kernel.
Tom