Date: Sun, 3 Mar 2024 10:10:52 -0800
From: Tom Herbert <tom@...anda.io>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Jakub Kicinski <kuba@...nel.org>, John Fastabend <john.fastabend@...il.com>, 
	"Singhai, Anjali" <anjali.singhai@...el.com>, Paolo Abeni <pabeni@...hat.com>, 
	Linux Kernel Network Developers <netdev@...r.kernel.org>, "Chatterjee, Deb" <deb.chatterjee@...el.com>, 
	"Limaye, Namrata" <namrata.limaye@...el.com>, Marcelo Ricardo Leitner <mleitner@...hat.com>, 
	"Shirshyad, Mahesh" <Mahesh.Shirshyad@....com>, "Jain, Vipin" <Vipin.Jain@....com>, 
	"Osinski, Tomasz" <tomasz.osinski@...el.com>, Jiri Pirko <jiri@...nulli.us>, 
	Cong Wang <xiyou.wangcong@...il.com>, "David S . Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Vlad Buslov <vladbu@...dia.com>, Simon Horman <horms@...nel.org>, 
	Khalid Manaa <khalidm@...dia.com>, Toke Høiland-Jørgensen <toke@...hat.com>, 
	Daniel Borkmann <daniel@...earbox.net>, Victor Nogueira <victor@...atatu.com>, 
	"Tammela, Pedro" <pctammela@...atatu.com>, "Daly, Dan" <dan.daly@...el.com>, 
	Andy Fingerhut <andy.fingerhut@...il.com>, "Sommers, Chris" <chris.sommers@...sight.com>, 
	Matty Kadosh <mattyk@...dia.com>, bpf <bpf@...r.kernel.org>
Subject: Re: Hardware Offload discussion WAS(Re: [PATCH net-next v12 00/15]
 Introducing P4TC (series 1)

On Sun, Mar 3, 2024 at 9:00 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Sat, Mar 2, 2024 at 10:27 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Sat, 2 Mar 2024 09:36:53 -0500 Jamal Hadi Salim wrote:
> > > 2) Your point on:  "integrate later", or at least "fill in the gaps"
> > > This part i am probably going to mumble on. I am going to consider
> > > more than just doing ACLs/MAT via flower/u32 for the sake of
> > > discussion.
> > > True, "fill the gaps" has been our model so far. It requires kernel
> > > changes, user space code changes etc justifiably so because most of
> > > the time such datapaths are subject to standardization via IETF, IEEE,
> > > etc and new extensions come in on a regular basis.  And sometimes we
> > > do add features that one or two users or a single vendor has need for
> > > at the cost of kernel and user/control extension. Given our work
> > > process, any features added this way take a long time to make it to
> > > the end user.
> >
> > What I had in mind was more of a DDP model. The device loads it binary
> > blob FW in whatever way it does, then it tells the kernel its parser
> > graph, and tables. The kernel exposes those tables to user space.
> > All dynamic, no need to change the kernel for each new protocol.
> >
> > But that's different in two ways:
> >  1. the device tells kernel the tables, no "dynamic reprogramming"
> >  2. you don't need the SW side, the only use of the API is to interact
> >     with the device
> >
> > User can still do BPF kfuncs to look up in the tables (like in FIB),
> > but call them from cls_bpf.
> >
>
> This is not far off from what is envisioned today in the discussions.
> The main issue is who loads the binary? We went from devlink to the
> filter doing the loading. DDP is ethtool. We still need to tie a PCI
> device/tc block to the "program" so we can do skip_sw and it works.
> Meaning a device that is capable of handling multiple programs can
> have multiple blobs loaded. A "program" is mapped to a tc filter and
> MAT control works the same way as it does today (netlink/tc ndo).
>
> A program in P4 has a name, ID and people have been suggesting a sha1
> identity (or a signature of some kind should be generated by the
> compiler). So the upward propagation could be tied to discovering
> these 3 tuples from the driver. Then the control plane targets a
> program via those tuples via netlink (as we do currently).
>
> I do note, using the DDP sample space, currently whatever gets loaded
> is "trusted" and really you need to have human knowledge of what the
> NIC's parsing + MAT is to send the control. With P4 that is all
> visible/programmable by the end user (i am not a proponent of vendors
> "shipping" things or calling them for support) - so should be
> sufficient to just discover what is in the binary and send the correct
> control messages down.
>
> > I think in P4 terms that may be something more akin to only providing
> > the runtime API? I seem to recall they had some distinction...
>
> There are several solutions out there (ex: TDI, P4runtime) - our API
> is netlink and those could be written on top of netlink, there's no
> controversy there.
> So the starting point is defining the datapath using P4, generating
> the binary blob and whatever constraints needed using the vendor
> backend and for s/w equivalent generating the eBPF datapath.
>
> > > At the cost of this sounding controversial, i am going
> > > to call things like fdb, fib, etc which have fixed datapaths in the
> > > kernel "legacy". These "legacy" datapaths almost all the time have
> >
> > The cynic in me sometimes thinks that the biggest problem with "legacy"
> > protocols is that it's hard to make money on them :)
>
> That's a big motivation without a doubt, but also there are people
> that want to experiment with things. One of the craziest examples we
> have is someone who created a P4 program for "in network calculator",
> essentially a calculator in the datapath. You send it two operands and
> an operator using custom headers, it does the math and responds with a
> result in a new header. By itself this program is a toy but it
> demonstrates that if one wanted to, they could have something custom
> in hardware and/or kernel datapath.

Jamal,

Given how long P4 has been around it's surprising that the best
publicly available code example is "the network calculator" toy. At
this point in its lifetime, eBPF had far more examples of real-world
use cases publicly available. That being said, there's nothing
unique about P4 supporting the network calculator. We could just as
easily write this in eBPF (either plain C or P4)  and "offload" it to
an ARM core on a SmartNIC.
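
As a rough illustration of how little the calculator needs, here is a
minimal plain C sketch of the request/response logic. The wire layout
(a one-byte operator followed by two 32-bit big-endian operands, with
a 32-bit big-endian result in the reply) and the names calc_process,
CALC_REQ_LEN and CALC_RSP_LEN are assumptions made up for this
example; they are not taken from the P4 program mentioned above or
from any existing eBPF code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire layout, assumed for illustration only:
 *   request:  byte 0     operator, one of '+', '-', '*', '/'
 *             bytes 1-4  operand A, big endian
 *             bytes 5-8  operand B, big endian
 *   response: bytes 0-3  result, big endian
 */
enum { CALC_REQ_LEN = 9, CALC_RSP_LEN = 4 };

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in big-endian order. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Parse the request header, do the math, and fill in the response
 * header. Returns 0 on success, -1 if a buffer is too short, the
 * operator is unknown, or the request divides by zero.
 * Arithmetic is unsigned and wraps modulo 2^32.
 */
int calc_process(const uint8_t *req, size_t req_len,
                 uint8_t *rsp, size_t rsp_len)
{
    uint32_t a, b, r;

    if (req_len < CALC_REQ_LEN || rsp_len < CALC_RSP_LEN)
        return -1;

    a = load_be32(req + 1);
    b = load_be32(req + 5);

    switch (req[0]) {
    case '+': r = a + b; break;
    case '-': r = a - b; break;
    case '*': r = a * b; break;
    case '/':
        if (b == 0)
            return -1;
        r = a / b;
        break;
    default:
        return -1;
    }

    store_be32(rsp, r);
    return 0;
}
```

The explicit length checks before any header access are the kind of
bounds checking an eBPF verifier would insist on, but nothing here
depends on eBPF helpers, so the same logic could in principle be
compiled for a host CPU or for a general purpose core on a NIC.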

If we are going to support programmable device offload in the Linux
kernel then I maintain it should be a generic mechanism that's
agnostic to *both* the frontend programming language and the
backend target. For frontend languages we want to let the user program
in a language that's convenient for *them*, which honestly in most
cases isn't going to be a narrow use case DSL (i.e. typically users
want to code in C/C++, Python, Rust, etc.). For the backend it's the
same story: maybe we're compiling to run in the host, maybe we're
offloading to P4 runtime, maybe we're offloading to another CPU, maybe
we're offloading to some other programmable NPU. The only real
requirement is a compiler that can take the frontend code and compile
it for the desired backend target. Above all we want this to be easy
for the programmer: the compiler needs to do the heavy lifting and we
should never require the user to understand the nuances of a target.

IMO, the model we want for programmable kernel offload is "write once,
run anywhere, run well", which is the Java tagline amended with "run
well". Users write one program for their datapath processing, it runs
on various targets, and for any given target we want it to run at the
highest performance level possible given the target's capabilities.

Tom

>
> cheers,
> jamal
