netdev - Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZeY6r9cm4pdW9WNC@google.com>
Date: Mon, 4 Mar 2024 13:19:08 -0800
From: Stanislav Fomichev <sdf@...gle.com>
To: Tom Herbert <tom@...anda.io>
Cc: Jakub Kicinski <kuba@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>, 
	John Fastabend <john.fastabend@...il.com>, anjali.singhai@...el.com, 
	Paolo Abeni <pabeni@...hat.com>, 
	Linux Kernel Network Developers <netdev@...r.kernel.org>, deb.chatterjee@...el.com, namrata.limaye@...el.com, 
	mleitner@...hat.com, Mahesh.Shirshyad@....com, Vipin.Jain@....com, 
	tomasz.osinski@...el.com, Jiri Pirko <jiri@...nulli.us>, 
	Cong Wang <xiyou.wangcong@...il.com>, davem@...emloft.net, edumazet@...gle.com, 
	Vlad Buslov <vladbu@...dia.com>, horms@...nel.org, khalidm@...dia.com, 
	"Toke Høiland-Jørgensen" <toke@...hat.com>, Daniel Borkmann <daniel@...earbox.net>, 
	Victor Nogueira <victor@...atatu.com>, pctammela@...atatu.com, dan.daly@...el.com, 
	andy.fingerhut@...il.com, chris.sommers@...sight.com, mattyk@...dia.com, 
	bpf@...r.kernel.org
Subject: Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)

On 03/03, Tom Herbert wrote:
> On Sat, Mar 2, 2024 at 7:15 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Fri, 1 Mar 2024 18:20:36 -0800 Tom Herbert wrote:
> > > This is configurability versus programmability. The table driven
> > > approach as input (configurability) might work fine for generic
> > > match-action tables up to the point that tables are expressive enough
> > > to satisfy the requirements. But parsing doesn't fall into the table
> > > driven paradigm: parsers want to be *programmed*. This is why we
> > > removed kParser from this patch set and fell back to eBPF for parsing.
> > > But the problem we quickly hit that eBPF is not offloadable to network
> > > devices, for example when we compile P4 in an eBPF parser we've lost
> > > the declarative representation that parsers in the devices could
> > > consume (they're not CPUs running eBPF).
> > >
> > > I think the key here is what we mean by kernel offload. When we do
> > > kernel offload, is it the kernel implementation or the kernel
> > > functionality that's being offloaded? If it's the latter then we have
> > > a lot more flexibility. What we'd need is a safe and secure way to
> > > synchronize with that offload device that precisely supports the
> > > kernel functionality we'd like to offload. This can be done if both
> > > the kernel bits and programmed offload are derived from the same
> > > source (i.e. tag source code with a sha-1). For example, if someone
> > > writes a parser in P4, we can compile that into both eBPF and a P4
> > > backend using independent tool chains and program download. At
> > > runtime, the kernel can safely offload the functionality of the eBPF
> > > parser to the device if it matches the hash to that reported by the
> > > device
> >
> > Good points. If I understand you correctly you're saying that parsers
> > are more complex than just a basic parsing tree a'la u32.
> 
> Yes. Parsing things like TLVs, GRE flag field, or nested protobufs
> isn't conducive to u32. We also want the advantages of compiler
> optimizations to unroll loops, squash nodes in the parse graph, etc.
> 
> > Then we can take this argument further. P4 has grown to encompass a lot
> > of functionality of quite complex devices. How do we square that with
> > the kernel functionality offload model. If the entire device is modeled,
> > including f.e. TSO, an offload would mean that the user has to write
> > a TSO implementation which they then load into TC? That seems odd.
> >
> > IOW I don't quite know how to square in my head the "total
> > functionality" with being a TC-based "plugin".
> 
> Hi Jakub,
> 
> I believe the solution is to replace kernel code with eBPF in cases
> where we need programmability. This effectively means that we would
> ship eBPF code as part of the kernel. So in the case of TSO, the
> kernel would include a standard implementation in eBPF that could be
> compiled into the kernel by default. The restricted C source code is
> tagged with a hash, so if someone wants to offload TSO they could
> compile the source into their target and retain the hash. At runtime
> it's a matter of querying the driver to see if the device supports the
> TSO program the kernel is running by comparing hash values. Scaling
> this, a device could support a catalogue of programs: TSO, LRO,
> parser, IPtables, etc., If the kernel can match the hash of its eBPF
> code to one reported by the driver then it can assume functionality is
> offloadable. This is an elaboration of "device features", but instead
> of the device telling us they think they support an adequate GRO
> implementation by reporting NETIF_F_GRO, the device would tell the
> kernel that they not only support GRO but they provide identical
> functionality of the kernel GRO (which IMO is the first requirement of
> kernel offload).
> 
> Even before considering hardware offload, I think this approach
> addresses a more fundamental problem to make the kernel programmable.
> Since the code is in eBPF, the kernel can be reprogrammed at runtime
> which could be controlled by TC. This allows local customization of
> kernel features, but also is the simplest way to "patch" the kernel
> with security and bug fixes (nobody is ever excited to do a kernel

[..]

> rebase in their datacenter!). Flow dissector is a prime candidate for
> this, and I am still planning to replace it with an all eBPF program
> (https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf).

So you're suggesting to bundle (and extend)
tools/testing/selftests/bpf/progs/bpf_flow.c? We were thinking along
similar lines here. We load this program manually right now, shipping
and autoloading with the kernel will be easer.