netdev - Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)

Date: Fri, 1 Mar 2024 18:20:36 -0800
From: Tom Herbert <tom@...anda.io>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, John Fastabend <john.fastabend@...il.com>, 
	"Singhai, Anjali" <anjali.singhai@...el.com>, Paolo Abeni <pabeni@...hat.com>, 
	Linux Kernel Network Developers <netdev@...r.kernel.org>, "Chatterjee, Deb" <deb.chatterjee@...el.com>, 
	"Limaye, Namrata" <namrata.limaye@...el.com>, mleitner@...hat.com, Mahesh.Shirshyad@....com, 
	Vipin.Jain@....com, "Osinski, Tomasz" <tomasz.osinski@...el.com>, 
	Jiri Pirko <jiri@...nulli.us>, Cong Wang <xiyou.wangcong@...il.com>, 
	"David S . Miller" <davem@...emloft.net>, edumazet@...gle.com, Vlad Buslov <vladbu@...dia.com>, 
	horms@...nel.org, khalidm@...dia.com, 
	Toke Høiland-Jørgensen <toke@...hat.com>, 
	Daniel Borkmann <daniel@...earbox.net>, Victor Nogueira <victor@...atatu.com>, 
	"Tammela, Pedro" <pctammela@...atatu.com>, "Daly, Dan" <dan.daly@...el.com>, andy.fingerhut@...il.com, 
	"Sommers, Chris" <chris.sommers@...sight.com>, mattyk@...dia.com, bpf@...r.kernel.org
Subject: Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)

On Fri, Mar 1, 2024 at 5:32 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Fri, 1 Mar 2024 12:39:56 -0500 Jamal Hadi Salim wrote:
> > On Fri, Mar 1, 2024 at 12:00 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > Pardon my ignorance, but doesn't P4 want to be compiled to a backend
> > > > target? How does going through TC make this seamless?
> > >
> > > +1
> >
> > I should clarify what I meant by "seamless". It means the same control
> > API is used for s/w or h/w. This is a feature of tc, and is not being
> > introduced by P4TC. P4 control only deals with Match-action tables -
> > just as TC does.
>
> Right, and the compiled P4 pipeline is tacked onto that API.
> Loading that presumably implies a pipeline reset. There's
> no precedent for loading things into TC resulting in a device
> datapath reset.
>
> > > My intuition is that for offload the device would be programmed at
> > > start-of-day / probe. By loading the compiled P4 from /lib/firmware.
> > > Then the _device_ tells the kernel what tables and parser graph it's
> > > got.
> >
> > BTW: I just want to say that these patches are about s/w - not
> > offload. Someone asked about offload so as in normal discussions we
> > steered in that direction. The hardware piece will require additional
> > patchsets which still require discussion. I hope we don't steer off
> > too much; otherwise I can start a new thread just to discuss the current
> > view of the h/w.
> >
> > It's not the device telling the kernel what it has. It's the other way around.
>
> Yes, I'm describing how I'd have designed it :) If it was the same
> as what you've already implemented - why would I be typing it into
> an email.. ? :)
>
> > From the P4 program you generate the s/w (the ebpf code and other
> > auxiliary stuff) and h/w pieces using a compiler.
> > You compile ebpf, etc, then load.
>
> That part is fine.
>
> > The current point of discussion is that the hw binary is to be "activated"
> > through the same tc filter that does the s/w. So one could say:
> >
> > tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
> >    prog type hw filename "simple_l3.o" ... \
> >    action bpf obj $PARSER.o section p4tc/parser \
> >    action bpf obj $PROGNAME.o section p4tc/main
> >
> > And that would, through tc driver callbacks, signal the driver to
> > find the binary, possibly via /lib/firmware.
> > Some of the original discussion was to use devlink for loading the
> > binary - but that went nowhere.
>
> Back to the device reset, unless the load has no impact on inflight
> traffic the loading doesn't belong in TC, IMO. Plus you're going to
> run into (what IIRC was Jiri's complaint) that you're loading arbitrary
> binary blobs, opaque to the kernel.
>
> > Once you have this in place, it's netlink with tc skip_sw/skip_hw. This is
> > what I meant by "seamless".
> >
> > > Plus, if we're talking about offloads, aren't we getting back into
> > > the same controversies we had when merging OvS (not that I was around).
> > > The "standalone stack to the side" problem. Some of the tables in the
> > > pipeline may be for routing, not ACLs. Should they be fed from the
> > > routing stack? How is that integration going to work? The parsing
> > > graph feels a bit like global device configuration, not a piece of
> > > functionality that should sit under sub-sub-system in the corner.
> >
> > The current (maybe I should say initial) thought is that the P4 program
> > does not touch existing kernel infra such as the fdb, etc.
>
> It's an off-to-the-side thing. Ignoring the fact that *all* networking
> devices already have parsers which would benefit from being accurately
> described.

Jakub,

This is configurability versus programmability. The table driven
approach as input (configurability) might work fine for generic
match-action tables up to the point that tables are expressive enough
to satisfy the requirements. But parsing doesn't fall into the table
driven paradigm: parsers want to be *programmed*. This is why we
removed kParser from this patch set and fell back to eBPF for parsing.
But the problem we quickly hit is that eBPF is not offloadable to network
devices: for example, when we compile P4 into an eBPF parser, we've lost
the declarative representation that parsers in the devices could
consume (they're not CPUs running eBPF).

I think the key here is what we mean by kernel offload. When we do
kernel offload, is it the kernel implementation or the kernel
functionality that's being offloaded? If it's the latter then we have
a lot more flexibility. What we'd need is a safe and secure way to
synchronize with that offload device that precisely supports the
kernel functionality we'd like to offload. This can be done if both
the kernel bits and programmed offload are derived from the same
source (e.g., by tagging the source code with a SHA-1 hash). For example, if someone
writes a parser in P4, we can compile that into both eBPF and a P4
backend using independent tool chains and program download. At
runtime, the kernel can safely offload the functionality of the eBPF
parser to the device if the hash of its source matches the hash reported
by the device.
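A minimal shell sketch of the hash check described above, under stated assumptions: both the eBPF build and the device backend are assumed to be tagged with the SHA-1 of the same P4 source file, and how the device actually reports its tag is left open, so `device_tag` is just a hypothetical input. The function names `p4_source_tag` and `can_offload` are illustrative, not part of P4TC, tc, or any kernel API.

```shell
#!/bin/sh
# Hypothetical sketch: tag each build artifact (eBPF object, device
# binary) with the SHA-1 of the P4 source it was compiled from.
p4_source_tag() {
    # sha1sum prints "<40-hex-digest>  <file>"; keep only the digest.
    sha1sum "$1" | cut -d' ' -f1
}

# Offload only when the tag held for the kernel's eBPF parser equals the
# tag the device reports (the reporting mechanism itself is assumed).
can_offload() {
    kernel_tag=$1
    device_tag=$2
    # An empty tag never matches, so an untagged artifact is never offloaded.
    [ -n "$kernel_tag" ] && [ "$kernel_tag" = "$device_tag" ]
}
```

A caller would compute `tag=$(p4_source_tag simple_l3.p4)` at build time (the filename is hypothetical) and keep the eBPF parser in software whenever `can_offload "$tag" "$reported_tag"` fails. Falling back on mismatch means an opaque device binary is only trusted when it provably derives from the same source as the kernel's eBPF parser.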

Tom

>
> > Of course we can model the kernel datapath using P4 but you wont be
> > using "ip route add..." or "bridge fdb...".
> > In the future, P4 extern could be used to model existing infra and we
> > should be able to use the same tooling. That is a discussion that
> > comes up on and off (I think it did in the last meeting).
>
> Maybe, IDK. I thought prevailing wisdom, at least for offloads,
> is to offload the existing networking stack, and fill in the gaps.
> Not build a completely new implementation from scratch, and "integrate
> later". Or at least "fill in the gaps" is how I like to think.
>
> I can't quite fit together in my head how this is okay, but OvS
> was not allowed to add their offload API. And what's supposed to
> be part of TC and what isn't, where you only expect to have one
> filter here, and create a whole new object universe inside TC.
>
> But that's just my opinions. The way things work we may wake up one
> day and find out that Dave has applied this :)