netdev - Re: [PATCH net-next v8 00/15] Introducing P4TC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoM=vbyKD9+t=UQ73AyLZtE2xP9i9RKCVMqeXwEh+j-nyjQ@mail.gmail.com>
Date: Fri, 17 Nov 2023 07:49:43 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: netdev@...r.kernel.org, deb.chatterjee@...el.com, anjali.singhai@...el.com, 
	Vipin.Jain@....com, namrata.limaye@...el.com, tom@...anda.io, 
	mleitner@...hat.com, Mahesh.Shirshyad@....com, tomasz.osinski@...el.com, 
	jiri@...nulli.us, xiyou.wangcong@...il.com, davem@...emloft.net, 
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, vladbu@...dia.com, 
	horms@...nel.org, daniel@...earbox.net, bpf@...r.kernel.org, 
	khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com, dan.daly@...el.com, 
	chris.sommers@...sight.com, john.andy.fingerhut@...el.com
Subject: Re: [PATCH net-next v8 00/15] Introducing P4TC

On Fri, Nov 17, 2023 at 1:27 AM John Fastabend <john.fastabend@...il.com> wrote:
>
> Jamal Hadi Salim wrote:
> > We are seeking community feedback on P4TC patches.
> >
>
> [...]
>
> >
> > What is P4?
> > -----------
>
> I read the cover letter here is my high level takeaway.
>

At least you read the cover letter this time ;->

> P4c-bpf backend exists and I don't see why we wouldn't use that as a starting
> point.

Are you familiar with P4 architectures? That code was for PSA (which
is essentially for switches) we are doing PNA (which is more nic
oriented).
And yes, we used that code as a starting point and made the necessary
changes needed to conform to PNA. We made it actually work better by
using kfuncs.

> At least the cover letter needs to explain why this path is not taken.

I thought we had a reference to that backend - but will add it for the
next update.

> From the cover letter there appears to be bpf pieces and non-bpf pieces, but
> I don't see any reason not to just land it all in BPF. Support exists and if
> its missing some smaller things add them and everyone gets them vs niche P4
> backend.

Ok, i thought you said you read the cover letter. Reasons are well
stated, primarily that we need to make sure all P4 programs work.

>
> Without hardware support for any of this its impossible to understand how 'tc'
> would work as a hardware offload interface for a p4 device so we need hardware
> support to evaluate. For example I'm not even sure how you would take a BPF
> parser into hardware on most network devices that aren't processor based.
>

P4 has nothing to do with parsers in hardware. Where did you get this
requirement from?

> P4 has a P4Runtime I think most folks would prefer a P4 UI vs typing in 'tc'
> commands so arguing for 'tc' UI is nice is not going to be very compelling.
> Best we can say is it works well enough and we use it.


The control plane interface is netlink. This part is not negotiable.
You can write whatever you want on top of it(for example P4runtime
using netlink as its southbound interface). We feel that tc - a well
understood utility - is one we should make publicly available for the
rest of the world to use. For example we have rust code that runs on
top of netlink to do performance testing.

> more commentary below.
>
> >
> > The Programming Protocol-independent Packet Processors (P4) is an open source,
> > domain-specific programming language for specifying data plane behavior.
> >
> > The P4 ecosystem includes an extensive range of deployments, products, projects
> > and services, etc[9][10][11][12].
> >
> > __What is P4TC?__
> >
> > P4TC is a net-namespace aware implementation, meaning multiple P4 programs can
> > run independently in different namespaces alongside their appropriate state. The
> > implementation builds on top of many years of Linux TC experiences.
> > On why P4 - see small treatise here:[4].
> >
> > There have been many discussions and meetings since about 2015 in regards to
> > P4 over TC[2] and we are finally proving the naysayers that we do get stuff
> > done!
> >
> > A lot more of the P4TC motivation is captured at:
> > https://github.com/p4tc-dev/docs/blob/main/why-p4tc.md
> >
> > **In this patch series we focus on s/w datapath only**.
>
> I don't see the value in adding 16676 lines of code for s/w only datapath
> of something we already can do with p4c-ebpf backend.

Please please stop this entitlement politics (which i frankly think
you guys have been getting away with for a few years now).
This code does not touch any core code - you guys constantly push code
that touches core code and it is not unusual we have to pick up the
pieces after but now you are going to call me out for the number of
lines of code? Is it ok for you to write lines of code in the kernel
but not me? Judge the technical work then we can have a meaningful
discussion.

TBH, I am trying very hard to see if i should respond to any more
comments from you. I was very happy with our original scriptable
approach and you came out and banged on the table that you want ebpf.
We spent 10 months of multiple people working on this code to make it
ebpf friendly and now you want more (actually i am not sure what the
hell you want).

> Or one of the other
> backends already there. Namely take P4 programs and run them on CPUs in Linux.
>
> Also I suspect a pipelined datapath is going to be slower than a O(1) lookup
> datapath so I'm guessing its slower than most datapaths we have already.
>
> What do we gain here over existing p4c-ebpf?
>

see above.

> >
> > __P4TC Workflow__
> >
> > These patches enable kernel and user space code change _independence_ for any
> > new P4 program that describes a new datapath. The workflow is as follows:
> >
> >   1) A developer writes a P4 program, "myprog"
> >
> >   2) Compiles it using the P4C compiler[8]. The compiler generates 3 outputs:
> >      a) shell script(s) which form template definitions for the different P4
> >      objects "myprog" utilizes (tables, externs, actions etc).
>
> This is odd to me. I think packing around shell scrips as a program is not
> very usable. Why not just an object file.
>
> >      b) the parser and the rest of the datapath are generated
> >      in eBPF and need to be compiled into binaries.
> >      c) A json introspection file used for the control plane (by iproute2/tc).
>
> Why split up the eBPF and control plane like this? eBPF has a control plane
> just use the existing one?
>

The cover letter clearly states that we are using netlink as the
control api. Does eBPF support netlink?

> >
> >   3) The developer (or operator) executes the shell script(s) to manifest the
> >      functional "myprog" into the kernel.
> >
> >   4) The developer (or operator) instantiates "myprog" via the tc P4 filter
> >      to ingress/egress (depending on P4 arch) of one or more netdevs/ports.
> >
> >      Example1: parser is an action:
> >        "tc filter add block 22 ingress protocol all prio 10 p4 pname myprog \
> >         action bpf obj $PARSER.o section parser/tc-ingress \
> >         action bpf obj $PROGNAME.o section p4prog/tc"
> >
> >      Example2: parser explicitly bound and rest of dpath as an action:
> >        "tc filter add block 22 ingress protocol all prio 10 p4 pname myprog \
> >         prog tc obj $PARSER.o section parser/tc-ingress \
> >         action bpf obj $PROGNAME.o section p4prog/tc"
> >
> >      Example3: parser is at XDP, rest of dpath as an action:
> >        "tc filter add block 22 ingress protocol all prio 10 p4 pname myprog \
> >         prog type xdp obj $PARSER.o section parser/xdp-ingress \
> >       pinned_link /path/to/xdp-prog-link \
> >         action bpf obj $PROGNAME.o section p4prog/tc"
> >
> >      Example4: parser+prog at XDP:
> >        "tc filter add block 22 ingress protocol all prio 10 p4 pname myprog \
> >         prog type xdp obj $PROGNAME.o section p4prog/xdp \
> >       pinned_link /path/to/xdp-prog-link"
> >
> >     see individual patches for more examples tc vs xdp etc. Also see section on
> >     "challenges" (on this cover letter).
> >
> > Once "myprog" P4 program is instantiated one can start updating table entries
> > that are associated with myprog's table named "mytable". Example:
> >
> >   tc p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
> >     action send_to_port param port eno1
>
> As a UI above is entirely cryptic to most folks I bet.
>

But ebpf is not?

> myprog table is a BPF map? If so then I don't see any need for this just
> interact with it like a BPF map. I suspect its some other object, but
> I don't see any ratoinal for that.

All the P4 objects sit in the TC domain. The datapath program is ebpf.
Control is via netlink.


> >
> > A packet arriving on ingress of any of the ports on block 22 will first be
> > exercised via the (eBPF) parser to find the headers pointing to the ip
> > destination address.
> > The remainder eBPF datapath uses the result dstAddr as a key to do a lookup in
> > myprog's mytable which returns the action params which are then used to execute
> > the action in the eBPF datapath (eventually sending out packets to eno1).
> > On a table miss, mytable's default miss action is executed.
>
> This chunk looks like standard BPF program. Parse pkt, lookup an action,
> do the action.
>

Yes, the ebpf datapath does the parsing, and then interacts with
kfuncs to the tc world before it (the ebpf datapath) executes the
action.
Note: ebpf did not invent any of that (parse, lookup, action). It has
existed in tc for 20 years before ebpf existed.

> > __Description of Patches__
> >
> > P4TC is designed to have no impact on the core code for other users
> > of TC. IOW, you can compile it out but even if it compiled in and you dont use
> > it there should be no impact on your performance.
> >
> > We do make core kernel changes. Patch #1 adds infrastructure for "dynamic"
> > actions that can be created on "the fly" based on the P4 program requirement.
>
> the common pattern in bpf for this is to use a tail call map and populate
> it at runtime and/or just compile your program with the actions. Here
> the actions came from the p4 back up at step 1 so no reason we can't
> just compile them with p4c.
>
> > This patch makes a small incision into act_api which shouldn't affect the
> > performance (or functionality) of the existing actions. Patches 2-4,6-7 are
> > minimalist enablers for P4TC and have no effect the classical tc action.
> > Patch 5 adds infrastructure support for preallocation of dynamic actions.
> >
> > The core P4TC code implements several P4 objects.
>
> [...]
>
> >
> > __Restating Our Requirements__
> >
> > The initial release made in January/2023 had a "scriptable" datapath (think u32
> > classifier and pedit action). In this section we review the scriptable version
> > against the current implementation we are pushing upstream which uses eBPF.
> >
> > Our intention is to target the TC crowd.
> > Essentially developers and ops people deploying TC based infra.
> > More importantly the original intent for P4TC was to enable _ops folks_ more than
> > devs (given code is being generated and doesn't need humans to write it).
>
> I don't follow. humans wrote the p4.
>

But not the ebpf code, that is compiler generated. P4 is a higher
level Domain specific language and ebpf is just one backend (others
s/w variants include DPDK, Rust, C, etc)

> I think the intent should be to enable P4 to run on Linux. Ideally efficiently.
> If the _ops folks are writing P4 great as long as we give them an efficient
> way to run their p4 I don't think they care about what executes it.
>
> >
> > With TC, we get whole "familiar" package of match-action pipeline abstraction++,
> > meaning from the control plane all the way to the tooling infra, i.e
> > iproute2/tc cli, netlink infra(request/resp, event subscribe/multicast-publish,
> > congestion control etc), s/w and h/w symbiosis, the autonomous kernel control,
> > etc.
> > The main advantage is that we have a singular vendor-neutral interface via the
> > kernel using well understood mechanisms based on deployment experience (and
> > at least this part doesnt need retraining).
>
> A seemless p4 experience would be great. That looks like a tooling problem
> at the p4c-backend and p4c-frontend problem. Rather than a bunch of 'tc' glue
> I would aim for,
>
>   $ p4c-* myprog.p4
>   $ p4cRun ./myprog
>
> And maybe some options like,
>
>   $ p4cRun -i eth0 ./myprog

Armchair lawyering and classical ML bikesheding

> Then use the p4runtime to interface with the system. If you don't like the
> runtime then it should be brought up in that working group.
>
> >
> > 1) Supporting expressibility of the universe set of P4 progs
> >
> > It is a must to support 100% of all possible P4 programs. In the past the eBPF
> > verifier had to be worked around and even then there are cases where we couldnt
> > avoid path explosion when branching is involved. Kfunc-ing solves these issues
> > for us. Note, there are still challenges running all potential P4 programs at
> > the XDP level - the solution to that is to have the compiler generate XDP based
> > code only if it possible to map it to that layer.
>
> Examples and we can fix it.

Right. Let me wait for you to fix something 5 years from now. I would
never have used eBPF at all but the kfunc is what changed my mind.

> >
> > 2) Support for P4 HW and SW equivalence.
> >
> > This feature continues to work even in the presence of eBPF as the s/w
> > datapath. There are cases of square-hole-round-peg scenarios but
> > those are implementation issues we can live with.
>
> But no hw support.
>

This patcheset has nothing to do with offload (you read the cover
letter?). All above is saying is that by virtue of using TC we have a
path to a proven offload approach.


> >
> > 3) Operational usability
> >
> > By maintaining the TC control plane (even in presence of eBPF datapath)
> > runtime aspects remain unchanged. So for our target audience of folks
> > who have deployed tc including offloads - the comfort zone is unchanged.
> > There is also the comfort zone of continuing to use the true-and-tried netlink
> > interfacing.
>
> The P4 control plane should be P4Runtime.
>

And be my guest and write it on top of netlink.

cheers,
jamal

> >
> > There is some loss in operational usability because we now have more knobs:
> > the extra compilation, loading and syncing of ebpf binaries, etc.
> > IOW, I can no longer just ship someone a shell script in an email to
> > say go run this and "myprog" will just work.
> >
> > 4) Operational and development Debuggability
> >
> > If something goes wrong, the tc craftsperson is now required to have additional
> > knowledge of eBPF code and process. This applies to both the operational person
> > as well as someone who wrote a driver. We dont believe this is solvable.
> >
> > 5) Opportunity for rapid prototyping of new ideas
>
> [...]
>
> > 6) Supporting per namespace program
> >
> > This requirement is still met (by virtue of keeping P4 control objects within the
> > TC domain).
>
> BPF can also be network namespaced I'm not sure I understand comment.
>
> >
> > __Challenges__
> >
> > 1) Concept of tc block in XDP is _very tedious_ to implement. It would be nice
> >    if we can use concept there as well, since we expect P4 to work with many
> >    ports. It will likely require some core patches to fix this.
> >
> > 2) Right now we are using "packed" construct to enforce alignment in kfunc data
> >    exchange; but we're wondering if there is potential to use BTF to understand
> >    parameters and their offsets and encode this information at the compiler
> >    level.
> >
> > 3) At the moment we are creating a static buffer of 128B to retrieve the action
> >    parameters. If you have a lot of table entries and individual(non-shared)
> >    action instances with actions that require very little (or no) param space
> >    a lot of memory is wasted. There may also be cases where 128B may not be
> >    enough; (likely this is something we can teach the P4C compiler). If we can
> >    have dynamic pointers instead for kfunc fixed length parameterization then
> >    this issue is resolvable.
> >
> > 4) See "Restating Our Requirements" #5.
> >    We would really appreciate ideas/suggestions, etc.
> >
> > __References__
>
> Thanks,
> John