netdev - RE: On the NACKs on P4TC patches

Message-ID: <66566c7c6778d_52e720851@john.notmuch>
Date: Tue, 28 May 2024 16:45:00 -0700
From: John Fastabend <john.fastabend@...il.com>
To: "Singhai, Anjali" <anjali.singhai@...el.com>, 
 John Fastabend <john.fastabend@...il.com>, 
 "Jain, Vipin" <Vipin.Jain@....com>, 
 "Hadi Salim, Jamal" <jhs@...atatu.com>, 
 Jakub Kicinski <kuba@...nel.org>
Cc: Paolo Abeni <pabeni@...hat.com>, 
 Alexei Starovoitov <alexei.starovoitov@...il.com>, 
 Network Development <netdev@...r.kernel.org>, 
 "Chatterjee, Deb" <deb.chatterjee@...el.com>, 
 "Limaye, Namrata" <namrata.limaye@...el.com>, 
 tom Herbert <tom@...anda.io>, 
 Marcelo Ricardo Leitner <mleitner@...hat.com>, 
 "Shirshyad, Mahesh" <Mahesh.Shirshyad@....com>, 
 "Osinski, Tomasz" <tomasz.osinski@...el.com>, 
 Jiri Pirko <jiri@...nulli.us>, 
 Cong Wang <xiyou.wangcong@...il.com>, 
 "David S. Miller" <davem@...emloft.net>, 
 Eric Dumazet <edumazet@...gle.com>, 
 Vlad Buslov <vladbu@...dia.com>, 
 Simon Horman <horms@...nel.org>, 
 Khalid Manaa <khalidm@...dia.com>, 
 Toke Høiland-Jørgensen <toke@...hat.com>, 
 Victor Nogueira <victor@...atatu.com>, 
 "Tammela, Pedro" <pctammela@...atatu.com>, 
 "Daly, Dan" <dan.daly@...el.com>, 
 Andy Fingerhut <andy.fingerhut@...il.com>, 
 "Sommers, Chris" <chris.sommers@...sight.com>, 
 Matty Kadosh <mattyk@...dia.com>, 
 bpf <bpf@...r.kernel.org>, 
 "lwn@....net" <lwn@....net>
Subject: RE: On the NACKs on P4TC patches

Singhai, Anjali wrote:
> >From: John Fastabend <john.fastabend@...il.com> 
> >Sent: Tuesday, May 28, 2024 1:17 PM
> 
> >Jain, Vipin wrote:
> >> [AMD Official Use Only - AMD Internal Distribution Only]
> >> 
> >> My apologies, earlier email used html and was blocked by the list...
> >> My response at the bottom as "VJ>"
> >>
> >> ________________________________________
> 
> >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program these devices?
> 
> >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works.
> 
> >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally different architecture from mapping
> >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions.
> >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't see the need for P4TC in this context.
> 
> >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath.
> 
> >.John
> 
> 
> John,                                                                            
> Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware.

To be more direct: what Linux drivers support this? That would be
a good place to start IMO. Similarly, which AMD hardware
driver supports this? If I have two drivers from two vendors
with P4 support, that is great.

For Intel I assume this is idpf?

To be concrete, can we start with Linux driver A and P4 program
P? Modprobe driver A and push P4 program P so that it does
something very simple, then drop a CIDR/port range into a table.
Perhaps this is obvious in your community. The trouble is that in
the context of a Linux driver it's not immediately obvious to me,
and I suspect it's not obvious to many others.

I think walking through the key steps here would
really help:

 1. $ p4IntelCompiler p4-dos.p4 -o myp4
 2. $ modprobe idpf
 3. $ ping -I eth0 10.0.0.1 // good
 4. $ p4Load p4-dos.p4
 5. -- load cidr into the hardware somehow -- p4rt-ctl?
 6. $ ping -I eth0 10.0.0.1 // dropped
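
As a rough illustration of what steps 3 through 6 are meant to observe,
here is a minimal Python sketch of a drop table keyed on a CIDR and an
optional port range. Everything in it (DropRule, verdict, the table
layout) is hypothetical and made up for illustration; it is not any
vendor's API or how idpf, P4TC, or IPDK actually represent rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DropRule:
    # Hypothetical table entry: a destination CIDR plus an optional
    # inclusive port range (None matches any port, or no port at all).
    cidr: ipaddress.IPv4Network
    ports: Optional[Tuple[int, int]] = None


def verdict(rules, dst_ip, dst_port=None):
    """Return "drop" if any rule matches the packet, else "pass"."""
    addr = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if addr not in rule.cidr:
            continue
        if rule.ports is None:
            return "drop"
        if dst_port is not None and rule.ports[0] <= dst_port <= rule.ports[1]:
            return "drop"
    return "pass"


table = []                                  # step 3: nothing loaded yet
before = verdict(table, "10.0.0.1")         # ping goes through
table.append(DropRule(ipaddress.ip_network("10.0.0.0/24")))  # step 5
after = verdict(table, "10.0.0.1")          # step 6: ping is dropped
```

The open question in the steps above is not this logic but where the
table lives and which interface writes to it.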

This is an honest attempt to help, fwiw. My questions would be:

For compilation (step 1), do we need an artifact from Intel? It seems
so from the docs, but maybe that is a typo, I'm not sure. I'm not
overly stuck on it, but it is worth mentioning if folks try to follow
your docs.

For 2, I assume this is just a normal everyday module load, nothing
to see. Does it pop something up in /proc or in firmware or...?
How do I know it's P4 ready?

For 4, how does this actually work? Is it a file in a directory
that the driver pushes into firmware? How does the firmware know
I've done this? Does the Linux driver already support this?

For 5 (the most interesting), how does this work today? How are
you currently talking to the driver/firmware to insert rules
and discover the tables? And does the idpf driver do this
already? Some side channel, I guess? Is this p4rt-ctl?

I've seen docs for the above in IPDK, but they are a bit hard
to follow, if I'm honest.

I assume IPDK is the source folks point to when they mention there
is hardware somewhere. It also seems there is IPDK BPF support,
which is interesting.

And do you know how the DPDK implementation works? Can we
learn from them? Is it just on top of the Flow API, which I
suspect we could easily use in devlink or some other *link?

> The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC.
> These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings.
> One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths.

I think many first-order and important points have been skipped. How do
you program the device? Is it a firmware blob, a set of firmware
commands, or something that comes to you on the device so only the
vendor sees it? Maybe I can infer this from some docs and examples (by
the way, I ran through some of your DPU docs and such), but it's unclear
how these map onto Linux networking. Jiri started into this earlier and
was cut off because p4tc was not for hardware offload. Now apparently
it is.

P4 is a good DSL for this, sure, and it has a runtime already specified,
which is great.

This is not a qdisc/tc; it's an entire hardware pipeline. I don't see
the reason to put it in TC at all.

> We feel P4TC approach is the path to add Linux kernel support.                   

I disagree with your implementation, not with your goal of supporting
flexible hardware.

>                                                                                  
> The s/w path is needed as well for several reasons.                              
> We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath.

None of the above requires P4TC. For different architectures you
build optimal backend compilers. You have a Xilinx backend,
an Intel backend, and a Linux CPU-based backend. I see no
reason to constrain the software case to map to a pipeline
model, for example. Software running on a CPU has very different
characteristics from something running on a TOR or an FPGA.
Trying to push all of these into one backend "model" will give
a suboptimal result for every target. At the end of the
day, my $.02: P4 is a DSL and it needs a target-dependent compiler
in front of it. If I want to optimize my software pipeline, the
compiler should compress tables as much as possible and
search for an O(1) lookup even if getting that key is somewhat
expensive. Conversely, a TCAM changes the game. An FPGA is
going to be flexible and make lots of tradeoffs here, on which
I'm not an expert. Also, by avoiding loading the DSL into the kernel
you leave room for others to build new/better/worse DSLs as they
please.
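
To illustrate the per-target point, here is a minimal Python sketch in
which the same CIDR table is compiled two ways: for a CPU, into one hash
table per prefix length (each probe is O(1), at the cost of masking the
key first), and for a TCAM-like target, into priority-ordered value/mask
rows. All function names are hypothetical, and the linear loop in
lookup_tcam only emulates what real hardware would do in parallel; this
is not how any existing P4 backend is implemented.

```python
import ipaddress


def compile_for_cpu(entries):
    """entries: {cidr_string: action}. Build one hash table per prefix length."""
    tables = {}
    for cidr, action in entries.items():
        net = ipaddress.ip_network(cidr)
        tables.setdefault(net.prefixlen, {})[int(net.network_address)] = action
    # Longest prefix first, so the first hit is the most specific match.
    return sorted(tables.items(), key=lambda item: item[0], reverse=True)


def lookup_cpu(compiled, ip, default="pass"):
    addr = int(ipaddress.ip_address(ip))
    for plen, table in compiled:
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        action = table.get(addr & mask)   # O(1) probe per prefix length
        if action is not None:
            return action
    return default


def compile_for_tcam(entries):
    """Emit (value, mask, action) rows ordered by priority (longest prefix first)."""
    rows = []
    for cidr, action in entries.items():
        net = ipaddress.ip_network(cidr)
        rows.append((net.prefixlen, int(net.network_address), int(net.netmask), action))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [(value, mask, action) for _, value, mask, action in rows]


def lookup_tcam(rows, ip, default="pass"):
    addr = int(ipaddress.ip_address(ip))
    for value, mask, action in rows:      # emulated; a TCAM checks all rows at once
        if addr & mask == value:
            return action
    return default


entries = {"10.0.0.0/8": "pass", "10.0.0.0/24": "drop"}
cpu = compile_for_cpu(entries)
tcam = compile_for_tcam(entries)
```

Both compiled forms answer the same queries, but neither layout is a
sensible choice for the other target.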

The P4 community writes control applications on top of the
runtime spec, right? p4rt-ctl is the thing I found. This
should abstract the endpoint away so it works with hardware,
software, an FPGA, or anything else.
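
A minimal sketch of that shape, assuming a made-up Target interface
(this is not the P4Runtime API, and both backend classes are
hypothetical stand-ins): the control application is written once
against the interface and does not care what sits behind it.

```python
from typing import Protocol


class Target(Protocol):
    # Hypothetical runtime interface shared by every backend.
    def insert_entry(self, table: str, key: str, action: str) -> None: ...


class SoftwareTarget:
    """Stand-in for a software datapath: keeps tables in memory."""

    def __init__(self):
        self.tables = {}

    def insert_entry(self, table, key, action):
        self.tables.setdefault(table, {})[key] = action


class RecordingHardwareTarget:
    """Stand-in for a device backend: only records what it would send."""

    def __init__(self):
        self.commands = []

    def insert_entry(self, table, key, action):
        self.commands.append(("insert", table, key, action))


def block_cidr(target: Target, cidr: str) -> None:
    # The control application: identical for every target.
    target.insert_entry("acl", cidr, "drop")


sw = SoftwareTarget()
hw = RecordingHardwareTarget()
for target in (sw, hw):
    block_cidr(target, "10.0.0.0/24")
```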

.John