Message-ID: <CAM0EoM=XPJ96s3Y=ivrjH-crGb6hRu4hi90WB-O_SkxvLZNYpQ@mail.gmail.com>
Date: Thu, 25 Jan 2024 12:59:04 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: netdev@...r.kernel.org, deb.chatterjee@...el.com, anjali.singhai@...el.com, 
	namrata.limaye@...el.com, tom@...anda.io, mleitner@...hat.com, 
	Mahesh.Shirshyad@....com, tomasz.osinski@...el.com, jiri@...nulli.us, 
	xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, vladbu@...dia.com, horms@...nel.org, 
	khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com, bpf@...r.kernel.org
Subject: Re: [PATCH v10 net-next 15/15] p4tc: add P4 classifier

On Thu, Jan 25, 2024 at 10:47 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 1/24/24 3:40 PM, Jamal Hadi Salim wrote:
> > On Wed, Jan 24, 2024 at 8:59 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> >> On 1/22/24 8:48 PM, Jamal Hadi Salim wrote:
> [...]
> >>>
> >>> It should also be noted that it is feasible to split some of the ingress
> >>> datapath into XDP first and more into TC later (as was shown above for
> >>> example where the parser runs at XDP level). YMMV.
> >>> Regardless of choice of which scheme to use, none of these will affect
> >>> UAPI. It will all depend on whether you generate code to load on XDP vs
> >>> tc, etc.
> >>>
> >>> Co-developed-by: Victor Nogueira <victor@...atatu.com>
> >>> Signed-off-by: Victor Nogueira <victor@...atatu.com>
> >>> Co-developed-by: Pedro Tammela <pctammela@...atatu.com>
> >>> Signed-off-by: Pedro Tammela <pctammela@...atatu.com>
> >>> Signed-off-by: Jamal Hadi Salim <jhs@...atatu.com>
> >>
> >> My objections from last iterations still stand, and I also added a nak,
> >> so please do not just drop it with new revisions.. from the v10 as you
> >> wrote you added further code but despite the various community feedback
> >> the design still stands as before, therefore:
> >>
> >> Nacked-by: Daniel Borkmann <daniel@...earbox.net>
> >
> > We didn't make code changes - but did you read the cover letter and the
> > extended commentary in this patch's commit log? We should have
> > mentioned it in the changes log. It did respond to your comments.
> > There's text that says "the filter manages the lifetime of the
> > pipeline" - which in the future could include not only tc but XDP but
> > also the hardware path (in the form of a file that gets loaded). I am
> > not sure if that message is clear. Your angle being this is layer
> > violation. In the last discussion i asked you for suggestions and we
> > went the tcx route, which didn't make sense, and then you didn't
> > respond.
> [...]
>
> >> Also as mentioned earlier I don't think tc should hold references on
> >> XDP programs in here. It doesn't make any sense aside from the fact
> >> that the cls_p4 is also not doing anything with it. This is something
> >> that a user space control plane should be doing i.e. managing a XDP
> >> link on the target device.
> >
> > This is the same argument about layer violation that you made earlier.
> > The filter manages the p4 pipeline - i.e it's not just about the ebpf
> > blob(s) but for example in the future (discussions are still ongoing
> > with vendors who have P4 NICs) a filter could be loaded to also
> > specify the location of the hardware blob.
>
> Ah, so there is a plan to eventually add HW offload support for cls_p4?
> Or is this only specifying a location of a blob through some opaque
> cookie value from user space?

The current thinking is that it will look something along these lines
(the commit log provides more details):

tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
   prog type hw filename "mypnameprog.o" ... \
   prog type xdp obj $PARSER.o section parser/xdp pinned_link /sys/fs/bpf/mylink \
   action bpf obj $PROGNAME.o section prog/tc-ingress

These discussions are still ongoing - but that is the current
consensus. Note: we are not pushing any code for that, but we hope it
paints the bigger picture.
The idea is that cls p4 owns the lifetime of the pipeline. Installing
the filter instantiates the P4 pipeline "simple_l3" and triggers a lot
of the refcounting that makes sure the pipeline and its components
stay alive.
There could be multiple such filters - only when someone deletes the
last filter is it safe to delete the pipeline.
Essentially, the filter manages the lifetime of the pipeline.
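
To make the lifetime rule concrete, here is a minimal userspace sketch.
It is only an illustration under stated assumptions: every name in it
(struct pipeline, filter_add(), pipeline_try_delete(), etc.) is
hypothetical and is not code from the patchset. Each installed filter
takes a reference on the pipeline, and the pipeline can only be deleted
once the last filter is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a P4 pipeline; not a structure from the patchset. */
struct pipeline {
	const char *name;
	unsigned int refcnt;	/* number of filters currently bound */
	bool alive;		/* false once the pipeline has been deleted */
};

/* Hypothetical model of a filter instance bound to one pipeline. */
struct filter {
	struct pipeline *pl;
};

static void pipeline_init(struct pipeline *pl, const char *name)
{
	pl->name = name;
	pl->refcnt = 0;
	pl->alive = true;
}

/* Installing a filter takes a reference on the pipeline. */
static struct filter *filter_add(struct pipeline *pl)
{
	struct filter *f;

	if (!pl->alive)
		return NULL;	/* cannot bind to a deleted pipeline */

	f = malloc(sizeof(*f));
	if (!f)
		return NULL;

	f->pl = pl;
	pl->refcnt++;
	return f;
}

/* Deleting a filter drops the reference it held. */
static void filter_del(struct filter *f)
{
	assert(f->pl->refcnt > 0);
	f->pl->refcnt--;
	free(f);
}

/* The pipeline may be deleted only when no filter references it. */
static bool pipeline_try_delete(struct pipeline *pl)
{
	if (pl->refcnt > 0)
		return false;	/* still held by at least one filter */

	pl->alive = false;
	return true;
}
```

With two filters added, pipeline_try_delete() fails until both have
been removed with filter_del(); after that it succeeds and further
filter_add() calls on that pipeline return NULL.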

> > I would be happy with a suggestion that gets us moving forward with
> > that context in mind.
>
> My question on the above is mainly what does it bring you to hold a
> reference on the XDP program? There is no guarantee that something else
> will get loaded onto XDP, and then eventually the cls_p4 is the only
> entity holding the reference but w/o 'purpose'. We do have BPF links
> and the user space component orchestrating all this needs to create
> and pin the BPF link in BPF fs, for example. An artificial reference
> on XDP prog feels similar as if you'd hold a reference on an inode
> out of tc.. Again, that should be delegated to the control plane you
> have running interacting with the compiler which then manages and
> loads its artifacts. What if you would also need to set up some
> netfilter rules for the SW pipeline, would you then embed this too?

Sorry, a slight tangent first:
P4 is self-contained: there are a handful of objects that are defined
by the spec (externs, actions, tables, etc.) and we model them in the
patchset. For extra richness such as the netfilter example you
quoted - based on my many years of experience deploying SDN - using
daemons (sorry if I am reading too much into what I think you are
implying) for control is not the best option, i.e. you need all kinds
of coordination. For example: where do you store state, what happens
when the daemon dies, how do you do graceful restarts, etc. Based on
that, if I can put things in the kernel (which is essentially a
"perpetual daemon", unless the kernel crashes) it's a lot simpler to
manage as a source of truth, especially when there is not that much
info. There is a limit when there are multiple pieces (to use your
netfilter example), because you need another layer to coordinate
things.

Re: the XDP part - our key reason is mostly managerial, in that the
filter is the lifetime manager of the pipeline, and if I dump that
filter I can see all the details of the pipeline (tc, XDP and in
future hw, etc.) in one spot. You are right, the link pinning is our
protection against someone replacing the XDP prog (this was a tip from
Toke in the early days), and the comparison to tc holding an inode is
apropos.
There's some history: in the early days we were also using metadata
produced by the XDP program at the tc layer if more processing was to
be done (and there was extra metadata which told us which XDP prog
produced it, which we would vet before trusting the metadata).
Given all the above, we should still be able to hold this info, and
still be able to see this detail, without necessarily holding the
extra refcount. So we can remove the refcounting.

cheers,
jamal

> Thanks,
> Daniel
