lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 20 Feb 2024 16:31:50 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: netdev@...r.kernel.org, deb.chatterjee@...el.com,
 anjali.singhai@...el.com, namrata.limaye@...el.com, tom@...anda.io,
 mleitner@...hat.com, Mahesh.Shirshyad@....com, tomasz.osinski@...el.com,
 jiri@...nulli.us, xiyou.wangcong@...il.com, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, vladbu@...dia.com,
 horms@...nel.org, khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com,
 bpf@...r.kernel.org, Victor Nogueira <victor@...atatu.com>,
 Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH v10 net-next 15/15] p4tc: add P4 classifier

On 2/16/24 10:18 PM, Jamal Hadi Salim wrote:
> On Thu, Jan 25, 2024 at 12:59 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>> On Thu, Jan 25, 2024 at 10:47 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>>> On 1/24/24 3:40 PM, Jamal Hadi Salim wrote:
>>>> On Wed, Jan 24, 2024 at 8:59 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>>>>> On 1/22/24 8:48 PM, Jamal Hadi Salim wrote:
>>> [...]
>>>>>>
>>>>>> It should also be noted that it is feasible to split some of the ingress
>>>>>> datapath into XDP first and more into TC later (as was shown above for
>>>>>> example where the parser runs at XDP level). YMMV.
>>>>>> Regardless of choice of which scheme to use, none of these will affect
>>>>>> UAPI. It will all depend on whether you generate code to load on XDP vs
>>>>>> tc, etc.
>>>>>>
>>>>>> Co-developed-by: Victor Nogueira <victor@...atatu.com>
>>>>>> Signed-off-by: Victor Nogueira <victor@...atatu.com>
>>>>>> Co-developed-by: Pedro Tammela <pctammela@...atatu.com>
>>>>>> Signed-off-by: Pedro Tammela <pctammela@...atatu.com>
>>>>>> Signed-off-by: Jamal Hadi Salim <jhs@...atatu.com>
>>>>>
>>>>> My objections from last iterations still stand, and I also added a nak,
>>>>> so please do not just drop it with new revisions.. from the v10 as you
>>>>> wrote you added further code but despite the various community feedback
>>>>> the design still stands as before, therefore:
>>>>>
>>>>> Nacked-by: Daniel Borkmann <daniel@...earbox.net>
>>>>
>>>> We didnt make code changes - but did you read the cover letter and the
>>>> extended commentary in this patch's commit log? We should have
>>>> mentioned it in the changes log. It did respond to your comments.
>>>> There's text that says "the filter manages the lifetime of the
>>>> pipeline" - which in the future could include not only tc but XDP but
>>>> also the hardware path (in the form of a file that gets loaded). I am
>>>> not sure if that message is clear. Your angle being this is layer
>>>> violation. In the last discussion i asked you for suggestions and we
>>>> went the tcx route, which didnt make sense, and  then you didnt
>>>> respond.
>>> [...]
>>>
>>>>> Also as mentioned earlier I don't think tc should hold references on
>>>>> XDP programs in here. It doesn't make any sense aside from the fact
>>>>> that the cls_p4 is also not doing anything with it. This is something
>>>>> that a user space control plane should be doing i.e. managing a XDP
>>>>> link on the target device.
>>>>
>>>> This is the same argument about layer violation that you made earlier.
>>>> The filter manages the p4 pipeline - i.e it's not just about the ebpf
>>>> blob(s) but for example in the future (discussions are still ongoing
>>>> with vendors who have P4 NICs) a filter could be loaded to also
>>>> specify the location of the hardware blob.
>>>
>>> Ah, so there is a plan to eventually add HW offload support for cls_p4?
>>> Or is this only specifiying a location of a blob through some opaque
>>> cookie value from user space?
>>
>> Current thought process is it will be something along these lines (the
>> commit provides more details):
>>
>> tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
>>     prog type hw filename "mypnameprog.o" ... \
>>     prog type xdp obj $PARSER.o section parser/xdp pinned_link
>> /sys/fs/bpf/mylink \
>>     action bpf obj $PROGNAME.o section prog/tc-ingress
>>
>> These discussions are still ongoing - but that is the current
>> consensus. Note: we are not pushing any code for that, but hope it
>> paints the bigger picture....
>> The idea is the cls p4 owns the lifetime of the pipeline. Installing
>> the filter instantiates the p4 pipeline "simple_l3" and triggers a lot
>> of the refcounts to make sure the pipeline and its components stays
>> alive.
>> There could be multiple such filters - when someone deletes the last
>> filter, then it is safe to delete the pipeline.
>> Essentially the filter manages the lifetime of the pipeline.
>>
>>>> I would be happy with a suggestion that gets us moving forward with
>>>> that context in mind.
>>>
>>> My question on the above is mainly what does it bring you to hold a
>>> reference on the XDP program? There is no guarantee that something else
>>> will get loaded onto XDP, and then eventually the cls_p4 is the only
>>> entity holding the reference but w/o 'purpose'. We do have BPF links
>>> and the user space component orchestrating all this needs to create
>>> and pin the BPF link in BPF fs, for example. An artificial reference
>>> on XDP prog feels similar as if you'd hold a reference on an inode
>>> out of tc.. Again, that should be delegated to the control plane you
>>> have running interacting with the compiler which then manages and
>>> loads its artifacts. What if you would also need to set up some
>>> netfilter rules for the SW pipeline, would you then embed this too?
>>
>> Sorry, a slight tangent first:
>> P4 is self-contained, there are a handful of objects that are defined
>> by the spec (externs, actions, tables, etc) and we model them in the
>> patchset, so that part is self-contained. For the extra richness such
>> as the netfilter example you quoted - based on my many years of
>> experience deploying SDN - using daemons(sorry if i am reading too
>> much in what I think you are implying) for control is not the best
>> option i.e you need all kinds of coordination - for example where do
>> you store state, what happens when the daemon dies, how do you
>> graceful restarts etc. Based on that, if i can put things in the
>> kernel (which is essentially a "perpetual daemon", unless the kernel
>> crashes) it's a lot simpler to manage as a source of truth especially
>> when there is not that much info. There is a limit when there are
>> multiple pieces (to use your netfilter example) because you need
>> another layer to coordinate things.

'source of truth' for the various attach points or BPF links, yes, but in
this case here it is not, since the source of truth on what is attached
is not in cls_p4 but rather on the XDP link. How do you handle the case
when cls_p4 says something different to what is /actually/ attached? Why
is it not enough to establish some convention in user space, to pin the
link and retrieve/update from there when needed? Like everyone else does.
... even if you consider iproute2 your "control plane" (which I have the
feeling you do)?

>> Re: the XDP part - our key reason is mostly managerial, in that the
>> filter is the lifetime manager of the pipeline; and that if i dump

This is imho the problematic part which feels like square peg in round
hole, trying to fit this whole lifetime manager of the pipeline into
the cls_p4 filter. We agree to disagree here. Instead of reusing
individual building blocks from user space, this tries to cramp control
plane parts into the kernel for which its not a great fit with what is
build here as-is.

>> that filter i can see all the details in regards to the pipeline(tc,
>> XDP and in future hw, etc) in one spot. You are right, the link
>> pinning is our protection from someone replacing the XDP prog (this
>> was a tip from Toke in the early days) and the comparison of tc
>> holding inode is apropos.
>> There's some history: in the early days we were also using metadata
>> which comes from the XDP program at the tc layer if more processing
>> was to be done (and there was extra metadata which told us which XDP
>> prog produced it which we would vet before trusting the metadata).
>> Given all the above, we should still be able to hold this info without
>> necessarily holding the extra refcount and be able to see this detail.
>> So we can remove the refcounting.
> 
> Daniel?

The refcount should definitely be removed, but then again, see the point
above in that it is inconsistent information. Why can't this be done in
user space with some convention in your user space control plane - if you
take iproute2, then why it cannot pin the link in a bpf fs instance and
retrieve it from there?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ