Message-ID: <87357qvdso.fsf@toke.dk>
Date: Tue, 31 Jan 2023 18:01:27 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: Jamal Hadi Salim <jhs@...atatu.com>,
John Fastabend <john.fastabend@...il.com>,
Jamal Hadi Salim <hadi@...atatu.com>,
Willem de Bruijn <willemb@...gle.com>,
Stanislav Fomichev <sdf@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
kernel@...atatu.com, deb.chatterjee@...el.com,
anjali.singhai@...el.com, namrata.limaye@...el.com,
khalidm@...dia.com, tom@...anda.io, pratyush@...anda.io,
xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, vladbu@...dia.com, simon.horman@...igine.com,
stefanc@...vell.com, seong.kim@....com, mattyk@...dia.com,
dan.daly@...el.com, john.andy.fingerhut@...el.com
Subject: Re: [PATCH net-next RFC 00/20] Introducing P4TC

Jiri Pirko <jiri@...nulli.us> writes:

> Tue, Jan 31, 2023 at 01:17:14PM CET, toke@...hat.com wrote:
>>Jamal Hadi Salim <jhs@...atatu.com> writes:
>>
>>> Toke, i dont think i have managed to get across that there is an
>>> "autonomous" control built into the kernel. It is not just things that
>>> come across netlink. It's about the whole infra.
>>
>>I'm not disputing the need for the TC infra to configure the pipelines
>>and their relationship in the hardware. I'm saying that your
>>implementation *of the SW path* is the wrong approach and it would be
>>better done by using BPF (not talking about the existing TC-BPF,
>>either).
>>
>>It's a bit hard to know your thinking for sure here, since your patch
>>series doesn't include any of the offload control bits. But from the
>>slides and your hints in this series, AFAICT, the flow goes something
>>like:
>>
>>hw_pipeline_id = devlink_program_hardware(dev, p4_compiled_blob);
>>sw_pipeline_id = `tc p4template create ...` (etc, this is generated by P4C)
>>
>>tc_act = tc_act_create(hw_pipeline_id, sw_pipeline_id)
>>
>>which will turn into something like:
>>
>>struct p4_cls_offload ofl = {
>> .classid = classid,
>> .pipeline_id = hw_pipeline_id
>>};
>>
>>if (check_sw_and_hw_equivalence(hw_pipeline_id, sw_pipeline_id)) /* some magic check here */
>> return -EINVAL;
>>
>>netdev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_P4, &ofl);
>>
>>
>>I.e, all that's being passed to the hardware is the ID of the
>>pre-programmed pipeline, because that programming is going to be
>>out-of-band via devlink anyway.
>>
>>In which case, you could just as well replace the above:
>>
>>sw_pipeline_id = `tc p4template create ...` (etc, this is generated by P4C)
>>
>>with
>>
>>sw_pipeline_id = bpf_prog_load(BPF_PROG_TYPE_P4TC, "my_obj_file.o"); /* my_obj_file is created by P4c */
>>
>>and achieve exactly the same.
>>
>>Having all the P4 data types and concepts exist inside the kernel
>>*might* make sense if the kernel could then translate those into the
>>hardware representations and manage their lifecycle in a uniform way.
>>But as far as I can tell from the slides and what you've been saying in
>>this thread that's not going to be possible anyway, so why do you need
>>anything more granular than the pipeline ID?
>
> Toke, I understand that what you describe above is applicable for the P4
> program instantiation (pipeline definition).
>
> What is the suggestion for the actual "rule insertions" ? Would it make
> sense to use TC iface (Jamal's or similar) to insert rules to both BPF SW
> path and offloaded HW path?

Hmm, so by "rule insertions" here you're referring to populating what P4
calls 'tables', right?

I could see a couple of ways this could be bridged between the BPF side
and the HW side:

- Create a new BPF map type that is backed by the TC-internal data
  structure, so updates from userspace go via the TC interface, but BPF
  programs access the contents via the bpf_map_*() helpers (or we could
  allow updating via the bpf() syscall as well)

- Expose the TC data structures to BPF via their own set of kfuncs,
  similar to what we did for conntrack

- Scrap the TC interface entirely and make this an offload-enabled BPF
  map type (using the BPF ndo and bpf_map_dev_ops operations to update
  it). Userspace would then populate it via the bpf() syscall like any
  other map.
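
To make the common shape of these options concrete, here is a rough
userspace toy model, not kernel code: one update entry point programs
the hardware first and then commits the same entry to the software
copy that the SW datapath reads. Every name in it (p4_table,
p4_table_update, p4_table_lookup, hw_update_fn, demo_hw_update) is
hypothetical and made up for illustration; none of them is an existing
kernel, TC or libbpf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define P4_TABLE_MAX 16

/* Hypothetical exact-match P4 table entry: key -> action id. */
struct p4_entry {
	uint32_t key;
	uint32_t action;
	int in_use;
};

/* Hypothetical stand-in for a driver offload callback. */
typedef int (*hw_update_fn)(uint32_t key, uint32_t action);

struct p4_table {
	struct p4_entry sw[P4_TABLE_MAX];	/* copy read by the SW path */
	hw_update_fn hw_update;			/* NULL means no offload */
};

/* Demo "hardware": only counts how many entries were pushed to it. */
static int demo_hw_calls;

static int demo_hw_update(uint32_t key, uint32_t action)
{
	(void)key;
	(void)action;
	demo_hw_calls++;
	return 0;
}

static void p4_table_init(struct p4_table *t, hw_update_fn hw)
{
	memset(t, 0, sizeof(*t));
	t->hw_update = hw;
}

/* Insert or overwrite an entry. Returns 0 on success, -1 on failure.
 * The SW copy is only changed once the HW update has succeeded, so
 * the two views do not diverge on a HW error.
 */
static int p4_table_update(struct p4_table *t, uint32_t key, uint32_t action)
{
	struct p4_entry *slot = NULL, *free_slot = NULL;
	size_t i;

	assert(t != NULL);
	for (i = 0; i < P4_TABLE_MAX; i++) {
		struct p4_entry *e = &t->sw[i];

		if (e->in_use && e->key == key) {
			slot = e;
			break;
		}
		if (!e->in_use && free_slot == NULL)
			free_slot = e;
	}
	if (slot == NULL)
		slot = free_slot;
	if (slot == NULL)
		return -1;	/* table full */

	if (t->hw_update != NULL && t->hw_update(key, action) != 0)
		return -1;	/* HW rejected the entry, leave SW untouched */

	slot->key = key;
	slot->action = action;
	slot->in_use = 1;
	return 0;
}

/* What the SW datapath would do per packet: returns 0 and fills
 * *action if the key is present, -1 otherwise.
 */
static int p4_table_lookup(const struct p4_table *t, uint32_t key,
			   uint32_t *action)
{
	size_t i;

	assert(t != NULL && action != NULL);
	for (i = 0; i < P4_TABLE_MAX; i++) {
		if (t->sw[i].in_use && t->sw[i].key == key) {
			*action = t->sw[i].action;
			return 0;
		}
	}
	return -1;
}
```

In this model the three options differ mainly in who calls
p4_table_update() (the TC interface or the bpf() syscall) and in how
the BPF program reaches p4_table_lookup() (map helpers or kfuncs); the
mirroring step itself looks the same in each case.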

I suspect the map interface is the most straight-forward to use from the
BPF side, but informing this by what existing implementations do
(thinking of the P4->XDP compiler in particular) might be a good idea?

-Toke