netdev - Re: [PATCH v10 net-next 15/15] p4tc: add P4 classifier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6841ee07-40c6-9a67-a1a7-c04cbff84757@iogearbox.net>
Date: Wed, 24 Jan 2024 14:59:14 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Jamal Hadi Salim <jhs@...atatu.com>, netdev@...r.kernel.org
Cc: deb.chatterjee@...el.com, anjali.singhai@...el.com,
 namrata.limaye@...el.com, tom@...anda.io, mleitner@...hat.com,
 Mahesh.Shirshyad@....com, tomasz.osinski@...el.com, jiri@...nulli.us,
 xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com, vladbu@...dia.com, horms@...nel.org,
 khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com, bpf@...r.kernel.org
Subject: Re: [PATCH v10 net-next 15/15] p4tc: add P4 classifier

On 1/22/24 8:48 PM, Jamal Hadi Salim wrote:
> Introduce P4 tc classifier. The main task of this classifier is to manage
> the lifetime of pipeline instances across one or more netdev ports.
> Note a pipeline may be instantiated multiple times across one or more tc chains
> and different priorities.
> 
> Note that part or whole of the P4 pipeline could reside in tc, XDP or even
> hardware depending on how the P4 program was compiled.
> To use the P4 classifier you must specify a pipeline name that will be
> associated to the filter instance, a s/w parser (eBPF) and datapath P4
> control block program (eBPF) program. Although this patchset does not deal
> with offloads, it is also possible to load the h/w part using this filter.
> We will illustrate a few examples further below to clarify. Please treat
> the illustrated split as an example - there are probably more pragmatic
> approaches to splitting the pipeline; however, regardless of where the different
> pieces of the pipeline are placed (tc, XDP, HW) and what each layer will
> implement (what part of the pipeline) - these examples are merely showing
> what is possible.
> 
> The pipeline is assumed to have already been created via a template.
> 
> For example, if we were to add a filter to ingress of a group of netdevs
> (tc block 22) and associate it to P4 pipeline simple_l3 we could issue the
> following command:
> 
> tc filter add block 22 parent ffff: protocol all prio 6 p4 pname simple_l3 \
>      action bpf obj $PARSER.o ... \
>      action bpf obj $PROGNAME.o section prog/tc-ingress
> 
> The above uses the classical tc action mechanism in which the first action
> runs the P4 parser and if that goes well then the P4 control block is
> executed. Note, although not shown above, one could also append the command
> line with other traditional tc actions.
> 
> In these patches, we also support two types of loadings of the pipeline
> programs and differentiate between what gets loaded at say tc vs xdp by using
> syntax which specifies location as either "prog type tc obj" or
> "prog type xdp obj". There is an ongoing discussion in the P4TC community
> biweekly meetings which is likely going to have us add another location
> definition "prog type hw" which will specify the hardware object file name
> and other related attributes.
> 
> An example using tc:
> 
> tc filter add block 22 parent ffff: protocol all prio 6 p4 pname simple_l3 \
>      prog type tc obj $PARSER.o ... \
>      action bpf obj $PROGNAME.o section prog/tc-ingress
> 
> For XDP, to illustrate an example:
> 
> tc filter add dev $P0 ingress protocol all prio 1 p4 pname simple_l3 \
>      prog type xdp obj $PARSER.o section parser/xdp \
>      pinned_link /sys/fs/bpf/mylink \
>      action bpf obj $PROGNAME.o section prog/tc-ingress
> 
> In this case, the parser will be executed in the XDP layer and the rest of
> P4 control block as a tc action.
> 
> For illustration sake, the hw one looks as follows (please note there's
> still a lot of discussions going on in the meetings - the example is here
> merely to illustrate the tc filter functionality):
> 
> tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
>     prog type hw filename "mypnameprog.o" ... \
>     prog type xdp obj $PARSER.o section parser/xdp pinned_link /sys/fs/bpf/mylink \
>     action bpf obj $PROGNAME.o section prog/tc-ingress
> 
> The theory of operations is as follows:
> 
> ================================1. PARSING================================
> 
> The packet first encounters the parser.
> The parser is implemented in ebpf residing either at the TC or XDP
> level. The parsed header values are stored in a shared eBPF map.
> When the parser runs at XDP level, we load it into XDP using tc filter
> command and pin it to a file.
> 
> =============================2. ACTIONS=============================
> 
> In the above example, the P4 program (minus the parser) is encoded in an
> action($PROGNAME.o). It should be noted that classical tc actions
> continue to work:
> IOW, someone could decide to add a mirred action to mirror all packets
> after or before the ebpf action.
> 
> tc filter add dev $P0 parent ffff: protocol all prio 6 p4 pname simple_l3 \
>      prog type tc obj $PARSER.o section parser/tc-ingress \
>      action bpf obj $PROGNAME.o section prog/tc-ingress \
>      action mirred egress mirror index 1 dev $P1 \
>      action bpf obj $ANOTHERPROG.o section mysect/section-1
> 
> It should also be noted that it is feasible to split some of the ingress
> datapath into XDP first and more into TC later (as was shown above for
> example where the parser runs at XDP level). YMMV.
> Regardless of choice of which scheme to use, none of these will affect
> UAPI. It will all depend on whether you generate code to load on XDP vs
> tc, etc.
> 
> Co-developed-by: Victor Nogueira <victor@...atatu.com>
> Signed-off-by: Victor Nogueira <victor@...atatu.com>
> Co-developed-by: Pedro Tammela <pctammela@...atatu.com>
> Signed-off-by: Pedro Tammela <pctammela@...atatu.com>
> Signed-off-by: Jamal Hadi Salim <jhs@...atatu.com>

My objections from last iterations still stand, and I also added a nak,
so please do not just drop it with new revisions.. from the v10 as you
wrote you added further code but despite the various community feedback
the design still stands as before, therefore:

Nacked-by: Daniel Borkmann <daniel@...earbox.net>

[...]
> +static int cls_p4_prog_from_efd(struct nlattr **tb,
> +				struct p4tc_bpf_prog *prog, u32 flags,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct bpf_prog *fp;
> +	u32 prog_type;
> +	char *name;
> +	u32 bpf_fd;
> +
> +	bpf_fd = nla_get_u32(tb[TCA_P4_PROG_FD]);
> +	prog_type = nla_get_u32(tb[TCA_P4_PROG_TYPE]);
> +
> +	if (prog_type != BPF_PROG_TYPE_XDP &&
> +	    prog_type != BPF_PROG_TYPE_SCHED_ACT) {

Also as mentioned earlier I don't think tc should hold references on
XDP programs in here. It doesn't make any sense aside from the fact
that the cls_p4 is also not doing anything with it. This is something
that a user space control plane should be doing i.e. managing a XDP
link on the target device.

> +		NL_SET_ERR_MSG(extack,
> +			       "BPF prog type must be BPF_PROG_TYPE_SCHED_ACT or BPF_PROG_TYPE_XDP");
> +		return -EINVAL;
> +	}
> +
> +	fp = bpf_prog_get_type_dev(bpf_fd, prog_type, false);
> +	if (IS_ERR(fp))
> +		return PTR_ERR(fp);
> +
> +	name = nla_memdup(tb[TCA_P4_PROG_NAME], GFP_KERNEL);
> +	if (!name) {
> +		bpf_prog_put(fp);
> +		return -ENOMEM;
> +	}
> +
> +	prog->p4_prog_name = name;
> +	prog->p4_prog = fp;
> +
> +	return 0;
> +}
> +