[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <70213fce-858e-5384-1614-919c4eced8ba@iogearbox.net>
Date: Thu, 6 May 2021 23:57:10 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Kumar Kartikeya Dwivedi <memxor@...il.com>
Cc: bpf@...r.kernel.org,
Toke Høiland-Jørgensen <toke@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Shaun Crampton <shaun@...era.io>, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v6 2/3] libbpf: add low level TC-BPF API
On 5/6/21 4:37 AM, Kumar Kartikeya Dwivedi wrote:
> On Thu, May 06, 2021 at 03:12:01AM IST, Daniel Borkmann wrote:
>> On 5/4/21 2:50 AM, Kumar Kartikeya Dwivedi wrote:
>>> This adds functions that wrap the netlink API used for adding,
>>> manipulating, and removing traffic control filters.
>>>
>>> An API summary:
>>
>> Looks better, few minor comments below:
>>
>>> A bpf_tc_hook represents a location where a TC-BPF filter can be
>>> attached. This means that creating a hook leads to creation of the
>>> backing qdisc, while destruction either removes all filters attached to
>>> a hook, or destroys qdisc if requested explicitly (as discussed below).
>>>
>>> The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
>>> query, and detach tc filters.
>>>
>>> All functions return 0 on success, and a negative error code on failure.
>>>
>>> bpf_tc_hook_create - Create a hook
>>> Parameters:
>>> @hook - Cannot be NULL, ifindex > 0, attach_point must be set to
>>> proper enum constant. Note that parent must be unset when
>>> attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
>>> that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
>>> valid value for attach_point.
>>>
>>> Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
>>>
>>> hook's flags member can be BPF_TC_F_REPLACE, which
>>> creates qdisc in non-exclusive mode (i.e. an existing
>>> qdisc will be replaced instead of this function failing
>>> with -EEXIST).
>>
>> Why supporting BPF_TC_F_REPLACE here? It's not changing any qdisc parameters
>> given clsact doesn't have any, no? Iow, what effect are you expecting on this
>> with BPF_TC_F_REPLACE & why supporting it? I'd probably just require flags to
>> be 0 here, and if hook exists return sth like -EEXIST.
>
> Ok, will change.
>
>>> bpf_tc_hook_destroy - Destroy the hook
>>> Parameters:
>>> @hook - Cannot be NULL. The behaviour depends on value of
>>> attach_point.
>>>
>>> If BPF_TC_INGRESS, all filters attached to the ingress
>>> hook will be detached.
>>> If BPF_TC_EGRESS, all filters attached to the egress hook
>>> will be detached.
>>> If BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
>>> deleted, also detaching all filters.
>>>
>>> As before, parent must be unset for these attach_points,
>>> and set for BPF_TC_CUSTOM. flags must also be unset.
>>>
>>> It is advised that if the qdisc is operated on by many programs,
>>> then the program at least check that there are no other existing
>>> filters before deleting the clsact qdisc. An example is shown
>>> below:
>>>
>>> DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
>>> .attach_point = BPF_TC_INGRESS);
>>> /* set opts as NULL, as we're not really interested in
>>> * getting any info for a particular filter, but just
>>> * detecting its presence.
>>> */
>>> r = bpf_tc_query(&hook, NULL);
>>> if (r == -ENOENT) {
>>> /* no filters */
>>> hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
>>> return bpf_tc_hook_destroy(&hook);
>>> } else {
>>> /* failed or r == 0, the latter means filters do exist */
>>> return r;
>>> }
>>>
>>> Note that there is a small race between checking for no
>>> filters and deleting the qdisc. This is currently unavoidable.
>>>
>>> Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
>>>
>>> bpf_tc_attach - Attach a filter to a hook
>>> Parameters:
>>> @hook - Cannot be NULL. Represents the hook the filter will be
>>> attached to. Requirements for ifindex and attach_point are
>>> same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
>>> is also supported. In that case, parent must be set to the
>>> handle where the filter will be attached (using TC_H_MAKE).
>>> flags member must be unset.
>>>
>>> E.g. To set parent to 1:16 like in tc command line,
>>> the equivalent would be TC_H_MAKE(1 << 16, 16)
>>
>> Small nit: I wonder whether from libbpf side we should just support a more
>> user friendly TC_H_MAKE, so you'd have: BPF_TC_CUSTOM + BPF_TC_PARENT(1, 16).
>
> Something like this was there in v1. I'll add this macro again (I guess the most surprising part of
> TC_H_MAKE is that it won't shift the major number).
Agree, weird one. :)
[...]
>>> bpf_tc_detach
>>> Parameters:
>>> @hook: Cannot be NULL. Represents the hook the filter will be
>>> detached from. Requirements are same as described above
>>> in bpf_tc_attach.
>>>
>>> @opts: Cannot be NULL.
>>>
>>> The following opts must be set:
>>> handle
>>> priority
>>> The following opts must be unset:
>>> prog_fd
>>> prog_id
>>> flags
>>>
>>> bpf_tc_query
>>> Parameters:
>>> @hook: Cannot be NULL. Represents the hook where the filter
>>> lookup will be performed. Requires are same as described
>>> above in bpf_tc_attach.
>>>
>>> @opts: Can be NULL.
>>
>> Shouldn't it be: Cannot be NULL?
>
> This allows you to check the existence of a filter. If set to NULL we skip writing anything to opts,
You mean in this case s/filter/hook/, right?
> but we still return -ENOENT or 0 depending on whether atleast one filter exists (based on the
> default attributes that we choose). This is used in multiple places in the test, to determine
> whether no filters exists.
In other words, it's same as bpf_tc_hook_create() which would return -EEXIST just that
we do /not/ create the hook if it does not exist, right?
>>> The following opts are optional:
>>> handle
>>> priority
>>> prog_fd
>>> prog_id
>>
>> What is the use case to set prog_fd here?
>
> It allows you to search with the prog_id of the program represented by fd. It's just a convenience
> thing, we end up doing a call to get the prog_id for you, and since the parameter is already there,
> it seemed ok to support this.
I would drop that part and have prog_fd forced to 0, given libbpf already has other means to
retrieve it from fd, and if non-convenient, then lets add a simple/generic libbpf API.
>>> The following opts must be unset:
>>> flags
>>>
>>> However, only one of prog_fd and prog_id must be
>>> set. Setting both leads to an error. Setting none is
>>> allowed.
>>>
>>> The following fields will be filled by bpf_tc_query on a
>>> successful lookup if they are unset:
>>> handle
>>> priority
>>> prog_id
>>>
>>> Based on the specified optional parameters, the matching
>>> data for the first matching filter is filled in and 0 is
>>> returned. When setting prog_fd, the prog_id will be
>>> matched against prog_id of the loaded SCHED_CLS prog
>>> represented by prog_fd.
>>>
>>> To uniquely identify a filter, e.g. to detect its presence,
>>> it is recommended to set both handle and priority fields.
>>
>> What if prog_id is not unique, but part of multiple instances? Do we need
>> to support this case?
>
> We return the first filter that matches on the prog_id. I think it is worthwhile to support this, as
> long as the kernel's sequence of returning filters is stable (which it is), we keep returning the
> same filter's handle/priority, so you can essentially pop filters attached to a hook one by one by
> passing in unset opts and getting its details (or setting one of the parameters and making the
> lookup domain smaller).
>
> In simple words, setting one of the parameters that will be filled leads to only returning an entry
> that matches them. This is similar to what tc filter show's dump allows you to do.
I think this is rather a bit weird/hacky/unintuitive. If we need such API, then lets add a
proper one which returns all handle/priority combinations that match for a given prog_id
for the provided hook, but I don't think this needs to be in the initial set; could be done
as follow-up. (*)
>> Why not just bpf_tc_query() with non-NULL hook and non-NULL opts where
>> handle and priority is required to be set, and rest must be 0?
>
> There is also a usecase for us where we need to query the existing filter on a hook without knowing
> its handle/priority. Shaun also mentioned something similar, where they then go on to check the tag
> they get from the returned prog_id to determine what to do next.
See (*).
>>> Some usage examples (using bpf skeleton infrastructure):
>>>
>>> BPF program (test_tc_bpf.c):
>>>
>>> #include <linux/bpf.h>
>>> #include <bpf/bpf_helpers.h>
>>>
>>> SEC("classifier")
>>> int cls(struct __sk_buff *skb)
>>> {
>>> return 0;
>>> }
>>>
>>> Userspace loader:
>>>
>>> DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, 0);
>>> struct test_tc_bpf *skel = NULL;
>>> int fd, r;
>>>
>>> skel = test_tc_bpf__open_and_load();
>>> if (!skel)
>>> return -ENOMEM;
>>>
>>> fd = bpf_program__fd(skel->progs.cls);
>>>
>>> DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
>>> if_nametoindex("lo"), .attach_point =
>>> BPF_TC_INGRESS);
>>> /* Create clsact qdisc */
>>> r = bpf_tc_hook_create(&hook);
>>> if (r < 0)
>>> goto end;
>>>
>>> DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
>>
>> Given we had DECLARE_LIBBPF_OPTS earlier, can't we just set:
>> opts.prog_fd = fd here?
>
> Right, will fix.
>
>>
>>> r = bpf_tc_attach(&hook, &opts);
>>> if (r < 0)
>>> goto end;
>>> /* Print the auto allocated handle and priority */
>>> printf("Handle=%u", opts.handle);
>>> printf("Priority=%u", opts.priority);
>>>
>>> opts.prog_fd = opts.prog_id = 0;
>>> bpf_tc_detach(&hook, &opts);
>>
Thanks,
Daniel
Powered by blists - more mailing lists