netdev - Re: [PATCH bpf-next v2 3/7] net: sched: add bpf

Message-ID: <CAEf4Bza2X7+begzQVkKoURSx7v+RHTxrAFaoNUSRc-Kyr5DWfQ@mail.gmail.com>
Date:   Mon, 7 Jun 2021 16:23:37 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Kumar Kartikeya Dwivedi <memxor@...il.com>
Cc:     bpf <bpf@...r.kernel.org>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Vlad Buslov <vladbu@...dia.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier

On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
<memxor@...il.com> wrote:
>
> This commit introduces a bpf_link based kernel API for creating tc
> filters and using the cls_bpf classifier. Only a subset of what netlink
> API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and
> embedded actions are unsupported.
>
> The kernel API and the libbpf wrapper added in a subsequent patch are
> more opinionated and mirror the semantics of the low-level netlink-based
> TC-BPF API, i.e. always setting direct action mode, always setting
> protocol to ETH_P_ALL, and only exposing handle and priority as the
> variables the user can control. We add an additional gen_flags parameter
> though to allow for offloading use cases. It would be trivial to extend
> the current API to support specifying other attributes in the future,
> but for now I'm sticking to how we want to push usage.
>
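On the netlink path that this API mirrors, a filter's priority and protocol share the 32-bit tcm_info field of struct tcmsg: priority in the upper 16 bits, protocol (in network byte order) in the lower 16 bits. A minimal sketch of that packing follows. The constants are copied from <linux/pkt_sched.h> and <linux/if_ether.h> so it builds without kernel headers, and tc_pack_info()/tc_unpack_info() are hypothetical helper names, not kernel or libbpf functions.

```c
#include <stdint.h>
#include <arpa/inet.h> /* htons(), ntohs() */

/* Values copied from <linux/pkt_sched.h> and <linux/if_ether.h>. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))
#define ETH_P_ALL 0x0003

/* Hypothetical helper: pack priority (upper 16 bits) and protocol
 * (network byte order, lower 16 bits) into a tcm_info value. */
static uint32_t tc_pack_info(uint16_t priority, uint16_t protocol)
{
    /* Widen before shifting so priority 0xFFFF cannot overflow an int. */
    return TC_H_MAKE((uint32_t)priority << 16, htons(protocol));
}

/* Hypothetical inverse: recover priority and host-order protocol. */
static void tc_unpack_info(uint32_t info, uint16_t *priority,
                           uint16_t *protocol)
{
    *priority = (uint16_t)(TC_H_MAJ(info) >> 16);
    *protocol = ntohs((uint16_t)TC_H_MIN(info));
}
```

With the protocol pinned to ETH_P_ALL, the priority is the only part of tcm_info left for the caller to pick, and it can never exceed 16 bits.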
> The semantics around bpf_link support are as follows:
>
> A user can create a classifier attached to a filter using the bpf_link
> API, after which changing it and deleting it only happens through the
> bpf_link API. It is not possible to bind the bpf_link to an existing
> filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
> returned in two cases: when an existing bpf_link-owned filter exists, or
> when an existing netlink-owned filter exists.
>
> Removing a bpf_link-owned filter via netlink returns EPERM, denoting that
> netlink is locked out from filter manipulation when bpf_link is
> involved.
>
> Whenever a filter is detached due to chain removal, or qdisc tear down,
> or net_device shutdown, the bpf_link becomes automatically detached.
>
> In this way, the netlink API and bpf_link creation path are exclusive
> and don't stomp over one another. Filters created using the bpf_link API
> cannot be replaced by the netlink API, and filters created by the netlink
> API are never replaced by bpf_link. Netlink also cannot detach bpf_link
> filters.
>
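The ownership rules above can be modelled as a small state machine. This is a user-space model, not kernel code: filter_slot, tc_link and the model_* functions are hypothetical names, and the -ENOENT returned when deleting a missing filter is an assumption the description does not cover.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Who currently owns a tc filter slot. */
enum filter_owner { OWNER_NONE, OWNER_NETLINK, OWNER_BPF_LINK };

struct tc_link;

struct filter_slot {
    enum filter_owner owner;
    struct tc_link *link;     /* non-NULL only when owner == OWNER_BPF_LINK */
};

struct tc_link {
    struct filter_slot *slot; /* NULL once the link is detached */
    bool detached;
};

/* bpf_link creation never binds to an existing filter, whoever owns it. */
static int model_link_create(struct filter_slot *s, struct tc_link *l)
{
    if (s->owner != OWNER_NONE)
        return -EEXIST;
    s->owner = OWNER_BPF_LINK;
    s->link = l;
    l->slot = s;
    l->detached = false;
    return 0;
}

/* Netlink is locked out of bpf_link-owned filters. */
static int model_netlink_delete(struct filter_slot *s)
{
    if (s->owner == OWNER_BPF_LINK)
        return -EPERM;
    if (s->owner == OWNER_NONE)
        return -ENOENT;       /* assumption: error for a missing filter */
    s->owner = OWNER_NONE;
    return 0;
}

/* Chain removal, qdisc teardown or device shutdown: the filter goes away
 * and any bpf_link attached to it is detached automatically. */
static void model_filter_teardown(struct filter_slot *s)
{
    if (s->link) {
        s->link->slot = NULL;
        s->link->detached = true;
        s->link = NULL;
    }
    s->owner = OWNER_NONE;
}
```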
> We serialize all changes over rtnl_lock as cls_bpf doesn't support the
> unlocked classifier API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@...il.com>
> ---
>  include/linux/bpf_types.h |   3 +
>  include/net/pkt_cls.h     |  13 ++
>  include/net/sch_generic.h |   6 +-
>  include/uapi/linux/bpf.h  |  15 +++
>  kernel/bpf/syscall.c      |  10 +-
>  net/sched/cls_api.c       | 139 ++++++++++++++++++++-
>  net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
>  7 files changed, 430 insertions(+), 6 deletions(-)
>

[...]

> @@ -1447,6 +1449,12 @@ union bpf_attr {
>                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>                                 __u32           iter_info_len;  /* iter_info length */
>                         };
> +                       struct { /* used by BPF_TC */
> +                               __u32 parent;
> +                               __u32 handle;
> +                               __u32 gen_flags;

There is already link_create.flags that's totally up to a specific
type of bpf_link. E.g., cgroup bpf_link doesn't accept any flags,
while xdp bpf_link uses it for passing XDP-specific flags. Is there a
need to have both gen_flags and flags for TC link?
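One way to fold gen_flags into the existing link_create.flags, as suggested here, is for the TC link to validate that field against the classifier flag bits. A sketch modelled on the check cls_bpf applies to its gen_flags (only TCA_CLS_FLAGS_SKIP_HW and TCA_CLS_FLAGS_SKIP_SW, and never both at once); tc_link_flags_valid() is a hypothetical name, and the bit values are copied from <linux/pkt_cls.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in <linux/pkt_cls.h>. */
#define TCA_CLS_FLAGS_SKIP_HW (1U << 0) /* don't offload filter to HW */
#define TCA_CLS_FLAGS_SKIP_SW (1U << 1) /* don't use filter in SW */

/* Hypothetical check for a TC link carrying the classifier's gen_flags
 * in the generic link_create.flags field. */
static bool tc_link_flags_valid(uint32_t flags)
{
    const uint32_t both = TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW;

    if (flags & ~both)          /* reject unknown bits */
        return false;
    if ((flags & both) == both) /* skipping both HW and SW is invalid */
        return false;
    return true;
}
```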

> +                               __u16 priority;

No strong preference, but we typically try to not have unnecessary
padding in UAPI bpf_attr, so I wonder if using __u32 for this would
make sense?
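The padding in question shows up with a mirror of the proposed structs. With __u16 priority, the compiler rounds the struct up to a multiple of 4 bytes and leaves two unnamed bytes at the end. With __u32 priority, the struct is the same size and every byte is named. A sketch using <stdint.h> types in place of __u32/__u16. The struct names are hypothetical, and the sizes assume the common ABIs where uint32_t is 4-byte aligned. The bpf_link_info.tc struct in this patch also ends in __u16 priority and has the same two-byte hole.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors of the proposed (non-upstream) UAPI structs from this patch. */
struct tc_attr_u16 {            /* link_create.tc as posted */
    uint32_t parent;
    uint32_t handle;
    uint32_t gen_flags;
    uint16_t priority;
};

struct tc_attr_u32 {            /* same fields, priority widened */
    uint32_t parent;
    uint32_t handle;
    uint32_t gen_flags;
    uint32_t priority;
};

struct tc_info_u16 {            /* bpf_link_info.tc as posted */
    uint32_t ifindex;
    uint32_t parent;
    uint32_t handle;
    uint32_t gen_flags;
    uint16_t priority;
};

/* Unnamed bytes the compiler appends after the last member, priority. */
#define TAIL_PAD(type) \
    (sizeof(type) - (offsetof(type, priority) + sizeof(((type *)0)->priority)))
```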

> +                       } tc;
>                 };
>         } link_create;
>
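For the parent field, a caller attaching to a clsact qdisc would build the value with TC_H_MAKE() from <linux/pkt_sched.h>. That gives 0xFFFFFFF2 for the ingress hook and 0xFFFFFFF3 for egress. A sketch under that clsact assumption: struct tc_link_opts and tc_link_opts_clsact() are hypothetical and only mirror the proposed link_create.tc layout. The constants are copied from the UAPI header.

```c
#include <stdint.h>

/* Constants copied from <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK     0xFFFF0000U
#define TC_H_MIN_MASK     0x0000FFFFU
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))
#define TC_H_CLSACT       0xFFFFFFF1U  /* same value as TC_H_INGRESS */
#define TC_H_MIN_INGRESS  0xFFF2U
#define TC_H_MIN_EGRESS   0xFFF3U

/* Hypothetical user-space mirror of the proposed link_create.tc fields. */
struct tc_link_opts {
    uint32_t parent;
    uint32_t handle;
    uint32_t gen_flags;
    uint16_t priority;
};

/* Hypothetical helper: options for a filter on the clsact ingress or
 * egress hook, with the user-chosen handle and priority and no flags. */
static struct tc_link_opts tc_link_opts_clsact(int egress, uint32_t handle,
                                               uint16_t priority)
{
    struct tc_link_opts o = {
        .parent = TC_H_MAKE(TC_H_CLSACT,
                            egress ? TC_H_MIN_EGRESS : TC_H_MIN_INGRESS),
        .handle = handle,
        .gen_flags = 0,
        .priority = priority,
    };
    return o;
}
```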
> @@ -5519,6 +5527,13 @@ struct bpf_link_info {
>                 struct {
>                         __u32 ifindex;
>                 } xdp;
> +               struct {
> +                       __u32 ifindex;
> +                       __u32 parent;
> +                       __u32 handle;
> +                       __u32 gen_flags;
> +                       __u16 priority;
> +               } tc;
>         };
>  } __attribute__((aligned(8)));
>

[...]