netdev - Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bzai3maV8E9eWi1fc8fgaeC7qFg7_-N_WdLH4ukv302bhg@mail.gmail.com>
Date:   Wed, 21 Apr 2021 20:43:31 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Kumar Kartikeya Dwivedi <memxor@...il.com>,
        bpf <bpf@...r.kernel.org>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API

On Wed, Apr 21, 2021 at 3:59 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..b4ed6a41ea70 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -16,6 +16,8 @@
> >   #include <stdbool.h>
> >   #include <sys/types.h>  // for size_t
> >   #include <linux/bpf.h>
> > +#include <linux/pkt_sched.h>
> > +#include <linux/tc_act/tc_bpf.h>
> >
> >   #include "libbpf_common.h"
> >
> > @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >
> > +/* Convenience macros for the clsact attach hooks */
> > +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> > +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>
> I would abstract those away into an enum, plus avoid having to pull in
> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
>
> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
> concrete tc bits (TC_H_MAKE()) can be translated internally.
>
> > +struct bpf_tc_opts {
> > +     size_t sz;
>
> Is this set anywhere?
>
> > +     __u32 handle;
> > +     __u32 class_id;
>
> I'd remove class_id from here as well given in direct-action a BPF prog can
> set it if needed.
>
> > +     __u16 priority;
> > +     bool replace;
> > +     size_t :0;
>
> What's the rationale for this padding?
>
> > +};
> > +
> > +#define bpf_tc_opts__last_field replace
> > +
> > +/* Acts as a handle for an attached filter */
> > +struct bpf_tc_attach_id {
>
> nit: maybe bpf_tc_ctx

ok, so wait. It seems like apart from INGRESS|EGRESS enum and ifindex,
everything else is optional and/or has some sane defaults, right? So
this bpf_tc_attach_id or bpf_tc_ctx seems a bit artificial construct
and it will cause problems for extending this.

So if my understanding is correct, I'd get rid of it completely. As I
said previously, opts allow returning parameters back, so if user
didn't specify handle and priority and kernel picks values on user's
behalf, we can return them in the same opts fields.

For detach, again, ifindex and INGRESS|EGRESS is sufficient, but if
user want to provide more detailed parameters, we should do that
through extensible opts. That way we can keep growing this easily,
plus simple cases will remain simple.

Similarly bpf_tc_info below, there is no need to have struct
bpf_tc_attach_id id; field, just have handle and priority right there.
And bpf_tc_info should use OPTS framework for extensibility (though
opts name doesn't fit it very well, but it is still nice for
extensibility and for doing optional input/output params).

Does this make sense? Am I missing something crucial here?

The general rule with any new structs added to libbpf APIs is to
either be 100% (ok, 99.99%) sure that they will never be changed, or
do forward/backward compatible OPTS. Any other thing is pain and calls
for symbol versioning, which we are trying really hard to avoid.

>
> > +     __u32 handle;
> > +     __u16 priority;
> > +};
> > +
> > +struct bpf_tc_info {
> > +     struct bpf_tc_attach_id id;
> > +     __u16 protocol;
> > +     __u32 chain_index;
> > +     __u32 prog_id;
> > +     __u8 tag[BPF_TAG_SIZE];
> > +     __u32 class_id;
> > +     __u32 bpf_flags;
> > +     __u32 bpf_flags_gen;
>
> Given we do not yet have any setters e.g. for offload, etc, the one thing
> I'd see useful and crucial initially is prog_id.
>
> The protocol, chain_index, and I would also include tag should be dropped.
> Similarly class_id given my earlier statement, and flags I would extend once
> this lib API would support offloading progs.
>
> > +};
> > +
> > +/* id is out parameter that will be written to, it must not be NULL */
> > +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
> > +                          const struct bpf_tc_opts *opts,
> > +                          struct bpf_tc_attach_id *id);
> > +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> > +                          const struct bpf_tc_attach_id *id);
> > +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
> > +                            const struct bpf_tc_attach_id *id,
> > +                            struct bpf_tc_info *info);
>
> As per above, for parent_id I'd replace with dir enum.
>
> > +
> >   #ifdef __cplusplus
> >   } /* extern "C" */
> >   #endif