netdev - Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API for multi-progs

Message-ID: <CAKH8qBu0u_HuuGCW=vjQp4nsMB4QFtgza7A9VAdbPFzAvAyorg@mail.gmail.com>
Date: Mon, 10 Jul 2023 14:13:56 -0700
From: Stanislav Fomichev <sdf@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>, Alexei Starovoitov <ast@...nel.org>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Nikolay Aleksandrov <razor@...ckwall.org>, John Fastabend <john.fastabend@...il.com>, 
	Jakub Kicinski <kuba@...nel.org>, Daniel Xu <dxu@...uu.xyz>, Joe Stringer <joe@...ium.io>, 
	Toke Høiland-Jørgensen <toke@...nel.org>, 
	"David S. Miller" <davem@...emloft.net>, bpf <bpf@...r.kernel.org>, 
	Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API
 for multi-progs

On Mon, Jul 10, 2023 at 1:16 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Mon, Jul 10, 2023 at 12:00 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> >
> > On Mon, Jul 10, 2023 at 11:27 AM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Mon, Jul 10, 2023 at 11:18 AM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > >
> > > > On 07/10, Daniel Borkmann wrote:
> > > > > On 7/7/23 11:27 PM, Stanislav Fomichev wrote:
> > > > > > On 07/07, Daniel Borkmann wrote:
> > > > > [...]
> > > > > > > +static inline struct bpf_mprog_entry *
> > > > > > > +bpf_mprog_create(const size_t size, const off_t off)
> > > > > > > +{
> > > > > > > + struct bpf_mprog_bundle *bundle;
> > > > > > > + void *ptr;
> > > > > > > +
> > > > > > > + BUILD_BUG_ON(size < sizeof(*bundle) + off);
> > > > > > > + BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
> > > > > > > + BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
> > > > > > > +              ARRAY_SIZE(bundle->cp_items));
> > > > > > > +
> > > > > > > + ptr = kzalloc(size, GFP_KERNEL);
> > > > > > > + if (ptr) {
> > > > > > > +         bundle = ptr + off;
> > > > > > > +         atomic64_set(&bundle->revision, 1);
> > > > > > > +         bundle->off = off;
> > > > > > > +         bundle->a.parent = bundle;
> > > > > > > +         bundle->b.parent = bundle;
> > > > > > > +         return &bundle->a;
> > > > > > > + }
> > > > > > > + return NULL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +void bpf_mprog_free_rcu(struct rcu_head *rcu);
> > > > > > > +
> > > > > > > +static inline void bpf_mprog_free(struct bpf_mprog_entry *entry)
> > > > > > > +{
> > > > > > > + struct bpf_mprog_bundle *bundle = entry->parent;
> > > > > > > +
> > > > > > > + call_rcu(&bundle->rcu, bpf_mprog_free_rcu);
> > > > > > > +}
> > > > > >
> > > > > > Any reason we're doing allocation here? Why not do
> > > > > > bpf_mprog_init(struct bpf_mprog_bundle *) instead that simply initializes
> > > > > > the fields? Then we can move allocation/free part to the caller (tcx) along
> > > > > > with rcu_head.
> > > > > > Feels like it would be a bit more conventional/readable? bpf_mprog_free{,_rcu}
> > > > > > will also become tcx_free{,_rcu}..
> > > > > >
> > > > > > I guess current approach works, but it took me awhile to figure it out..
> > > > > > (maybe it's just me)
> > > > >
> > > > > I found this approach quite useful for tcx case since we only fetch the
> > > > > bpf_mprog_entry for tcx_link_prog_attach et al, but I can take a look to
> > > > > see if this looks better and if it does I'll include it.
> > > > >
> > > > > > > +static inline void bpf_mprog_mark_ref(struct bpf_mprog_entry *entry,
> > > > > > > +                               struct bpf_tuple *tuple)
> > > > > > > +{
> > > > > > > + WARN_ON_ONCE(entry->parent->ref);
> > > > > > > + if (!tuple->link)
> > > > > > > +         entry->parent->ref = tuple->prog;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
> > > > > > > +{
> > > > > > > + entry->parent->count++;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
> > > > > > > +{
> > > > > > > + entry->parent->count--;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline int bpf_mprog_max(void)
> > > > > > > +{
> > > > > > > + return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
> > > > > > > +{
> > > > > > > + int total = entry->parent->count;
> > > > > > > +
> > > > > > > + WARN_ON_ONCE(total > bpf_mprog_max());
> > > > > > > + return total;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
> > > > > > > +                             struct bpf_prog *prog)
> > > > > > > +{
> > > > > > > + const struct bpf_mprog_fp *fp;
> > > > > > > + const struct bpf_prog *tmp;
> > > > > > > +
> > > > > > > + bpf_mprog_foreach_prog(entry, fp, tmp) {
> > > > > > > +         if (tmp == prog)
> > > > > > > +                 return true;
> > > > > > > + }
> > > > > > > + return false;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline bool bpf_mprog_swap_entries(const int code)
> > > > > > > +{
> > > > > > > + return code == BPF_MPROG_SWAP ||
> > > > > > > +        code == BPF_MPROG_FREE;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
> > > > > > > +{
> > > > > > > + atomic64_inc(&entry->parent->revision);
> > > > > > > + synchronize_rcu();
> > > > > >
> > > > > > Maybe add a comment on why we need to synchronize_rcu here? In general,
> > > > > > I don't think I have a good grasp of that ->ref member.
> > > > >
> > > > > Yeap, will add a comment. For the case where we delete the prog, we mark
> > > > > it in bpf_mprog_detach, but we can only drop the reference once the user
> > > > > swapped the bpf_mprog_entry and ensured that there are no in-flight users
> > > > > hence both in bpf_mprog_commit.
> > > > >
> > > > > [...]
> > > > > > > +static int bpf_mprog_prog(struct bpf_tuple *tuple,
> > > > > > > +                   u32 object, u32 flags,
> > > > > > > +                   enum bpf_prog_type type)
> > > > > > > +{
> > > > > > > + bool id = flags & BPF_F_ID;
> > > > > > > + struct bpf_prog *prog;
> > > > > > > +
> > > > > > > + if (id)
> > > > > > > +         prog = bpf_prog_by_id(object);
> > > > > > > + else
> > > > > > > +         prog = bpf_prog_get(object);
> > > > > > > + if (IS_ERR(prog)) {
> > > > > >
> > > > > > [..]
> > > > > >
> > > > > > > +         if (!object && !id)
> > > > > > > +                 return 0;
> > > > > >
> > > > > > What's the reason behind this?
> > > > >
> > > > > If an fd was passed which is 0 and this was not a program fd, then we don't error
> > > > > out and treat it as if no fd was passed.
> > > >
> > > > Is this new api an opportunity to fix that fd==0? And always treat it as
> > > > valid. Or we have some other constraints elsewhere?
> > >
> > > No. There is nothing to fix.
> >
> > Care to elaborate? Do we want to preserve it for consistency? Or is
> > there some concern with asking people to put relative_fd=-1 when doing
> > the call?
> > I'm fine either way; trying to understand where it's coming from. I
> > remember it was discussed briefly at lsfmmbpf, but don't remember the
> > details..
>
> 0 is invalid bpf object (prog, map, link). There is nothing to "fix".

It's more like it's a conditionally invalid bpf object (fd in this case) :-)

bpf_program__attach_tcx(..., { ..., relative_fd = 0, ... }); // returns ok and doesn't use relative_fd
dup2(prog_fd, 0);
bpf_program__attach_tcx(..., { ..., relative_fd = 0, ... }); // this will use prog_fd duped at 0

It seems like it might be a bit cleaner to explicitly ask for -1:
bpf_program__attach_tcx(..., { ..., relative_fd = -1, ... });

But whatever, it works anyway, and that's how it's been done elsewhere
it seems, so I'm not gonna waste our time on it.
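
To make the ambiguity above concrete, here is a minimal userspace sketch in
plain C. It is not the kernel or libbpf code: fd_is_prog(), both resolve_*()
helpers and the rel_result enum are hypothetical names, and a FIFO merely
stands in for "an fd that refers to a BPF program" so the example runs
without BPF. It compares the current "fd 0 that is not a program means no fd"
rule with an explicit -1 sentinel.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical outcome of resolving relative_fd. */
enum rel_result { REL_NONE, REL_USED, REL_ERROR };

/*
 * Hypothetical stand-in for the bpf_prog_get() lookup: in this sketch a
 * FIFO plays the role of a program fd, anything else fails the lookup.
 */
static bool fd_is_prog(int fd)
{
	struct stat st;

	return fstat(fd, &st) == 0 && S_ISFIFO(st.st_mode);
}

/*
 * Mirrors the quoted "if (!object && !id) return 0;" check: a failed
 * lookup on fd 0 is silently treated as "no fd was passed".
 */
static enum rel_result resolve_zero_is_none(int fd)
{
	if (fd_is_prog(fd))
		return REL_USED;
	return fd == 0 ? REL_NONE : REL_ERROR;
}

/* Alternative from the mail: only -1 means "no fd was passed". */
static enum rel_result resolve_minus_one_is_none(int fd)
{
	if (fd == -1)
		return REL_NONE;
	return fd_is_prog(fd) ? REL_USED : REL_ERROR;
}
```

With the first helper, the result for fd 0 depends on what currently sits at
descriptor 0: REL_NONE while it is an ordinary stdin, REL_USED once a
"program" has been dup2()'d onto it. With the second helper, fd 0 is either
used or rejected, and only -1 is ever ignored.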