netdev - Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <CAEf4Bza_X30yLPm0Lhy2c-u1Qw1Ci9AVoy5jo_XXCaT9zz+3jg@mail.gmail.com>
Date: Tue, 11 Jul 2023 11:48:44 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: ast@...nel.org, andrii@...nel.org, martin.lau@...ux.dev, 
	razor@...ckwall.org, sdf@...gle.com, john.fastabend@...il.com, 
	kuba@...nel.org, dxu@...uu.xyz, joe@...ium.io, toke@...nel.org, 
	davem@...emloft.net, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API
 for multi-progs

On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> This adds a generic layer called bpf_mprog which can be reused by different
> attachment layers to enable multi-program attachment and dependency resolution.
> In-kernel users of bpf_mprog don't need to care about the dependency
> resolution internals; they can just consume it with a few API calls.
>
> The initial idea of having a generic API grew out of a discussion [0] on an
> earlier revision of this work, where tc's priority was reused and exposed via
> BPF uapi as a way to coordinate dependencies among tc BPF programs, similar
> to how it is done for classic tc BPF. The feedback was that priority provides
> a bad user experience and is hard to use [1], e.g.:
>
>   I cannot help but feel that priority logic copy-paste from old tc, netfilter
>   and friends is done because "that's how things were done in the past". [...]
>   Priority gets exposed everywhere in uapi all the way to bpftool when it's
>   right there for users to understand. And that's the main problem with it.
>
>   The user don't want to and don't need to be aware of it, but uapi forces them
>   to pick the priority. [...] Your cover letter [0] example proves that in
>   real life different service pick the same priority. They simply don't know
>   any better. Priority is an unnecessary magic that apps _have_ to pick, so
>   they just copy-paste and everyone ends up using the same.
>
> Over the course of the discussion, the need became increasingly clear for a
> generic, reusable API where the "same look and feel" can be applied to program
> types beyond just tc BPF. For example, XDP today does not have multi-program
> support in the kernel, and there was also interest in this API for improving
> the management of cgroup program types. Such a common multi-program management
> concept is useful for BPF management daemons or user space BPF applications
> coordinating internally about their attachments.
>
> From both the Cilium and Meta side [2], we've collected the following
> requirements for a generic attach/detach/query API for multi-progs, which have
> been implemented as part of this work:
>
>   - Support prog-based attach/detach and link API
>   - Dependency directives (can also be combined):
>     - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
>       - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that a user
>         space application does not need CAP_SYS_ADMIN to retrieve foreign fds
>         via bpf_*_get_fd_by_id()
>       - BPF_F_LINK flag as {prog,link} toggle
>       - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
>         BPF_F_AFTER will just append for attaching
>       - Enforced only at attach time
>     - BPF_F_REPLACE with replace_bpf_fd, which can only be a prog; links have
>       their own infrastructure for replacing their internal prog
>     - If no flags are set, the default behavior is to append when attaching
>   - An internal revision counter, with the option to pass expected_revision
>   - A user space application can query the current state with its revision and
>     pass it along at attach time to assert the current state before updating
>   - Query also gets extension for link_ids array and link_attach_flags:
>     - prog_ids are always filled with program IDs
>     - link_ids are filled with link IDs when link was used, otherwise 0
>     - {prog,link}_attach_flags for holding {prog,link}-specific flags
>   - Must be easy to integrate/reuse for in-kernel users
>
> The uapi-side changes needed for supporting bpf_mprog are rather minimal,
> consisting of the addition of the attachment flags and the revision counter,
> and the expansion of an existing union with a relative_{fd,id} member.
>
> The bpf_mprog framework consists of a bpf_mprog_entry object which holds
> an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path
> structure) is part of bpf_mprog_bundle. The two have been separated so that
> the fast path gets efficient packing of bpf_prog pointers for maximum cache
> efficiency. Also, an array has been chosen instead of a linked list or other
> structures to remove unnecessary indirections for a fast point of entry into
> BPF from tc.
>
> The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that, on
> updates, the peer bpf_mprog_entry is populated and then just swapped, which
> avoids additional allocations that could otherwise fail, for example, in the
> detach case. The bpf_mprog_{fp,cp} arrays are currently static, but they could
> be converted to dynamic allocation at some point in the future if necessary.
> Locking is deferred to the in-kernel user of bpf_mprog; for example, tcx,
> which uses this API in the next patch, piggybacks on rtnl.
>
> An extensive test suite for checking all aspects of this API for prog-based
> attach/detach and link API comes as BPF selftests in this series.
>
> Kudos also to Andrii Nakryiko for API discussions wrt Meta's BPF management.
>
>   [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net
>   [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
>   [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> ---
>  MAINTAINERS                    |   1 +
>  include/linux/bpf_mprog.h      | 343 ++++++++++++++++++++++++++
>  include/uapi/linux/bpf.h       |  36 ++-
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/mprog.c             | 427 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  36 ++-
>  6 files changed, 828 insertions(+), 17 deletions(-)
>  create mode 100644 include/linux/bpf_mprog.h
>  create mode 100644 kernel/bpf/mprog.c
>

From the UAPI perspective, this looks great! A few implementation suggestions
below. I'll also reply separately to Alexei's reply with a discussion of the
higher-level *internal* API.
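The expected_revision part of the UAPI is a form of optimistic concurrency control: user space queries the current revision and passes it back on attach, and the kernel refuses the update with -ESTALE if anything changed in between (the patch's `if (revision && revision != bpf_mprog_revision(entry)) return -ESTALE;`, where 0 means "don't check"). A minimal user-space model of that contract might look like the sketch below; `struct model_entry` and `model_attach()` are hypothetical names, and bumping the revision on every successful update is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a bpf_mprog entry: only the revision matters. */
struct model_entry {
	uint64_t revision;	/* assumed to bump on every successful update */
	int count;
};

/* Mirrors the patch's check: expected_revision == 0 skips the assertion. */
static int model_attach(struct model_entry *e, uint64_t expected_revision)
{
	if (expected_revision && expected_revision != e->revision)
		return -ESTALE;
	e->count++;
	e->revision++;	/* assumption: any change invalidates older snapshots */
	return 0;
}
```

In this model, a daemon that read the revision before a concurrent attach gets -ESTALE on its own attach, re-queries, and retries with the fresh revision.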

[...]

> +
> +#define BPF_MPROG_KEEP 0
> +#define BPF_MPROG_SWAP 1
> +#define BPF_MPROG_FREE 2
> +
> +#define BPF_MPROG_MAX  64
> +
> +#define bpf_mprog_foreach_tuple(entry, fp, cp, t)                      \
> +       for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
> +            ({                                                         \
> +               t.prog = READ_ONCE(fp->prog);                           \
> +               t.link = cp->link;                                      \
> +               t.prog;                                                 \
> +             });                                                       \
> +            fp++, cp++)

I wish we could do something like the below to avoid the need to pass
fp and cp from outside:

for (struct { struct bpf_mprog_fp *fp; struct bpf_mprog_cp *cp;} tmp =
     { &entry->fp_items[0], &entry->parent->cp_items[0]};
     t.link = tmp.cp->link, t.prog = READ_ONCE(tmp.fp->prog);
     tmp.fp++, tmp.cp++)

But I'm not sure the kernel's C style allows that yet.

But I think you can use the comma operator to avoid that more verbose
({ }) construct.
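To make the comma-operator suggestion concrete, here is a user-space sketch of the tuple iterator that keeps the caller-provided `fp`/`cp` cursors but replaces the `({ })` statement expression with a plain comma expression in the loop condition. The `model_*` structs are simplified, hypothetical stand-ins for bpf_mprog_entry/bundle (assumed layout: a NULL-terminated fp_items array plus a parallel cp_items array in the parent), and READ_ONCE() is dropped since nothing runs concurrently here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MPROG_MAX 64

/* Simplified stand-ins for the kernel objects (hypothetical layout). */
struct model_prog { int id; };
struct model_link { int id; };
struct model_fp { struct model_prog *prog; };
struct model_cp { struct model_link *link; };
struct model_bundle { struct model_cp cp_items[MODEL_MPROG_MAX]; };
struct model_entry {
	struct model_fp fp_items[MODEL_MPROG_MAX];
	struct model_bundle *parent;
};
struct model_tuple {
	struct model_prog *prog;
	struct model_link *link;
};

/* The loop condition is a comma expression: load both halves of the
 * tuple, then use t.prog as the truth value. A NULL prog terminates. */
#define model_foreach_tuple(entry, fp, cp, t)				\
	for (fp = &(entry)->fp_items[0],				\
	     cp = &(entry)->parent->cp_items[0];			\
	     (t.prog = fp->prog, t.link = cp->link, t.prog != NULL);	\
	     fp++, cp++)

/* Counts attached programs and how many of them came in via a link. */
static int model_count(struct model_entry *entry, int *links)
{
	struct model_fp *fp;
	struct model_cp *cp;
	struct model_tuple t;
	int n = 0;

	*links = 0;
	model_foreach_tuple(entry, fp, cp, t) {
		n++;
		if (t.link)
			(*links)++;
	}
	return n;
}
```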

> +
> +#define bpf_mprog_foreach_prog(entry, fp, p)                           \
> +       for (fp = &entry->fp_items[0];                                  \
> +            (p = READ_ONCE(fp->prog));                                 \
> +            fp++)
> +

[...]

> +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
> +{
> +       entry->parent->count++;
> +}
> +
> +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
> +{
> +       entry->parent->count--;
> +}
> +
> +static inline int bpf_mprog_max(void)
> +{
> +       return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
> +}

so we can only add BPF_MPROG_MAX - 1 progs, right? I assume the last
entry is always NULL, right?
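The capacity question comes down to the NULL terminator: bpf_mprog_foreach_prog() stops at the first NULL prog, so one of the BPF_MPROG_MAX slots has to stay empty. The `ARRAY_SIZE(((T *)NULL)->member)` idiom itself is safe because `sizeof` does not evaluate its operand. A user-space sketch with a hypothetical `model_entry` layout:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MPROG_MAX 64
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct model_fp { void *prog; };
struct model_entry { struct model_fp fp_items[MODEL_MPROG_MAX]; };

/* sizeof does not evaluate its operand, so NULL is never dereferenced. */
static int model_mprog_max(void)
{
	return ARRAY_SIZE(((struct model_entry *)NULL)->fp_items) - 1;
}

/* Nonzero if one more program fits while fp_items[MODEL_MPROG_MAX - 1]
 * stays reserved as the NULL terminator. */
static int model_can_add(int total)
{
	return total < model_mprog_max();
}
```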

> +
> +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
> +{
> +       int total = entry->parent->count;
> +
> +       WARN_ON_ONCE(total > bpf_mprog_max());
> +       return total;
> +}
> +

[...]

> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 1d3892168d32..1bea2eb912cd 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
>  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
>  obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
> -obj-$(CONFIG_BPF_SYSCALL) += disasm.o
> +obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
>  obj-$(CONFIG_BPF_JIT) += trampoline.o
>  obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
>  obj-$(CONFIG_BPF_JIT) += dispatcher.o
> diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
> new file mode 100644
> index 000000000000..1c4fcde74969
> --- /dev/null
> +++ b/kernel/bpf/mprog.c
> @@ -0,0 +1,427 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Isovalent */
> +
> +#include <linux/bpf.h>
> +#include <linux/bpf_mprog.h>
> +
> +static int bpf_mprog_link(struct bpf_tuple *tuple,
> +                         u32 object, u32 flags,

so I tried to get used to this "object" notation, but I think it's
still awkward and keeps making me ask "what is this really?" every single
time I read it. I wonder if something like "fd_or_id" as a name
would make it more obvious?

> +                         enum bpf_prog_type type)
> +{
> +       bool id = flags & BPF_F_ID;
> +       struct bpf_link *link;
> +

should we reject this object/fd_or_id if it's zero, instead of trying
to look up ID/FD 0?


> +       if (id)
> +               link = bpf_link_by_id(object);
> +       else
> +               link = bpf_link_get_from_fd(object);
> +       if (IS_ERR(link))
> +               return PTR_ERR(link);
> +       if (type && link->prog->type != type) {
> +               bpf_link_put(link);
> +               return -EINVAL;
> +       }
> +
> +       tuple->link = link;
> +       tuple->prog = link->prog;
> +       return 0;
> +}
> +
> +static int bpf_mprog_prog(struct bpf_tuple *tuple,
> +                         u32 object, u32 flags,
> +                         enum bpf_prog_type type)
> +{
> +       bool id = flags & BPF_F_ID;
> +       struct bpf_prog *prog;
> +

same here about rejecting zero object?

> +       if (id)
> +               prog = bpf_prog_by_id(object);
> +       else
> +               prog = bpf_prog_get(object);
> +       if (IS_ERR(prog)) {
> +               if (!object && !id)
> +                       return 0;
> +               return PTR_ERR(prog);
> +       }
> +       if (type && prog->type != type) {
> +               bpf_prog_put(prog);
> +               return -EINVAL;
> +       }
> +
> +       tuple->link = NULL;
> +       tuple->prog = prog;
> +       return 0;
> +}
> +
> +static int bpf_mprog_tuple_relative(struct bpf_tuple *tuple,
> +                                   u32 object, u32 flags,
> +                                   enum bpf_prog_type type)
> +{
> +       memset(tuple, 0, sizeof(*tuple));
> +       if (flags & BPF_F_LINK)
> +               return bpf_mprog_link(tuple, object, flags, type);
> +       return bpf_mprog_prog(tuple, object, flags, type);
> +}
> +
> +static void bpf_mprog_tuple_put(struct bpf_tuple *tuple)
> +{
> +       if (tuple->link)
> +               bpf_link_put(tuple->link);
> +       else if (tuple->prog)
> +               bpf_prog_put(tuple->prog);
> +}
> +
> +static int bpf_mprog_replace(struct bpf_mprog_entry *entry,
> +                            struct bpf_tuple *ntuple, int idx)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       struct bpf_prog *oprog;
> +
> +       bpf_mprog_read(entry, idx, &fp, &cp);
> +       oprog = READ_ONCE(fp->prog);
> +       bpf_mprog_write(fp, cp, ntuple);
> +       if (!ntuple->link) {
> +               WARN_ON_ONCE(cp->link);
> +               bpf_prog_put(oprog);
> +       }
> +       return BPF_MPROG_KEEP;
> +}
> +
> +static int bpf_mprog_insert(struct bpf_mprog_entry *entry,
> +                           struct bpf_tuple *ntuple, int idx, u32 flags)
> +{
> +       int i, j = 0, total = bpf_mprog_total(entry);
> +       struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};

a bit worried about using 512 bytes for the local cpp array... my initial
assumption was that we won't have to create a copy of cp_items, just
update it in place. Hm... let's have the higher-level API discussion
in one branch, where Alexei has some proposals as well.
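For reference, the 512-byte figure is BPF_MPROG_MAX (64) times the size of one control-path slot, assuming bpf_mprog_cp holds a single `struct bpf_link *` on a 64-bit kernel, as the `cp->link` accesses in this patch suggest. A compile-time check in a user-space model, where `model_cp` is a hypothetical stand-in:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MPROG_MAX 64

struct model_link;				/* opaque, only pointed to */
struct model_cp { struct model_link *link; };	/* assumed: one pointer */

/* Size of a `struct model_cp cpp[MODEL_MPROG_MAX]` stack local. */
static size_t model_cpp_bytes(void)
{
	return sizeof(struct model_cp[MODEL_MPROG_MAX]);
}

#if UINTPTR_MAX == 0xffffffffffffffffu
/* On 64-bit targets this is the 512 bytes mentioned above. */
_Static_assert(sizeof(struct model_cp[MODEL_MPROG_MAX]) == 512,
	       "64 pointer-sized slots");
#endif
```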

> +       struct bpf_mprog_fp *fp, *fpp;
> +       struct bpf_mprog_entry *peer;
> +
> +       peer = bpf_mprog_peer(entry);
> +       bpf_mprog_entry_clear(peer);
> +       if (idx < 0) {
> +               bpf_mprog_read_fp(peer, j, &fpp);
> +               bpf_mprog_write_fp(fpp, ntuple);
> +               bpf_mprog_write_cp(&cpp[j], ntuple);
> +               j++;
> +       }
> +       for (i = 0; i <= total; i++) {
> +               bpf_mprog_read_fp(peer, j, &fpp);
> +               if (idx == i && (flags & BPF_F_AFTER)) {
> +                       bpf_mprog_write(fpp, &cpp[j], ntuple);
> +                       j++;
> +                       bpf_mprog_read_fp(peer, j, &fpp);
> +               }
> +               if (i < total) {
> +                       bpf_mprog_read(entry, i, &fp, &cp);
> +                       bpf_mprog_copy(fpp, &cpp[j], fp, cp);
> +                       j++;
> +               }
> +               if (idx == i && (flags & BPF_F_BEFORE)) {
> +                       bpf_mprog_read_fp(peer, j, &fpp);
> +                       bpf_mprog_write(fpp, &cpp[j], ntuple);
> +                       j++;
> +               }
> +       }

sorry if I'm missing some subtle point, but I wonder why this is so
complicated? I think this choice of idx == -1 meaning prepend is
leading to this complication. It's also not clear why there is this
BPF_F_AFTER vs BPF_F_BEFORE distinction when we have already determined the
position where the new program has to be inserted (so after or before
should be irrelevant).

Please let me know why the below doesn't work.

Let's define idx as the position where the new prog/link tuple has to
be inserted. It can be in the range [0, N], where N is the number of
programs currently in the mprog_peer. Note that N is inclusive above.

The algorithm for insertion is simple: everything currently at
entry->fp_items[idx] and after gets shifted right by one. And we can do it
with a simple memmove:

memmove(peer->fp_items + idx + 1, peer->fp_items + idx,
        (bpf_mprog_total(entry) - idx) * sizeof(struct bpf_mprog_fp));
/* similar memmove for cp_items/cpp array, of course */
/* now set new prog at peer->fp_items[idx] */

The above should replace the entire for loop above and the extra if
before the loop. And it should work for the corner cases:

  - idx == 0 (prepend) will shift everything to the right and put the
new prog at position 0. Exactly what we wanted.
  - idx == N (append) will shift nothing (the memmove should be a
no-op because the size is zero, total == idx == N)


We just need to make sure that the above shift won't overwrite the
very last NULL. So bpf_mprog_total() should be < BPF_MPROG_MAX - 1 (i.e.
< bpf_mprog_max()) before all this.

It seems as simple as that; is there any complication I skimmed over?
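The memmove-based insertion can be sanity-checked in user space. This sketch keeps separate fp and cp arrays as in the patch, but the `model_*` names, the plain-pointer element types, and shifting within a single entry (rather than into a copied peer) are simplifications for illustration. `idx` is the final position in [0, N], and the capacity check keeps the last slot as the NULL terminator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MPROG_MAX 64

/* Simplified model: fp holds prog pointers (NULL-terminated), cp holds links. */
struct model_fp { const void *prog; };
struct model_cp { const void *link; };
struct model_entry {
	struct model_fp fp_items[MODEL_MPROG_MAX];
	struct model_cp cp_items[MODEL_MPROG_MAX];
	int count;
};

/* Inserts (prog, link) at idx in [0, count]; everything from idx onwards
 * shifts right by one. idx == 0 prepends, idx == count appends. */
static int model_insert(struct model_entry *e, int idx,
			const void *prog, const void *link)
{
	size_t n;

	if (idx < 0 || idx > e->count)
		return -ERANGE;
	/* Keep fp_items[MODEL_MPROG_MAX - 1] as the NULL terminator. */
	if (e->count >= MODEL_MPROG_MAX - 1)
		return -EDOM;
	n = (size_t)(e->count - idx);
	memmove(e->fp_items + idx + 1, e->fp_items + idx, n * sizeof(e->fp_items[0]));
	memmove(e->cp_items + idx + 1, e->cp_items + idx, n * sizeof(e->cp_items[0]));
	e->fp_items[idx].prog = prog;
	e->cp_items[idx].link = link;
	e->count++;
	return 0;
}
```

With this shape, prepend and append need no special casing: for append `n` is zero and both memmove() calls copy nothing.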


> +       bpf_mprog_commit_cp(peer, cpp);
> +       bpf_mprog_inc(peer);
> +       return BPF_MPROG_SWAP;
> +}
> +
> +static int bpf_mprog_tuple_confirm(struct bpf_mprog_entry *entry,
> +                                  struct bpf_tuple *dtuple, int idx)
> +{
> +       int first = 0, last = bpf_mprog_total(entry) - 1;
> +       struct bpf_mprog_cp *cp;
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_prog *prog;
> +       struct bpf_link *link;
> +
> +       if (idx <= first)
> +               bpf_mprog_read(entry, first, &fp, &cp);
> +       else if (idx >= last)
> +               bpf_mprog_read(entry, last, &fp, &cp);
> +       else
> +               bpf_mprog_read(entry, idx, &fp, &cp);
> +
> +       prog = READ_ONCE(fp->prog);
> +       link = cp->link;
> +       if (!dtuple->link && link)
> +               return -EBUSY;
> +
> +       WARN_ON_ONCE(dtuple->prog && dtuple->prog != prog);
> +       WARN_ON_ONCE(dtuple->link && dtuple->link != link);
> +
> +       dtuple->prog = prog;
> +       dtuple->link = link;
> +       return 0;
> +}
> +
> +static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
> +                           struct bpf_tuple *dtuple, int idx)
> +{
> +       int i = 0, j, ret, total = bpf_mprog_total(entry);
> +       struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};
> +       struct bpf_mprog_fp *fp, *fpp;
> +       struct bpf_mprog_entry *peer;
> +
> +       ret = bpf_mprog_tuple_confirm(entry, dtuple, idx);
> +       if (ret)
> +               return ret;
> +       peer = bpf_mprog_peer(entry);
> +       bpf_mprog_entry_clear(peer);
> +       if (idx < 0)
> +               i++;
> +       if (idx == total)
> +               total--;
> +       for (j = 0; i < total; i++) {
> +               if (idx == i)
> +                       continue;
> +               bpf_mprog_read_fp(peer, j, &fpp);
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               bpf_mprog_copy(fpp, &cpp[j], fp, cp);
> +               j++;
> +       }
> +       bpf_mprog_commit_cp(peer, cpp);
> +       bpf_mprog_dec(peer);
> +       bpf_mprog_mark_ref(peer, dtuple);
> +       return bpf_mprog_total(peer) ?
> +              BPF_MPROG_SWAP : BPF_MPROG_FREE;

for delete it's also a bit unclear to me. We are deleting some
specific spot, so idx should be a valid [0, N) value, no? Then why does
bpf_mprog_tuple_confirm() have this special idx <= first and idx >= last
handling?

Deletion should be similar to insertion, just with the shift in the
other direction. And then set NULL at position N-1 to ensure
proper NULL termination of the fp array.
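A matching user-space sketch of deletion under the same simplifications (hypothetical `model_*` names, plain pointers, shifting within one entry instead of into a peer): entries after `idx` move left by one, and the stale slot at N-1 is cleared so fp_items stays NULL-terminated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MPROG_MAX 64

struct model_fp { const void *prog; };
struct model_cp { const void *link; };
struct model_entry {
	struct model_fp fp_items[MODEL_MPROG_MAX];
	struct model_cp cp_items[MODEL_MPROG_MAX];
	int count;
};

/* Deletes the tuple at idx in [0, count): everything after idx shifts
 * left by one, then the old last slot is cleared to keep the NULL
 * termination of fp_items. */
static int model_delete(struct model_entry *e, int idx)
{
	size_t n;

	if (idx < 0 || idx >= e->count)
		return -ERANGE;
	n = (size_t)(e->count - idx - 1);
	memmove(e->fp_items + idx, e->fp_items + idx + 1, n * sizeof(e->fp_items[0]));
	memmove(e->cp_items + idx, e->cp_items + idx + 1, n * sizeof(e->cp_items[0]));
	e->count--;
	e->fp_items[e->count].prog = NULL;
	e->cp_items[e->count].link = NULL;
	return 0;
}
```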

> +}
> +
> +/* In bpf_mprog_pos_*() we evaluate the target position for the BPF
> + * program/link that needs to be replaced, inserted or deleted for
> + * each "rule" independently. If all rules agree on that position
> + * or existing element, then enact replacement, addition or deletion.
> + * If this is not the case, then the request cannot be satisfied and
> + * we bail out with an error.
> + */
> +static int bpf_mprog_pos_exact(struct bpf_mprog_entry *entry,
> +                              struct bpf_tuple *tuple)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       int i;
> +
> +       for (i = 0; i < bpf_mprog_total(entry); i++) {
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               if (tuple->prog == READ_ONCE(fp->prog))
> +                       return tuple->link == cp->link ? i : -EBUSY;
> +       }
> +       return -ENOENT;
> +}
> +
> +static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry,
> +                               struct bpf_tuple *tuple)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       int i;
> +
> +       for (i = 0; i < bpf_mprog_total(entry); i++) {
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               if (tuple->prog == READ_ONCE(fp->prog) &&
> +                   (!tuple->link || tuple->link == cp->link))
> +                       return i - 1;

taking all the above into account, this should just `return i;`

> +       }
> +       return tuple->prog ? -ENOENT : -1;
> +}
> +
> +static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry,
> +                              struct bpf_tuple *tuple)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       int i;
> +
> +       for (i = 0; i < bpf_mprog_total(entry); i++) {
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               if (tuple->prog == READ_ONCE(fp->prog) &&
> +                   (!tuple->link || tuple->link == cp->link))
> +                       return i + 1;
> +       }
> +       return tuple->prog ? -ENOENT : bpf_mprog_total(entry);
> +}

I actually wonder if it would be simpler not to have the _exact, _before,
and _after variants. Instead, do a generic find of the tuple. And then
outside of that, depending on BPF_F_BEFORE/BPF_F_AFTER/BPF_F_REPLACE,
just adjust the returned position (if the item is found): either keep it as
is for BPF_F_BEFORE and BPF_F_REPLACE, or adjust it by +1 for BPF_F_AFTER
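A sketch of that single-lookup approach in user space: one `model_find()` returns the index of the matching tuple, and the caller derives the target position from the directive. The flag values and `model_*` names are hypothetical stand-ins rather than the uapi definitions; the matching rule is the one from bpf_mprog_pos_before()/_after() (prog must match, link only if one was given), so this sketch does not reproduce the -EBUSY distinction of bpf_mprog_pos_exact(), and the "no relative object" case is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the uapi constants. */
#define MODEL_F_REPLACE	(1U << 0)
#define MODEL_F_BEFORE	(1U << 1)
#define MODEL_F_AFTER	(1U << 2)

struct model_tuple { const void *prog; const void *link; };

/* Generic lookup: index of the entry whose prog matches and, if the
 * caller passed a link, whose link matches too; -ENOENT otherwise. */
static int model_find(const struct model_tuple *items, int total,
		      const struct model_tuple *t)
{
	for (int i = 0; i < total; i++) {
		if (items[i].prog == t->prog &&
		    (!t->link || items[i].link == t->link))
			return i;
	}
	return -ENOENT;
}

/* Derives the target position from a single directive: the found index
 * for BEFORE and REPLACE, the found index + 1 for AFTER. */
static int model_pos(const struct model_tuple *items, int total,
		     const struct model_tuple *t, unsigned int flag)
{
	int idx = model_find(items, total, t);

	if (idx < 0)
		return idx;
	return flag == MODEL_F_AFTER ? idx + 1 : idx;
}
```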

> +
> +int bpf_mprog_attach(struct bpf_mprog_entry *entry, struct bpf_prog *prog_new,
> +                    struct bpf_link *link, struct bpf_prog *prog_old,
> +                    u32 flags, u32 object, u64 revision)
> +{
> +       struct bpf_tuple rtuple, ntuple = {
> +               .prog = prog_new,
> +               .link = link,
> +       }, otuple = {
> +               .prog = prog_old,
> +               .link = link,
> +       };
> +       int ret, idx = -2, tidx;

so here I'd init idx to some "impossible" error, like -ERANGE (to pair
with -EDOM ;)

> +
> +       if (revision && revision != bpf_mprog_revision(entry))
> +               return -ESTALE;
> +       if (bpf_mprog_exists(entry, prog_new))
> +               return -EEXIST;
> +       ret = bpf_mprog_tuple_relative(&rtuple, object,
> +                                      flags & ~BPF_F_REPLACE,
> +                                      prog_new->type);
> +       if (ret)
> +               return ret;
> +       if (flags & BPF_F_REPLACE) {
> +               tidx = bpf_mprog_pos_exact(entry, &otuple);
> +               if (tidx < 0) {
> +                       ret = tidx;
> +                       goto out;
> +               }
> +               idx = tidx;
> +       }
> +       if (flags & BPF_F_BEFORE) {
> +               tidx = bpf_mprog_pos_before(entry, &rtuple);
> +               if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> +                       ret = tidx < -1 ? tidx : -EDOM;
> +                       goto out;
> +               }
> +               idx = tidx;
> +       }
> +       if (flags & BPF_F_AFTER) {
> +               tidx = bpf_mprog_pos_after(entry, &rtuple);
> +               if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> +                       ret = tidx < 0 ? tidx : -EDOM;
> +                       goto out;
> +               }
> +               idx = tidx;

and then here just special-case -ERANGE, and otherwise
treat anything else negative as an error

tidx = bpf_mprog_pos_exact(entry, &rtuple);
/* and adjust +1 for BPF_F_AFTER */
if (tidx >= 0)
    tidx += 1;
if (idx != -ERANGE && tidx != idx) {
    ret = tidx < 0 ? tidx : -EDOM;
    goto out;
}
idx = tidx;
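Putting the -ERANGE sentinel together with position-based indexing, the rule reconciliation in attach could be modelled as below: each directive independently computes a target position, the first one seeds `idx`, any later disagreement yields -EDOM, and with no directive at all the default is append, as in the patch. This is a simplified user-space sketch, not the patch's code: the `model_*` names and flag values are hypothetical, programs are represented by plain integer IDs with 0 meaning "no relative object", and BPF_F_REPLACE is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values for this model only. */
#define MODEL_F_BEFORE	(1U << 0)
#define MODEL_F_AFTER	(1U << 1)

/* Position of relative prog id `rel` in ids[0..total), or -ENOENT. */
static int model_find(const int *ids, int total, int rel)
{
	for (int i = 0; i < total; i++)
		if (ids[i] == rel)
			return i;
	return -ENOENT;
}

/* Resolves the insert position; every directive that is set must agree. */
static int model_resolve(const int *ids, int total, int rel, unsigned int flags)
{
	int idx = -ERANGE, tidx;	/* -ERANGE: "not determined yet" */

	if (flags & MODEL_F_BEFORE) {
		/* Before a found entry is its own index; no object means prepend. */
		tidx = rel ? model_find(ids, total, rel) : 0;
		if (tidx < 0)
			return tidx;
		idx = tidx;
	}
	if (flags & MODEL_F_AFTER) {
		/* After a found entry is its index + 1; no object means append. */
		tidx = rel ? model_find(ids, total, rel) : total;
		if (tidx < 0)
			return tidx;
		if (rel)
			tidx += 1;
		if (idx != -ERANGE && tidx != idx)
			return -EDOM;
		idx = tidx;
	}
	if (idx == -ERANGE) {
		/* A relative object without BEFORE/AFTER is invalid. */
		if (rel)
			return -EINVAL;
		idx = total;	/* no directives: default append */
	}
	return idx;
}
```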

> +       }
> +       if (idx < -1) {
> +               if (rtuple.prog || flags) {
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +               idx = bpf_mprog_total(entry);
> +               flags = BPF_F_AFTER;
> +       }
> +       if (idx >= bpf_mprog_max()) {
> +               ret = -EDOM;
> +               goto out;
> +       }
> +       if (flags & BPF_F_REPLACE)
> +               ret = bpf_mprog_replace(entry, &ntuple, idx);
> +       else
> +               ret = bpf_mprog_insert(entry, &ntuple, idx, flags);
> +out:
> +       bpf_mprog_tuple_put(&rtuple);
> +       return ret;
> +}
> +

[...]