Message-ID: <20230709171728.gonedzieinilrvra@MacBook-Pro-8.local>
Date: Sun, 9 Jul 2023 10:17:28 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: ast@...nel.org, andrii@...nel.org, martin.lau@...ux.dev,
razor@...ckwall.org, sdf@...gle.com, john.fastabend@...il.com,
kuba@...nel.org, dxu@...uu.xyz, joe@...ium.io, toke@...nel.org,
davem@...emloft.net, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API
for multi-progs
On Fri, Jul 07, 2023 at 07:24:48PM +0200, Daniel Borkmann wrote:
> +
> +#define BPF_MPROG_KEEP 0
> +#define BPF_MPROG_SWAP 1
> +#define BPF_MPROG_FREE 2
Please document how this is supposed to be used.
Patch 2 uses BPF_MPROG_FREE in tcx_entry_needs_release(), while most of the
code treats BPF_MPROG_SWAP and BPF_MPROG_FREE as equivalent.
I can guess what it's for, but a comment would help.
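Something along these lines would already help — my guess at the contract
from reading patch 2, so treat the comments as assumptions to be corrected,
not as the actual semantics:

```c
/* Hypothetical documentation sketch -- my reading, needs confirmation.
 *
 * Return codes of the mprog update path telling the caller what to do
 * with the entry pair after bpf_mprog_attach()/bpf_mprog_detach():
 */
#define BPF_MPROG_KEEP 0 /* nothing changed, active entry stays as-is   */
#define BPF_MPROG_SWAP 1 /* caller must commit, swapping in the peer    */
#define BPF_MPROG_FREE 2 /* like SWAP, but the last prog was removed,
                          * so the caller may release the entry pair
                          * entirely (cf. tcx_entry_needs_release())    */
```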
> +
> +#define BPF_MPROG_MAX 64
We've seen buggy user space attaching thousands of tc progs to the same netdev.
I suspect the 64 limit will be hit much sooner than expected.
> +
> +#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \
> + for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
> + ({ \
> + t.prog = READ_ONCE(fp->prog); \
> + t.link = cp->link; \
> + t.prog; \
> + }); \
> + fp++, cp++)
> +
> +#define bpf_mprog_foreach_prog(entry, fp, p) \
> + for (fp = &entry->fp_items[0]; \
> + (p = READ_ONCE(fp->prog)); \
> + fp++)
I have similar questions to Stanislav.
Looks like update/delete/query of bpf_prog should be protected by an external
lock (like RTNL in the case of tcx), but what are the lifetime rules for 'entry'?
Looking at patch 2 sch_handle_ingress():
struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
I suspect the assumption is that the bpf_mprog_entry object must be accessed
within an RCU critical section. Since tc/tcx and XDP run in napi, we have RCU
protection there. In the future, for cgroups, bpf_prog_run_array_cg() will keep
an explicit rcu_read_lock() before accessing bpf_mprog_entry, right?
And bpf_mprog_commit() assumes that RCU protection.
All fine, but we need to document that the mprog mechanism is not suitable for
sleepable progs.
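To make the assumed contract concrete, here is a minimal user-space model of
the prog walk — READ_ONCE reduced to a volatile access and the RCU section to
a comment; the names mirror the patch but this is a sketch, not kernel code:

```c
#include <stddef.h>

/* stand-ins for the kernel primitives -- assumptions for illustration */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define MPROG_MAX 4 /* stand-in for BPF_MPROG_MAX */

struct prog { int id; };
struct fp_item { struct prog *prog; };

struct mprog_entry {
	/* NULL-terminated, like fp_items in the patch */
	struct fp_item fp_items[MPROG_MAX + 1];
};

/* mirrors bpf_mprog_foreach_prog(): walk progs until the first NULL */
#define mprog_foreach_prog(entry, fp, p)	\
	for (fp = &(entry)->fp_items[0];	\
	     (p = READ_ONCE(fp->prog));		\
	     fp++)

static int run_all(struct mprog_entry *entry)
{
	struct fp_item *fp;
	struct prog *p;
	int n = 0;

	/* in the kernel, this walk must sit inside rcu_read_lock() /
	 * rcu_read_unlock() -- implicit under napi for tcx/XDP */
	mprog_foreach_prog(entry, fp, p)
		n++;
	return n;
}
```

i.e. the reader only ever sees a consistent fp_items snapshot because the
writer publishes a whole new entry and commits it under RCU.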
> + if (flags & BPF_F_BEFORE) {
> + tidx = bpf_mprog_pos_before(entry, &rtuple);
> + if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> + ret = tidx < -1 ? tidx : -EDOM;
> + goto out;
> + }
> + idx = tidx;
> + }
> + if (flags & BPF_F_AFTER) {
> + tidx = bpf_mprog_pos_after(entry, &rtuple);
> + if (tidx < 0 || (idx >= -1 && tidx != idx)) {
tidx < 0 vs tidx < -1 for _after vs _before.
Does it have to have this subtle difference?
Can _after and _before have the same return-value semantics?
> + ret = tidx < 0 ? tidx : -EDOM;
> + goto out;
> + }
> + idx = tidx;
> + }
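For example, a single convention where both helpers return a non-negative
insertion slot, or a negative errno on failure, would let both branches use
the same `tidx < 0` check. A hypothetical sketch (illustrative names, not the
patch's code):

```c
#include <errno.h>

/* Hypothetical unified convention: return the insertion slot as a
 * non-negative index, or a negative errno if the relative prog was
 * not found. Both helpers then share one error check in the caller. */

static int pos_before(const int *ids, int n, int target)
{
	for (int i = 0; i < n; i++)
		if (ids[i] == target)
			return i;	/* insert at target's slot */
	return -ENOENT;
}

static int pos_after(const int *ids, int n, int target)
{
	for (int i = 0; i < n; i++)
		if (ids[i] == target)
			return i + 1;	/* insert just past target */
	return -ENOENT;
}
```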