Message-ID: <305f79d1-fbe0-6a57-991c-0f79679d62d6@iogearbox.net>
Date: Mon, 10 Jul 2023 09:10:52 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: ast@...nel.org, andrii@...nel.org, martin.lau@...ux.dev,
razor@...ckwall.org, sdf@...gle.com, john.fastabend@...il.com,
kuba@...nel.org, dxu@...uu.xyz, joe@...ium.io, toke@...nel.org,
davem@...emloft.net, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API
for multi-progs
On 7/9/23 7:17 PM, Alexei Starovoitov wrote:
> On Fri, Jul 07, 2023 at 07:24:48PM +0200, Daniel Borkmann wrote:
>> +
>> +#define BPF_MPROG_KEEP 0
>> +#define BPF_MPROG_SWAP 1
>> +#define BPF_MPROG_FREE 2
>
> Please document how this is supposed to be used.
> Patch 2 is using BPF_MPROG_FREE in tcx_entry_needs_release(),
> whereas most of the code treats BPF_MPROG_SWAP and BPF_MPROG_FREE as equivalent.
> I can guess what it's for, but a comment would help.
Ok, sounds good, I will add a comment documenting these codes.
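Roughly what I have in mind, based on how the series uses the codes (sketch,
exact wording and semantics may still change in the next revision):

  /* Return codes for attach/detach telling the caller what to do with
   * the active bpf_mprog_entry (sketch, not final wording):
   *
   *  BPF_MPROG_KEEP - entry was updated in place, the caller keeps
   *                   using the currently active entry as-is
   *  BPF_MPROG_SWAP - the inactive peer entry was populated, the
   *                   caller must publish it and bpf_mprog_commit()
   *                   the now-inactive one
   *  BPF_MPROG_FREE - the last program was removed, the caller can
   *                   unpublish and release the entry pair
   */
  #define BPF_MPROG_KEEP 0
  #define BPF_MPROG_SWAP 1
  #define BPF_MPROG_FREE 2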
>> +#define BPF_MPROG_MAX 64
>
> We've seen buggy user space attaching thousands of tc progs to the same netdev.
> I suspect the 64 limit will be hit much sooner than expected.
In that sense it's probably good that we'll bail out rather than drain memory,
as you had with the cls_bpf case, iirc. As I mentioned, given this is not uapi,
we can adapt the limit in the future when needed, should there be cases where
more than 64 progs are attached to a single device.
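Fwiw, the idea is simply to reject the request once the table is full, roughly
like this (sketch; bpf_mprog_total() as a fill-level helper and the concrete
errno are illustrative):

  if (bpf_mprog_total(entry) >= BPF_MPROG_MAX) {
          /* Hard cap instead of unbounded growth: buggy or malicious
           * user space gets an error back rather than slowly draining
           * kernel memory.
           */
          return -ERANGE;
  }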
>> +#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \
>> + for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
>> + ({ \
>> + t.prog = READ_ONCE(fp->prog); \
>> + t.link = cp->link; \
>> + t.prog; \
>> + }); \
>> + fp++, cp++)
>> +
>> +#define bpf_mprog_foreach_prog(entry, fp, p) \
>> + for (fp = &entry->fp_items[0]; \
>> + (p = READ_ONCE(fp->prog)); \
>> + fp++)
>
> I have similar questions to Stanislav.
The READ_ONCE/WRITE_ONCE is for the replacement case where we don't need to swap
the whole array, so I annotated all accesses to fp->prog.
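To illustrate, the replace case only touches a single slot while readers may run
concurrently (simplified sketch; the writer side is assumed to hold the
subsystem lock):

  /* Writer, BPF_F_REPLACE: update one slot in place, no need to
   * populate and swap in the whole peer array. Concurrent readers
   * observe either the old or the new program, never a torn value.
   */
  WRITE_ONCE(fp->prog, nprog);

  /* Reader, e.g. via bpf_mprog_foreach_prog(): */
  prog = READ_ONCE(fp->prog);
  if (prog)
          bpf_prog_run(prog, ctx);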
> Looks like update/delete/query of bpf_prog should be protected by an external lock
> (like RTNL in case of tcx),
Correct, for tcx it's RTNL; other users either have to piggyback on existing
locking or bring their own.
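For tcx the update side then looks roughly like this (heavily simplified
sketch; tcx_entry_fetch()/tcx_entry_update() and the bpf_mprog_attach()
argument list are illustrative only):

  rtnl_lock();
  entry = tcx_entry_fetch(dev, ingress);
  ret = bpf_mprog_attach(entry, prog, link, flags, id_or_fd, revision);
  if (ret == BPF_MPROG_SWAP)
          /* Publish the peer entry, then bpf_mprog_commit() the
           * now-inactive one once readers are done with it.
           */
          tcx_entry_update(dev, entry, ingress);
  rtnl_unlock();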
> but what are the life time rules for 'entry'?
> Looking at patch 2 sch_handle_ingress():
> struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> I suspect the assumption is that the bpf_mprog_entry object should be accessed within
> an RCU critical section. Since tc/tcx and XDP run in napi we have RCU protection there.
Correct.
> In the future, for cgroups, bpf_prog_run_array_cg() will keep explicit rcu_read_lock()
> before accessing bpf_mprog_entry, right?
> And bpf_mprog_commit() assumes that RCU protection.
Both yes.
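Right, for a future non-napi user the read side would then look roughly like
this (sketch; 'parent->mprog_entry' stands in for wherever that subsystem
hangs its entry pointer):

  struct bpf_mprog_fp *fp;
  struct bpf_prog *prog;

  rcu_read_lock();
  entry = rcu_dereference(parent->mprog_entry);
  if (entry) {
          /* Walks fp_items[] until the first NULL prog slot. */
          bpf_mprog_foreach_prog(entry, fp, prog)
                  ret = bpf_prog_run(prog, ctx);
  }
  rcu_read_unlock();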
> All fine, but we need to document that the mprog mechanism is not suitable for sleepable progs.
Ok, I'll add a comment.
>> + if (flags & BPF_F_BEFORE) {
>> + tidx = bpf_mprog_pos_before(entry, &rtuple);
>> + if (tidx < -1 || (idx >= -1 && tidx != idx)) {
>> + ret = tidx < -1 ? tidx : -EDOM;
>> + goto out;
>> + }
>> + idx = tidx;
>> + }
>> + if (flags & BPF_F_AFTER) {
>> + tidx = bpf_mprog_pos_after(entry, &rtuple);
>> + if (tidx < 0 || (idx >= -1 && tidx != idx)) {
>
> tidx < 0 vs tidx < -1 for _after vs _before.
> Does it have to have this subtle difference?
> Can _after and _before have the same semantics for return value?
Yes, they can. In 'after', tidx will never be -1, but I can adapt the condition
so it's the same for the two to avoid confusion (see the sketch after the
quoted hunk below).
>> + ret = tidx < 0 ? tidx : -EDOM;
>> + goto out;
>> + }
>> + idx = tidx;
>> + }
>
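Concretely, adapting the 'after' branch to match would look like this (sketch):

  if (flags & BPF_F_AFTER) {
          tidx = bpf_mprog_pos_after(entry, &rtuple);
          if (tidx < -1 || (idx >= -1 && tidx != idx)) {
                  ret = tidx < -1 ? tidx : -EDOM;
                  goto out;
          }
          idx = tidx;
  }

Since bpf_mprog_pos_after() never returns -1, treating -1 the same way as
bpf_mprog_pos_before() does is harmless and keeps the two branches symmetric.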