Message-ID: <075ea7b6-08f3-c334-1140-e2a24669aed2@iogearbox.net>
Date: Mon, 10 Jul 2023 15:15:15 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: ast@...nel.org, andrii@...nel.org, martin.lau@...ux.dev,
razor@...ckwall.org, sdf@...gle.com, john.fastabend@...il.com,
kuba@...nel.org, dxu@...uu.xyz, joe@...ium.io, toke@...nel.org,
davem@...emloft.net, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API
for multi-progs
On 7/10/23 9:10 AM, Daniel Borkmann wrote:
> On 7/9/23 7:17 PM, Alexei Starovoitov wrote:
>> On Fri, Jul 07, 2023 at 07:24:48PM +0200, Daniel Borkmann wrote:
>>> +
>>> +#define BPF_MPROG_KEEP 0
>>> +#define BPF_MPROG_SWAP 1
>>> +#define BPF_MPROG_FREE 2
>>
>> Please document how this is supposed to be used.
>> Patch 2 is using BPF_MPROG_FREE in tcx_entry_needs_release().
>> Where most of the code treats BPF_MPROG_SWAP and BPF_MPROG_FREE as equivalent.
>> I can guess what it's for, but a comment would help.
>
> Ok, sounds good, will add a comment to these codes.
>
[...]
>> In the future, for cgroups, bpf_prog_run_array_cg() will keep explicit rcu_read_lock()
>> before accessing bpf_mprog_entry, right?
>> And bpf_mprog_commit() assumes that RCU protection.
>
> Both yes.
>
>> All fine, but we need to document that mprog mechanism is not suitable for sleepable progs.
>
> Ok, I'll add a comment.
I've added the following comment to bpf_mprog.h to cover the return codes,
locking, and a usage example:
/*
* bpf_mprog framework:
* ~~~~~~~~~~~~~~~~~~~~
*
* bpf_mprog is a generic layer for multi-program attachment. In-kernel users
* of bpf_mprog don't need to care about the dependency resolution internals;
* they can just consume it with a few API calls. Currently available
* dependency directives are BPF_F_{BEFORE,AFTER}, which enable insertion of
* a BPF program or BPF link relative to an existing BPF program or BPF link
* inside the multi-program array, as well as prepend and append behavior if
* no relative object was specified. See the corresponding selftests for
* concrete examples (e.g. tc_links and tc_opts test cases of test_progs).
*
* Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code:
*
* Attach case:
*
*   struct bpf_mprog_entry *entry, *peer;
*   int ret;
*
*   // bpf_mprog user-side lock
*   // fetch active @entry from attach location
*   [...]
*   ret = bpf_mprog_attach(entry, [...]);
*   if (ret >= 0) {
*       peer = bpf_mprog_peer(entry);
*       if (bpf_mprog_swap_entries(ret))
*           // swap @entry to @peer at attach location
*       bpf_mprog_commit(entry);
*       ret = 0;
*   } else {
*       // error path, bail out, propagate @ret
*   }
*   // bpf_mprog user-side unlock
*
* Detach case:
*
*   struct bpf_mprog_entry *entry, *peer;
*   bool release;
*   int ret;
*
*   // bpf_mprog user-side lock
*   // fetch active @entry from attach location
*   [...]
*   ret = bpf_mprog_detach(entry, [...]);
*   if (ret >= 0) {
*       release = ret == BPF_MPROG_FREE;
*       peer = release ? NULL : bpf_mprog_peer(entry);
*       if (bpf_mprog_swap_entries(ret))
*           // swap @entry to @peer at attach location
*       bpf_mprog_commit(entry);
*       if (release)
*           // free bpf_mprog_bundle
*       ret = 0;
*   } else {
*       // error path, bail out, propagate @ret
*   }
*   // bpf_mprog user-side unlock
*
* Query case:
*
*   struct bpf_mprog_entry *entry;
*   int ret;
*
*   // bpf_mprog user-side lock
*   // fetch active @entry from attach location
*   [...]
*   ret = bpf_mprog_query(attr, uattr, entry);
*   // bpf_mprog user-side unlock
*
* Data/fast path:
*
*   struct bpf_mprog_entry *entry;
*   struct bpf_mprog_fp *fp;
*   struct bpf_prog *prog;
*   int ret = [...];
*
*   rcu_read_lock();
*   // fetch active @entry from attach location
*   [...]
*   bpf_mprog_foreach_prog(entry, fp, prog) {
*       ret = bpf_prog_run(prog, [...]);
*       // process @ret from program
*   }
*   [...]
*   rcu_read_unlock();
*
* bpf_mprog_{attach,detach}() return codes:
*
* A negative return code means that an error occurred and the bpf_mprog_entry
* has not been changed. The error should be propagated to the user. A
* non-negative return code can be one of the following:
*
*   BPF_MPROG_KEEP:
*     The bpf_mprog_entry does not need an a/b swap; the bpf_mprog_fp item
*     has been replaced in the currently active bpf_mprog_entry.
*
*   BPF_MPROG_SWAP:
*     The bpf_mprog_entry does need an a/b swap and must be updated to its
*     peer entry (peer = bpf_mprog_peer(entry)), which has been populated
*     with the new bpf_mprog_fp item configuration.
*
*   BPF_MPROG_FREE:
*     The bpf_mprog_entry no longer holds any non-NULL bpf_mprog_fp items.
*     The bpf_mprog_entry should be swapped with NULL and the corresponding
*     bpf_mprog_bundle can be freed.
*
* bpf_mprog locking considerations:
*
* bpf_mprog_{attach,detach,query}() must be protected by an external lock
* (like RTNL in case of tcx).
*
* bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx
* the netdevice has tcx_ingress and tcx_egress __rcu pointers) which gets
* updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of
* the bpf_mprog_bundle.
*
* The fast path accesses the active bpf_mprog_entry within an RCU critical
* section (in case of tcx it runs in NAPI which provides RCU protection
* there, other users might need explicit rcu_read_lock()).
* bpf_mprog_commit() assumes that RCU protection.
*
* The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for
* the replacement case where we don't swap the bpf_mprog_entry.
*/
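To make the return-code handling in the detach case a bit more tangible, here
is a minimal user-space sketch of the a/b swap logic. It is only an
illustration: everything prefixed with mock_ is a hypothetical stand-in and not
the kernel API, and mock_swap_entries() assumes that BPF_MPROG_SWAP and
BPF_MPROG_FREE both require the attach location to be repointed, as the detach
pseudo code suggests. A plain pointer store takes the place of
rcu_assign_pointer(), and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes as defined in the patch. */
#define BPF_MPROG_KEEP 0
#define BPF_MPROG_SWAP 1
#define BPF_MPROG_FREE 2

/* Hypothetical user-space stand-ins, not the kernel structures. */
struct mock_bundle;

struct mock_entry {
	struct mock_bundle *parent;
};

struct mock_bundle {
	struct mock_entry a;
	struct mock_entry b;
};

static void mock_bundle_init(struct mock_bundle *bundle)
{
	bundle->a.parent = bundle;
	bundle->b.parent = bundle;
}

/* Return the other half of the a/b pair that @entry belongs to. */
static struct mock_entry *mock_peer(struct mock_entry *entry)
{
	struct mock_bundle *bundle = entry->parent;

	return entry == &bundle->a ? &bundle->b : &bundle->a;
}

/* Assumption: SWAP and FREE both repoint the attach location. */
static bool mock_swap_entries(int code)
{
	return code == BPF_MPROG_SWAP || code == BPF_MPROG_FREE;
}

/*
 * Apply a non-negative detach return code to the attach location @slot.
 * Returns true when the caller should free the bundle afterwards.
 */
static bool mock_handle_detach(struct mock_entry **slot, int code)
{
	struct mock_entry *entry = *slot;
	bool release = code == BPF_MPROG_FREE;
	struct mock_entry *peer = release ? NULL : mock_peer(entry);

	assert(code >= 0);
	if (mock_swap_entries(code))
		*slot = peer; /* kernel users would use rcu_assign_pointer() */
	return release;
}
```

With a slot that initially points at entry a, BPF_MPROG_KEEP leaves the slot
untouched, BPF_MPROG_SWAP moves it to entry b, and BPF_MPROG_FREE sets it to
NULL and tells the caller to release the bundle.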
Hope that helps,
Daniel