Message-ID: <5f8d6a17-5755-06f7-b754-33238359883e@iogearbox.net>
Date: Fri, 14 Jul 2023 18:06:08 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: ast@...nel.org, andrii@...nel.org, martin.lau@...ux.dev,
razor@...ckwall.org, sdf@...gle.com, john.fastabend@...il.com,
kuba@...nel.org, dxu@...uu.xyz, joe@...ium.io, toke@...nel.org,
davem@...emloft.net, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API
for multi-progs
On 7/11/23 8:51 PM, Andrii Nakryiko wrote:
> On Mon, Jul 10, 2023 at 5:23 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
>>
>> On Mon, Jul 10, 2023 at 10:12:11PM +0200, Daniel Borkmann wrote:
>>> + *
>>> + *   struct bpf_mprog_entry *entry, *peer;
>>> + *   int ret;
>>> + *
>>> + *   // bpf_mprog user-side lock
>>> + *   // fetch active @entry from attach location
>>> + *   [...]
>>> + *   ret = bpf_mprog_attach(entry, [...]);
>>> + *   if (ret >= 0) {
>>> + *       peer = bpf_mprog_peer(entry);
>>> + *       if (bpf_mprog_swap_entries(ret))
>>> + *           // swap @entry to @peer at attach location
>>> + *       bpf_mprog_commit(entry);
>>> + *       ret = 0;
>>> + *   } else {
>>> + *       // error path, bail out, propagate @ret
>>> + *   }
>>> + *   // bpf_mprog user-side unlock
>>> + *
>>> + * Detach case:
>>> + *
>>> + *   struct bpf_mprog_entry *entry, *peer;
>>> + *   bool release;
>>> + *   int ret;
>>> + *
>>> + *   // bpf_mprog user-side lock
>>> + *   // fetch active @entry from attach location
>>> + *   [...]
>>> + *   ret = bpf_mprog_detach(entry, [...]);
>>> + *   if (ret >= 0) {
>>> + *       release = ret == BPF_MPROG_FREE;
>>> + *       peer = release ? NULL : bpf_mprog_peer(entry);
>>> + *       if (bpf_mprog_swap_entries(ret))
>>> + *           // swap @entry to @peer at attach location
>>> + *       bpf_mprog_commit(entry);
>>> + *       if (release)
>>> + *           // free bpf_mprog_bundle
>>> + *       ret = 0;
>>> + *   } else {
>>> + *       // error path, bail out, propagate @ret
>>> + *   }
>>> + *   // bpf_mprog user-side unlock
>>
>> Thanks for the doc. It helped a lot.
>> And when it's contained like this, it's easier to discuss the API.
>> It seems bpf_mprog_swap_entries() is trying to abstract the error code
>> away, but BPF_MPROG_FREE leaks out and tcx_entry_needs_release()
>> captures it with an extra miniq_active twist, which I don't understand yet.
>> bpf_mprog_peer() is also leaking a bit of implementation detail.
>> Can we abstract it further, like:
>>
>> ret = bpf_mprog_detach(entry, [...], &new_entry);
>> if (ret >= 0) {
>>     if (entry != new_entry)
>>         // swap @entry to @new_entry at attach location
>>     bpf_mprog_commit(entry);
>>     if (!new_entry)
>>         // free bpf_mprog_bundle
>> }
>> and make bpf_mprog_peer internal to mprog. It would also allow removing
>> the BPF_MPROG_FREE vs SWAP distinction, since the peer is hidden.
>>
>>     if (entry != new_entry)
>>         // update
>>
>> would also be easier to read inside the tcx code without looking into mprog details.
+1, agree, and I implemented this suggestion in v5.
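For the attach side it ends up with the same shape, roughly (just a sketch,
the exact v5 signature may differ slightly):

    ret = bpf_mprog_attach(entry, [...], &entry_new);
    if (ret >= 0) {
        if (entry != entry_new) {
            // swap @entry to @entry_new at attach location
        }
        bpf_mprog_commit(entry);
    }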
> I'm actually wondering whether it's possible to simplify it even further.
> For example, do we even need separate bpf_mprog_{attach,detach} and
> bpf_mprog_commit() steps? So far it seems like bpf_mprog_commit() is
> inevitable when attach/detach succeeds, so we might as well just do it
> as the last step of the attach/detach operation.
It needs to be done after the pointers have been swapped by the mprog user.
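That is, on the user side the order is:

    // 1) swap/publish the new entry at the attach location first
    [...]
    // 2) only then commit, so that in-flight users of the old @entry
    //    are not pulled out from under their feet
    bpf_mprog_commit(entry);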
> The only problem seems to be that the bpf_mprog interface does this
> optimization of replacing stuff in place, if possible, and allows the
> caller to not do the swap. How important is it to avoid that swap of a
> bpf_mprog_fp (pointer)? It seems pretty cheap (and a relatively rare
> operation), so I wouldn't bother optimizing this.
I would like to keep it: for example, when an application comes up, fetches
its links from bpffs and updates all its programs in place, that is exactly
the replace situation.
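Concretely, something like this on the application side (just a sketch; the
pin path is made up and error handling is trimmed):

    #include <unistd.h>
    #include <bpf/bpf.h>

    static int replace_in_place(const char *pin_path /* e.g. "/sys/fs/bpf/my_link" */,
                                int new_prog_fd)
    {
        // fetch the pinned link from bpffs
        int err, link_fd = bpf_obj_get(pin_path);

        if (link_fd < 0)
            return link_fd;
        // update the prog behind the existing link, no detach/re-attach cycle
        err = bpf_link_update(link_fd, new_prog_fd, NULL);
        close(link_fd);
        return err;
    }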
> So how about we just say that there is always a swap? Internally in
> bpf_mprog_bundle the current entry is determined based on revision & 1.
> We can have bpf_mprog_cur_entry() return the proper pointer after
> commit. Or bpf_mprog_attach() can return the proper new entry as an
> output parameter, whichever is preferable.
>
> As for BPF_MPROG_FREE, that seems like an unnecessary complication as
> well. The caller can just check bpf_mprog_total() quickly, and if it
> dropped to zero, assume FREE. Unless there is something more subtle
> there?
Agreed, some callers may want to keep an empty bpf_mprog while others may
want to free it, so I implemented it this way and removed all the
BPF_MPROG_* return codes.
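On the caller side that then looks roughly like this (sketch only;
tcx_entry_free() stands in for whatever cleanup the mprog user wants):

    ret = bpf_mprog_detach(entry, [...], &entry_new);
    if (ret >= 0) {
        if (!bpf_mprog_total(entry_new))
            entry_new = NULL;
        if (entry != entry_new) {
            // swap @entry to @entry_new (possibly NULL) at attach location
        }
        bpf_mprog_commit(entry);
        if (!entry_new) {
            // nothing attached anymore, the mprog user may free the bundle
            tcx_entry_free(entry);
        }
    }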
> With the above, the interface will be much simpler, IMO. You just do
> bpf_mprog_attach/detach, and then swap the pointer to the new
> bpf_mprog_entry. Then you can check bpf_mprog_total() for zero and clean
> up further, if necessary.
>
> We assume the caller holds proper locking, so all of the above should be non-racy.
>
> BTW, combining commit with attach allows us to avoid that relatively
> big bpf_mprog_cp array on the stack as well, because we will be able
> to update bundle->cp_items in-place.
>
> The only (I believe :) ) big assumption I'm making in all of the above
> is that commit is inevitable and we won't have a situation where we
> start attach, update fp/cpp, and then decide to abort instead of going
> for commit. Is this possible? Can we avoid it with careful checks
> upfront, doing the attach as the last step that cannot be undone?
>
> P.S. I guess one bit that I might have oversimplified is the
> synchronize_rcu() + bpf_prog_put() part, but I'm not sure exactly why we
> put the prog after sync_rcu. But if it's really necessary (and I assume it
It is because users can still be in flight on the old mprog_entry, so for
the delete case the prog reference must only be dropped after the
synchronize_rcu().
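In other words, for the delete path it is the usual RCU pattern:

    // swap the attach location away from the old entry
    [...]
    // wait for all in-flight users of the old entry to drain
    synchronize_rcu();
    // only now it is safe to drop the reference on the deleted prog
    bpf_prog_put(prog);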
Thanks again,
Daniel