Message-ID: <CAEf4Bzak_v01v5Y6dNT_1KAcax_hvVqZM4o+d_w5OJSWeLJz2g@mail.gmail.com>
Date: Wed, 5 Oct 2022 20:19:20 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: bpf@...r.kernel.org, razor@...ckwall.org, ast@...nel.org,
andrii@...nel.org, martin.lau@...ux.dev, john.fastabend@...il.com,
joannelkoong@...il.com, memxor@...il.com, toke@...hat.com,
joe@...ium.io, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 02/10] bpf: Implement BPF link handling for tc
BPF programs
On Tue, Oct 4, 2022 at 4:12 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> This work adds BPF links for tc. As a recap, a BPF link represents the attachment
> of a BPF program to a BPF hook point. The BPF link holds a single reference to
> keep the BPF program alive. Moreover, hook points do not reference a BPF link;
> only the application's fd or pinning does. A BPF link holds meta-data specific
> to the attachment and implements operations for link creation, (atomic) BPF
> program update, detachment and introspection.
>
> The motivation for BPF links for tc BPF programs is multi-fold, for example:
>
> - "It's especially important for applications that are deployed fleet-wide
> and that don't "control" hosts they are deployed to. If such application
> crashes and no one notices and does anything about that, BPF program will
> keep running draining resources or even just, say, dropping packets. We
> at FB had outages due to such permanent BPF attachment semantics. With
> fd-based BPF link we are getting a framework, which allows safe, auto-
> detachable behavior by default, unless application explicitly opts in by
> pinning the BPF link." [0]
>
> - From the Cilium side, the tc BPF programs we attach to host-facing veth devices
> and phys devices build the core datapath for Kubernetes Pods, and they
> implement forwarding, load-balancing, policy, EDT-management, etc., within
> BPF. Currently there is no concept of 'safe' ownership; e.g., we've recently
> experienced hard-to-debug issues in a user's staging environment where
> another Kubernetes application using tc BPF attached to the same prio/handle
> of cls_bpf and wiped all Cilium-based BPF programs from underneath it. The
> goal is to establish a clear/safe ownership model via links which cannot
> accidentally be overridden. [1]
>
> BPF links for tc can co-exist with non-link attachments, and the semantics are
> also in line with XDP links: BPF links cannot replace other BPF links, BPF
> links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
> lastly only non-BPF links can replace non-BPF links. In the case of Cilium,
> this would solve the mentioned issue of a safe ownership model, as 3rd party
> applications would not be able to accidentally wipe Cilium programs, even if
> they are not BPF link aware.
>
> Earlier attempts [2] tried to integrate BPF links into the core tc machinery
> to solve this for cls_bpf, which was intrusive to the generic tc kernel API
> with extensions only specific to cls_bpf, and suboptimal/complex since cls_bpf
> could also be wiped from the qdisc. Locking a tc BPF program in place this way
> gets into layering hacks given that the two object models are vastly different.
> We chose to implement the fd-based tc BPF attach API as a prerequisite, so that
> the BPF link implementation fits in naturally, similar to other link types
> which are fd-based, and without the need for changing core tc internal APIs.
>
> BPF programs for tc can then be successively migrated from cls_bpf to the new
> tc BPF link without needing to change the program's source code; only the BPF
> loader mechanics for attaching need to change.
>
> [0] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com/
> [1] https://lpc.events/event/16/contributions/1353/
> [2] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com/
>
> Co-developed-by: Nikolay Aleksandrov <razor@...ckwall.org>
> Signed-off-by: Nikolay Aleksandrov <razor@...ckwall.org>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> ---
Have you considered supporting BPF cookies from the outset? It should be
trivial if you remove the union from bpf_prog_array_item. If not, then we
should reject LINK_CREATE if bpf_cookie is non-zero.
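
Something like this, as an untested sketch of the rejection path (the
attr->link_create.tc.bpf_cookie field name is just a placeholder on my
side; the patch doesn't define a cookie field for tc yet):

static int xtc_link_check_cookie(const union bpf_attr *attr)
{
	/* Placeholder UAPI field: whichever member would carry a cookie
	 * for tc links. Until cookies are actually plumbed through to
	 * bpf_prog_array_item, reject a non-zero value up front.
	 */
	if (attr->link_create.tc.bpf_cookie)
		return -EINVAL;
	return 0;
}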
> include/linux/bpf.h | 5 +-
> include/net/xtc.h | 14 ++++
> include/uapi/linux/bpf.h | 5 ++
> kernel/bpf/net.c | 116 ++++++++++++++++++++++++++++++---
> kernel/bpf/syscall.c | 3 +
> tools/include/uapi/linux/bpf.h | 5 ++
> 6 files changed, 139 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 71e5f43db378..226a74f65704 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
> union {
> struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
> u64 bpf_cookie;
> - u32 bpf_priority;
> + struct {
> + u32 bpf_priority;
> + u32 bpf_id;
This is link_id, is that right? Should we name it as such?
> + };
> };
> };
>
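For the union point above, here is roughly what I have in mind (sketch
only, not the patch; I also applied the link_id rename, and keeping
cgroup_storage unioned with the tc fields is just one possible layout
to avoid growing the struct):

struct bpf_prog_array_item {
	struct bpf_prog *prog;
	union {
		struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
		struct {
			u32 bpf_priority;
			u32 bpf_link_id;	/* was bpf_id */
		};
	};
	u64 bpf_cookie;	/* usable regardless of attach type */
};
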
[...]
> diff --git a/kernel/bpf/net.c b/kernel/bpf/net.c
> index ab9a9dee615b..22b7a9b05483 100644
> --- a/kernel/bpf/net.c
> +++ b/kernel/bpf/net.c
> @@ -8,7 +8,7 @@
> #include <net/xtc.h>
>
> static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
> - struct bpf_prog *nprog, u32 prio, u32 flags)
> + u32 id, struct bpf_prog *nprog, u32 prio, u32 flags)
Similarly here, id -> link_id or something like that; it's quite confusing
what kind of ID it is otherwise.
> {
> struct bpf_prog_array_item *item, *tmp;
> struct xtc_entry *entry, *peer;
> @@ -27,10 +27,13 @@ static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
> if (!oprog)
> break;
> if (item->bpf_priority == prio) {
> - if (flags & BPF_F_REPLACE) {
> + if (item->bpf_id == id &&
> + (flags & BPF_F_REPLACE)) {
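
And just to spell out the ownership rule this check encodes, an
illustrative helper (xtc_may_replace is hypothetical and uses the
link_id naming from above; it's not the literal patch code):

/* link_id == 0 means the entry was attached without a BPF link; a
 * non-zero link_id identifies the owning link. Only the same owner
 * may replace the program sitting at a given priority.
 */
static bool xtc_may_replace(const struct bpf_prog_array_item *item,
			    u32 link_id, u32 flags)
{
	if (!(flags & BPF_F_REPLACE))
		return false;
	return item->bpf_link_id == link_id;
}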
[...]