netdev - Re: [PATCH bpf-next 02/10] bpf: Implement BPF link handling for tc BPF programs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5451abc2-3364-80bd-f7ae-9cff2052bad9@iogearbox.net>
Date:   Thu, 6 Oct 2022 22:54:05 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     bpf@...r.kernel.org, razor@...ckwall.org, ast@...nel.org,
        andrii@...nel.org, martin.lau@...ux.dev, john.fastabend@...il.com,
        joannelkoong@...il.com, memxor@...il.com, toke@...hat.com,
        joe@...ium.io, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 02/10] bpf: Implement BPF link handling for tc
 BPF programs

On 10/6/22 5:19 AM, Andrii Nakryiko wrote:
> On Tue, Oct 4, 2022 at 4:12 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>>
>> This work adds BPF links for tc. As a recap, a BPF link represents the attachment
>> of a BPF program to a BPF hook point. The BPF link holds a single reference to
>> keep BPF program alive. Moreover, hook points do not reference a BPF link, only
>> the application's fd or pinning does. A BPF link holds meta-data specific to
>> attachment and implements operations for link creation, (atomic) BPF program
>> update, detachment and introspection.
>>
>> The motivation for BPF links for tc BPF programs is multi-fold, for example:
>>
>> - "It's especially important for applications that are deployed fleet-wide
>>     and that don't "control" hosts they are deployed to. If such application
>>     crashes and no one notices and does anything about that, BPF program will
>>     keep running draining resources or even just, say, dropping packets. We
>>     at FB had outages due to such permanent BPF attachment semantics. With
>>     fd-based BPF link we are getting a framework, which allows safe, auto-
>>     detachable behavior by default, unless application explicitly opts in by
>>     pinning the BPF link." [0]
>>
>> -  From Cilium-side the tc BPF programs we attach to host-facing veth devices
>>     and phys devices build the core datapath for Kubernetes Pods, and they
>>     implement forwarding, load-balancing, policy, EDT-management, etc, within
>>     BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
>>     experienced hard-to-debug issues in a user's staging environment where
>>     another Kubernetes application using tc BPF attached to the same prio/handle
>>     of cls_bpf, wiping all Cilium-based BPF programs from underneath it. The
>>     goal is to establish a clear/safe ownership model via links which cannot
>>     accidentally be overridden. [1]
>>
>> BPF links for tc can co-exist with non-link attachments, and the semantics are
>> in line also with XDP links: BPF links cannot replace other BPF links, BPF
>> links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
>> lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
>> would solve mentioned issue of safe ownership model as 3rd party applications
>> would not be able to accidentally wipe Cilium programs, even if they are not
>> BPF link aware.
>>
>> Earlier attempts [2] have tried to integrate BPF links into core tc machinery
>> to solve cls_bpf, which has been intrusive to the generic tc kernel API with
>> extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
>> be wiped from the qdisc also. Locking a tc BPF program in place this way, is
>> getting into layering hacks given the two object models are vastly different.
>> We chose to implement a prerequisite of the fd-based tc BPF attach API, so
>> that the BPF link implementation fits in naturally similar to other link types
>> which are fd-based and without the need for changing core tc internal APIs.
>>
>> BPF programs for tc can then be successively migrated from cls_bpf to the new
>> tc BPF link without needing to change the program's source code, just the BPF
>> loader mechanics for attaching.
>>
>>    [0] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com/
>>    [1] https://lpc.events/event/16/contributions/1353/
>>    [2] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com/
>>
>> Co-developed-by: Nikolay Aleksandrov <razor@...ckwall.org>
>> Signed-off-by: Nikolay Aleksandrov <razor@...ckwall.org>
>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
>> ---
> 
> have you considered supporting BPF cookie from the outset? It should
> be trivial if you remove union from bpf_prog_array_item. If not, then
> we should reject LINK_CREATE if bpf_cookie is non-zero.

Haven't considered it yet at this point, but we can add this in subsequent step,
agree, thus we should reject for now upon create.

>>   include/linux/bpf.h            |   5 +-
>>   include/net/xtc.h              |  14 ++++
>>   include/uapi/linux/bpf.h       |   5 ++
>>   kernel/bpf/net.c               | 116 ++++++++++++++++++++++++++++++---
>>   kernel/bpf/syscall.c           |   3 +
>>   tools/include/uapi/linux/bpf.h |   5 ++
>>   6 files changed, 139 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 71e5f43db378..226a74f65704 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
>>          union {
>>                  struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
>>                  u64 bpf_cookie;
>> -               u32 bpf_priority;
>> +               struct {
>> +                       u32 bpf_priority;
>> +                       u32 bpf_id;
> 
> this is link_id, is that right? should we name it as such?

Ack, will rename, thanks also for all your other suggestions inthe various patches,
all make sense to me & will address them!

>> +               };
>>          };
>>   };
>>
> 
> [...]