netdev - Re: [PATCH bpf-next v2 0/7] Add bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2f98188b-813b-e226-4962-5c2848998af2@iogearbox.net>
Date:   Fri, 10 Jun 2022 22:35:49 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Toke Høiland-Jørgensen <toke@...hat.com>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Joanne Koong <joannelkoong@...il.com>
Cc:     bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Vlad Buslov <vladbu@...dia.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API

On 6/10/22 10:16 PM, Toke Høiland-Jørgensen wrote:
> Kumar Kartikeya Dwivedi <memxor@...il.com> writes:
> 
>> On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
>>> On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@...il.com> wrote:
>>>>
>>>> On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
>>>> <memxor@...il.com> wrote:
>>>>>
>>>>> On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
>>>>>> On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
>>>>>> <memxor@...il.com> wrote:
>>>>>>>
>>>>>>> This is the second (non-RFC) version.
>>>>>>>
>>>>>>> This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and
>>>>>>> introduces fd based ownership for such TC filters. Netlink cannot delete or
>>>>>>> replace such filters, but the bpf_link is severed on indirect destruction of the
>>>>>>> filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
>>>>>>> that filters remain attached beyond process lifetime, the usual bpf_link fd
>>>>>>> pinning approach can be used.
>>>>>>>
>>>>>>> The individual patches contain more details and comments, but the overall kernel
>>>>>>> API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
>>>>>>> merged recently. This means that we start by always setting direct action mode,
>>>>>>> protocol to ETH_P_ALL, chain_index as 0, etc. If there is a need for more
>>>>>>> options in the future, they can be easily exposed through the bpf_link API in
>>>>>>> the future.
>>>>>>>
>>>>>>> Patch 1 refactors cls_bpf change function to extract two helpers that will be
>>>>>>> reused in bpf_link creation.
>>>>>>>
>>>>>>> Patch 2 exports some bpf_link management functions to modules. This is needed
>>>>>>> because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
>>>>>>> tcf_proto would be weird, because the update path has to replace offloaded bpf
>>>>>>> prog, which happens using internal cls_bpf helpers, and would in general be more
>>>>>>> code to abstract over an operation that is unlikely to be implemented for other
>>>>>>> filter types.
>>>>>>>
>>>>>>> Patch 3 adds the main bpf_link API. A function in cls_api takes care of
>>>>>>> obtaining block reference, creating the filter object, and then calls the
>>>>>>> bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after
>>>>>>> setting up the internal structures. An optimization is made to not keep around
>>>>>>> resources for extended actions, which is explained in a code comment as it wasn't
>>>>>>> immediately obvious.
>>>>>>>
>>>>>>> Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
>>>>>>> replacing the bpf_prog, we can skip tc filter's change path by reusing the
>>>>>>> filter object but swapping its bpf_prog. This takes care of replacing the
>>>>>>> offloaded prog as well (if that fails, update is aborted). So far however,
>>>>>>> tcf_classify could do normal load (possibly torn) as the cls_bpf_prog->filter
>>>>>>> would never be modified concurrently. This is no longer true, and to not
>>>>>>> penalize the classify hot path, we also cannot impose serialization around
>>>>>>> its load. Hence the load is changed to READ_ONCE, so that the pointer value is
>>>>>>> always consistent. Due to invocation in a RCU critical section, the lifetime of
>>>>>>> the prog is guaranteed for the duration of the call.
>>>>>>>
>>>>>>> Patch 5, 6 take care of updating the userspace bits and add a bpf_link returning
>>>>>>> function to libbpf.
>>>>>>>
>>>>>>> Patch 7 adds a selftest that exercises all possible problematic interactions
>>>>>>> that I could think of.
>>>>>>>
>>>>>>> Design:
>>>>>>>
>>>>>>> This is where in the object hierarchy our bpf_link object is attached.
>>>>>>>
>>>>>>>                                                                              ┌─────┐
>>>>>>>                                                                              │     │
>>>>>>>                                                                              │ BPF │
>>>>>>>                                                                              program
>>>>>>>                                                                              │     │
>>>>>>>                                                                              └──▲──┘
>>>>>>>                                                        ┌───────┐                │
>>>>>>>                                                        │       │         ┌──────┴───────┐
>>>>>>>                                                        │  mod  ├─────────► cls_bpf_prog │
>>>>>>> ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
>>>>>>> │    tcf_block   │                                    │       │              │   │
>>>>>>> └────────┬───────┘                                    └───▲───┘              │   │
>>>>>>>           │          ┌─────────────┐                       │                ┌─▼───┴──┐
>>>>>>>           └──────────►  tcf_chain  │                       │                │bpf_link│
>>>>>>>                      └───────┬─────┘                       │                └────────┘
>>>>>>>                              │          ┌─────────────┐    │
>>>>>>>                              └──────────►  tcf_proto  ├────┘
>>>>>>>                                         └─────────────┘
>>>>>>>
>>>>>>> The bpf_link is detached on destruction of the cls_bpf_prog.  Doing it this way
>>>>>>> allows us to implement update in a lightweight manner without having to recreate
>>>>>>> a new filter, where we can just replace the BPF prog attached to cls_bpf_prog.
>>>>>>>
>>>>>>> The other way to do it would be to link the bpf_link to tcf_proto, there are
>>>>>>> numerous downsides to this:
>>>>>>>
>>>>>>> 1. All filters have to embed the pointer even though they won't be using it when
>>>>>>> cls_bpf is compiled in.
>>>>>>> 2. This probably won't make sense to be extended to other filter types anyway.
>>>>>>> 3. We aren't able to optimize the update case without adding another bpf_link
>>>>>>> specific update operation to tcf_proto ops.
>>>>>>>
>>>>>>> The downside with tying this to the module is having to export bpf_link
>>>>>>> management functions and introducing a tcf_proto op. Hopefully the cost of
>>>>>>> another operation func pointer is not big enough (as there is only one ops
>>>>>>> struct per module).
>>>>>>>
>>>>>> Hi Kumar,
>>>>>>
>>>>>> Do you have any plans / bandwidth to land this feature upstream? If
>>>>>> so, do you have a tentative estimation for when you'll be able to work
>>>>>> on this? And if not, are you okay with someone else working on this to
>>>>>> get it merged in?
>>>>>>
>>>>>
>>>>> I can have a look at resurrecting it later this month, if you're ok with waiting
>>>>> until then, otherwise if someone else wants to pick this up before that it's
>>>>> fine by me, just let me know so we avoid duplicated effort. Note that the
>>>>> approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
>>>>> have to implement the way Daniel mentioned in [0].
>>>>
>>>> Sounds great! We'll wait and check back in with you later this month.
>>>>
>>> After reading the linked thread (which I should have done before
>>> submitting my previous reply :)),  if I'm understanding it correctly,
>>> it seems then that the work needed for tc bpf_link will be in a new
>>> direction that's not based on the code in this v2 patchset. I'm
>>> interested in learning more about bpf link and tc - I can pick this up
>>> to work on. But if this was something you wanted to work on though,
>>> please don't hesitate to let me know; I can find some other bpf link
>>> thing to work on instead if that's the case.
>>
>> Feel free to take it. And yes, it's going to be much simpler than this. I think
>> you can just add two bpf_prog pointers in struct net_device, use rtnl_lock to
>> protect the updates, and invoke using bpf_prog_run in sch_handle_ingress and
>> sch_handle_egress.
> 
> Except we'd want to also support multiple programs on different
> priorities? I don't think requiring a libxdp-like dispatcher to achieve
> this is a good idea if we can just have it be part of the API from the
> get-go...

Yes, it will be multi-prog to avoid a situation where dispatcher is needed.