netdev - Re: [PATCH RFC bpf-next 0/7] Add bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <90f403df-520a-a3c0-0272-8118f9628498@mojatatu.com>
Date:   Mon, 21 Jun 2021 09:55:31 -0400
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>
Cc:     Cong Wang <xiyou.wangcong@...il.com>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>, Vlad Buslov <vladbu@...dia.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, Joe Stringer <joe@...ium.io>,
        Quentin Monnet <quentin@...valent.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Marcelo Ricardo Leitner <mleitner@...hat.com>
Subject: Re: [PATCH RFC bpf-next 0/7] Add bpf_link based TC-BPF API

On 2021-06-18 6:42 p.m., Daniel Borkmann wrote:
> On 6/18/21 1:40 PM, Jamal Hadi Salim wrote:

[..]
>  From a user interface PoV it's odd since you need to go and parse that 
> anyway, at
> least the programs typically start out with a switch/case on either 
> reading the
> skb->protocol or getting it via eth->h_proto. But then once you extend 
> that same
> program to also cover IPv6, you don't need to do anything with the 
> ETH_P_ALL
> from the loader application, but now you'd also need to additionally 
> remember to
> downgrade ETH_P_IP to ETH_P_ALL and rebuild the loader to get v6 
> traffic. But even
> if you were to split things in the main/entry program to separate v4/v6 
> processing
> into two different ones, I expect this to be faster via tail calls 
> (given direct
> absolute jump) instead of walking a list of tcf_proto objects, comparing 
> the
> tp->protocol and going into a different cls_bpf instance.
> 

Good point on being more future proof with ETH_P_ALL.
Note: In our case we were only interested in ipv4 and i dont see that
changing for the specific prog we have. From a compute perspective all
i am saving by not using ETH_P_ALL is one if statement (checking if
proto is ipv4). If you feel strongly about it we can change our code.
My worry now is if we used this approach then likely someone else in the 
wild used something similar.

I think it boils down again to: if it doesnt confuse the API or add
extra complexity why not allow it and default to ETH_P_ALL?

On your comment that a bpf based proto comparison being faster - the
issue is that the tp proto always happens regardless and ebpf, depending
on your program, may not fit all your code. Example i may actually
decide to have a program for v6 and v4 separately if i wanted
to with current mechanism - at different tc ruleset prios just
so as to work around code/complexity issues.

BTW: tail call limit of 32 provides an upper bound which affects
depth of (generic) parsing.
Does it make sense to allow (maybe on a per-boot) increasing the size?
The fact things run on the stack may be restricting.


> It may be more tricky but not impossible either, in recent years some 
> (imho) very
> interesting and exciting use cases have been implemented and talked 
> about e.g. [0-2],
> and with the recent linker work there could also be a [e.g. in-kernel] 
> collection with
> library code that can be pulled in by others aside from using them as 
> BPF selftests
> as one option. The gain you have with the flexibility [as you know] is 
> that it allows
> easy integration/orchestration into user space applications and thus 
> suitable for
> more dynamic envs as with old-style actions. The issue I have with the 
> latter is
> that they're not scalable enough from a SW datapath / tc fast-path 
> perspective given
> you then need to fallback to old-style list processing of cls+act 
> combinations which
> is also not covered / in scope for the libbpf API in terms of their 
> setup, and
> additionally not all of the BPF features can be used this way either, so 
> it'll be very
> hard for users to debug why their BPF programs don't work as they're 
> expected to.
> 
> But also aside from those blockers, the case with this clean slate tc 
> BPF API is that
> we have a unique chance to overcome the cmdline usability struggles, and 
> make it as
> straight forward as possible for new generation of users.
> 
>    [0] https://linuxplumbersconf.org/event/7/contributions/677/
>    [1] https://linuxplumbersconf.org/event/2/contributions/121/
>    [2] 
> https://netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF 

I took a quick glance at the refs.

IIUC, your message is "do more with less" i.e restrict choices now
so we can focus on optimizing for speed. Here's my experience.
We have two pragmatic challenges:

1) In a deployment, like some enterprise class data centers, we are
often limited by the kernel and often even the distro you are on. You
cant just upgrade to the latest and greatest without risking voiding
the distro vendors support contract. Big shops with a lot of geniuses
like FB and Google dont have these problems of course - but the majority
out there do.

So even our little program must use supported interfaces (ex: You cant
expect support on RH8.3 for an XDP issue without using the supplied XDP 
lib) to be accepted.

So building in support to use existing infra is useful

2) challenges with ebpf code space and code complexity: Depending
on the complexity, a program with less than 4K instructions may be
rejected by the verifier. IOW, I just cant add all the features
i need _even if i wanted to_.

For this reason working cooperatively with other existing kernel
and user infra makes sense (Ref [2] is doing that for example).
You dont want to rewrite the kernel using ebpf. Extending the kernel
with ebpf makes sense. And of course I dont want to loose performance
but there may be a trade-off sometimes where a little loss in 
performance is justified for gain of a feature makes sense
(the non-da example applies).

Perhaps adding more helpers to interface to the actions and classifiers
is one way forward.

cheers,
jamal

PS: I didnt understand the kernel linker point with BPF selftests.
Pointer?