Date: Thu, 28 Sep 2023 14:01:11 +0200
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Toke Høiland-Jørgensen <toke@...nel.org>
Cc: Daniel Borkmann <daniel@...earbox.net>, bpf@...r.kernel.org, netdev@...r.kernel.org, 
	martin.lau@...nel.org, razor@...ckwall.org, ast@...nel.org, andrii@...nel.org, 
	john.fastabend@...il.com
Subject: Re: [PATCH bpf-next 1/8] meta, bpf: Add bpf programmable meta device

On Thu, Sep 28, 2023 at 11:17 AM Toke Høiland-Jørgensen <toke@...nel.org> wrote:
>
> Daniel Borkmann <daniel@...earbox.net> writes:
>
> > This work adds a new, minimal BPF-programmable device called "meta",
> > which we recently presented at LSF/MM/BPF. The name derives from the
> > Greek μετά, which carries a wide array of meanings such as "on top of"
> > and "beyond". Given that the business logic is defined by BPF, this
> > device can take on many meanings. The core idea is that BPF programs
> > are executed within the driver's xmit routine, which, e.g. in the case
> > of containers/Pods, moves BPF processing closer to the source.
>
> I like the concept, but I think we should change the name (as I believe
> I also mentioned back when you presented it at LSF/MM/BPF). I know this
> is basically bikeshedding, but I nevertheless think it is important, for
> a couple of reasons:
>
> - As you say, meta has a specific meaning, and this device is not a
>   "meta" device in the common sense of the word: it is not tied to other
>   devices (so it's not 'on top of' anything), and it is not "about"
>   anything (as in metadata). It is just a device type that is programmed
>   by BPF, so let's call it that.
>
> - It's not discoverable; how are people supposed to figure out that they
>   should go look for a 'meta' device? We also already have multiple
>   things called 'metadata', so this is just going to create even more
>   confusion (as we also discussed in relation to 'xdp hints').
>
> - It squats on a pretty widely used term throughout the kernel
>   (CONFIG_META, 'meta' as the module name). This is related to the above
>   point; seeing something named 'meta' in lsmod, the natural assumption
>   wouldn't be that it's a network driver.
>
> I think we should just name the driver 'bpfnet'; it's not pretty, but
> it's obvious and descriptive. Optionally we could teach 'ip' to
> understand just 'bpf' as the device type, so you could go 'ip link add
> type bpf' and get one of these.

+1
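
To make the mechanism itself concrete: my rough mental model of the
xmit hook described in the quoted text above is something along these
lines. This is not the code from the patch; the names (meta_xmit,
struct meta_priv) and the verdict handling are purely illustrative:

#include <linux/filter.h>
#include <linux/netdevice.h>
#include <linux/pkt_cls.h>
#include <linux/skbuff.h>

/* Hypothetical, heavily simplified private data: just an optionally
 * attached BPF program.
 */
struct meta_priv {
        struct bpf_prog __rcu *prog;
};

static netdev_tx_t meta_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct meta_priv *priv = netdev_priv(dev);
        struct bpf_prog *prog = rcu_dereference_bh(priv->prog);
        int verdict = TC_ACT_OK;

        if (prog)
                verdict = bpf_prog_run(prog, skb); /* runs in the sender's context */

        if (verdict == TC_ACT_OK)
                netif_receive_skb(skb); /* the real driver retargets the skb to its peer first */
        else
                kfree_skb(skb);         /* drop decided by BPF, before any backlog queuing */

        return NETDEV_TX_OK;
}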

> > One of the goals was that, in the case of Pod egress traffic, this
> > allows moving BPF programs from hostns tcx ingress into the device
> > itself, providing earlier drop or forward mechanisms. For example, if
> > the BPF program determines that the skb must be sent out of the node,
> > a redirect to the physical device can take place directly, without
> > going through the per-CPU backlog queue. This helps shift processing
> > for such traffic from softirq to process context, leading to better
> > scheduling decisions and better performance.
>
> So my only reservation about having this tied to a BPF-only device like
> this is basically: if it is indeed such a big win, shouldn't we try to
> make the stack operate in this mode by default? I assume you did the
> analysis of what it would take to change veth to operate in this mode;
> so what was the reason for creating a new device type instead?
>
> (I seem to recall that at the presentation you made a general reference
> to veth being 'too complex', but complexity can be managed, so I'm
> mostly wondering whether there's any specific reason why changing veth
> wouldn't work at all?)

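On the egress example quoted above: the BPF program attached to such a
device could express the "send it straight out of the node" decision as
a plain redirect verdict. A minimal sketch (the program name, section
name and ifindex are placeholders, not taken from the patch set):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Placeholder ifindex of the node's physical uplink device. */
#define UPLINK_IFINDEX 2

SEC("tc")
int pod_egress(struct __sk_buff *skb)
{
        /* Illustrative policy: this packet leaves the node, so bounce it
         * directly to the physical device instead of letting it pass
         * through the host stack and the per-CPU backlog.
         */
        return bpf_redirect(UPLINK_IFINDEX, 0);
}

char LICENSE[] SEC("license") = "GPL";
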
If one of the concerns is the queuing of packets on the softnet queue,
I think it should be fine to call netif_receive_skb instead of
netif_rx, at least when the device stacking is only a single level deep.
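
For the archives, the distinction here: netif_rx() queues the skb on
the per-CPU backlog and defers processing to the NET_RX softirq,
whereas netif_receive_skb() processes it synchronously in the calling
context. Purely illustrative, not existing kernel code:

#include <linux/bottom_half.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Illustration of the two delivery paths contrasted above. */
static void deliver_to_stack(struct sk_buff *skb, bool inline_rx)
{
        if (inline_rx) {
                /* Processed immediately in this context; from process
                 * context this is typically wrapped in
                 * local_bh_disable()/local_bh_enable().
                 */
                local_bh_disable();
                netif_receive_skb(skb);
                local_bh_enable();
        } else {
                /* Queued on the per-CPU backlog (softnet_data) and
                 * processed later from the NET_RX softirq.
                 */
                netif_rx(skb);
        }
}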
