netdev - Re: [RFC bpf-next 0/7] bpf: netdev TX metadata

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKH8qBvAMKtfrZ1jdwVS2pF161UdeXPSpY4HSzKYGTYNTupmTg@mail.gmail.com>
Date: Tue, 13 Jun 2023 16:16:08 -0700
From: Stanislav Fomichev <sdf@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Toke Høiland-Jørgensen <toke@...nel.org>, 
	bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Willem de Bruijn <willemb@...gle.com>, David Ahern <dsahern@...nel.org>, 
	"Karlsson, Magnus" <magnus.karlsson@...el.com>, Björn Töpel <bjorn@...nel.org>, 
	"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>, Network Development <netdev@...r.kernel.org>
Subject: Re: [RFC bpf-next 0/7] bpf: netdev TX metadata

On Tue, Jun 13, 2023 at 3:32 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> >
> > > >> >> > --- UAPI ---
> > > >> >> >
> > > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
> > > >> >> > expose any UAPI and are implemented as tracing programs that call
> > > >> >> > a bunch of kfuncs. The attach/detach operation happen via BPF syscall
> > > >> >> > programs. The series expands device-bound infrastructure to tracing
> > > >> >> > programs.
> > > >> >>
> > > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part
> > > >> >> of the XDP data path API, and I think we should expose them as proper
> > > >> >> bpf_link attachments from userspace with introspection etc. But I guess
> > > >> >> the bpf_mprog thing will give us that?
> > > >> >
> > > >> > bpf_mprog will just make those attach kfuncs return the link fd. The
> > > >> > syscall program will still stay :-(
> > > >>
> > > >> Why does the attachment have to be done this way, exactly? Couldn't we
> > > >> just use the regular bpf_link attachment from userspace? AFAICT it's not
> > > >> really piggy-backing on the function override thing anyway when the
> > > >> attachment is per-dev? Or am I misunderstanding how all this works?
> > > >
> > > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
> > > > us an opportunity to fix things.
> > > > We can do it via a regular syscall path if there is a consensus.
> > >
> > > Yeah, the API exposed to the BPF program is kfunc-based in any case. If
> > > we were to at some point conclude that this whole thing was not useful
> > > at all and deprecate it, it doesn't seem to me that it makes much
> > > difference whether that means "you can no longer create a link
> > > attachment of this type via BPF_LINK_CREATE" or "you can no longer
> > > create a link attachment of this type via BPF_PROG_RUN of a syscall type
> > > program" doesn't really seem like a significant detail to me...
> >
> > In this case, why do you prefer it to go via regular syscall? Seems
> > like we can avoid a bunch of boileplate syscall work with a kfunc that
> > does the attachment?
> > We might as well abstract it at, say, libbpf layer which would
> > generate/load this small bpf program to call a kfunc.
>
> I'm not sure we're on the same page here.
> imo using syscall bpf prog that calls kfunc to do a per-device attach
> is an overkill here.
> It's an experimental feature, but you're already worried about
> multiple netdevs?
>
> Can you add an empty nop function and attach to it tracing style
> with fentry ?
> It won't be per-netdev, but do you have to do per-device demux
> by the kernel? Can your tracing bpf prog do that instead?
> It's just an ifindex compare.
> This way than non-uapi bits will be even smaller and no need
> to change struct netdevice.

It's probably going to work if each driver has a separate set of tx
fentry points, something like:
  {veth,mlx5,etc}_devtx_submit()
  {veth,mlx5,etc}_devtx_complete()
Because I still need to have those netdev-bound tracing programs to
get access to driver kfuncs.

I can try to sketch something together for a v2.