netdev - Re: [RFC bpf-next 0/7] bpf: netdev TX metadata

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAADnVQ+CCOw9_LbCAaFz0593eydKNb7RxnGr6_FatUOKmvPmBg@mail.gmail.com>
Date: Tue, 13 Jun 2023 21:19:45 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Toke Høiland-Jørgensen <toke@...nel.org>, 
	bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Willem de Bruijn <willemb@...gle.com>, David Ahern <dsahern@...nel.org>, 
	"Karlsson, Magnus" <magnus.karlsson@...el.com>, Björn Töpel <bjorn@...nel.org>, 
	"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>, Network Development <netdev@...r.kernel.org>
Subject: Re: [RFC bpf-next 0/7] bpf: netdev TX metadata

On Tue, Jun 13, 2023 at 4:16 PM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> On Tue, Jun 13, 2023 at 3:32 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > >
> > > > >> >> > --- UAPI ---
> > > > >> >> >
> > > > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
> > > > >> >> > expose any UAPI and are implemented as tracing programs that call
> > > > >> >> > a bunch of kfuncs. The attach/detach operation happen via BPF syscall
> > > > >> >> > programs. The series expands device-bound infrastructure to tracing
> > > > >> >> > programs.
> > > > >> >>
> > > > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part
> > > > >> >> of the XDP data path API, and I think we should expose them as proper
> > > > >> >> bpf_link attachments from userspace with introspection etc. But I guess
> > > > >> >> the bpf_mprog thing will give us that?
> > > > >> >
> > > > >> > bpf_mprog will just make those attach kfuncs return the link fd. The
> > > > >> > syscall program will still stay :-(
> > > > >>
> > > > >> Why does the attachment have to be done this way, exactly? Couldn't we
> > > > >> just use the regular bpf_link attachment from userspace? AFAICT it's not
> > > > >> really piggy-backing on the function override thing anyway when the
> > > > >> attachment is per-dev? Or am I misunderstanding how all this works?
> > > > >
> > > > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
> > > > > us an opportunity to fix things.
> > > > > We can do it via a regular syscall path if there is a consensus.
> > > >
> > > > Yeah, the API exposed to the BPF program is kfunc-based in any case. If
> > > > we were to at some point conclude that this whole thing was not useful
> > > > at all and deprecate it, it doesn't seem to me that it makes much
> > > > difference whether that means "you can no longer create a link
> > > > attachment of this type via BPF_LINK_CREATE" or "you can no longer
> > > > create a link attachment of this type via BPF_PROG_RUN of a syscall type
> > > > program" doesn't really seem like a significant detail to me...
> > >
> > > In this case, why do you prefer it to go via regular syscall? Seems
> > > like we can avoid a bunch of boileplate syscall work with a kfunc that
> > > does the attachment?
> > > We might as well abstract it at, say, libbpf layer which would
> > > generate/load this small bpf program to call a kfunc.
> >
> > I'm not sure we're on the same page here.
> > imo using syscall bpf prog that calls kfunc to do a per-device attach
> > is an overkill here.
> > It's an experimental feature, but you're already worried about
> > multiple netdevs?
> >
> > Can you add an empty nop function and attach to it tracing style
> > with fentry ?
> > It won't be per-netdev, but do you have to do per-device demux
> > by the kernel? Can your tracing bpf prog do that instead?
> > It's just an ifindex compare.
> > This way than non-uapi bits will be even smaller and no need
> > to change struct netdevice.
>
> It's probably going to work if each driver has a separate set of tx
> fentry points, something like:
>   {veth,mlx5,etc}_devtx_submit()
>   {veth,mlx5,etc}_devtx_complete()

Right. And per-driver descriptors.
The 'generic' xdptx metadata is unlikely to be practical.
Marshaling in and out of it is going to be too perf sensitive.
I'd just add an attach point in the driver with enough
args for bpf progs to make sense of the context and extend
the verifier to make few safe fields writeable.
kfuncs to read/request timestamp are probably too slow.