Message-ID: <CAJ8uoz2Bx3cd7braAZjZFNYfqX0JjJzSvr4RBN=j8CiH8Ld5-w@mail.gmail.com>
Date: Fri, 16 Jun 2023 10:12:49 +0200
From: Magnus Karlsson <magnus.karlsson@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Toke Høiland-Jørgensen <toke@...nel.org>, 
	bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, 
	martin.lau@...ux.dev, song@...nel.org, yhs@...com, john.fastabend@...il.com, 
	kpsingh@...nel.org, haoluo@...gle.com, jolsa@...nel.org, willemb@...gle.com, 
	dsahern@...nel.org, magnus.karlsson@...el.com, bjorn@...nel.org, 
	maciej.fijalkowski@...el.com, netdev@...r.kernel.org
Subject: Re: [RFC bpf-next 0/7] bpf: netdev TX metadata

On Fri, 16 Jun 2023 at 02:09, Stanislav Fomichev <sdf@...gle.com> wrote:
>
> On Mon, Jun 12, 2023 at 2:01 PM Toke Høiland-Jørgensen <toke@...nel.org> wrote:
> >
> > Some immediate thoughts after glancing through this:
> >
> > > --- Use cases ---
> > >
> > > The goal of this series is to add two new standard-ish places
> > > in the transmit path:
> > >
> > > 1. Right before the packet is transmitted (with access to TX
> > >    descriptors)
> > > 2. Right after the packet is actually transmitted and we've received the
> > >    completion (again, with access to TX completion descriptors)
> > >
> > > Accessing TX descriptors unlocks the following use-cases:
> > >
> > > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
> > > use device offloads. The existing case implements TX timestamp.
> > > - Observability: global per-netdev hooks can be used for tracing
> > > the packets and exploring completion descriptors for all sorts of
> > > device errors.
> > >
> > > Accessing TX descriptors also means that the hooks have to be called
> > > from the drivers.
> > >
> > > The hooks are a light-weight alternative to XDP at egress and currently
> > > don't provide any packet modification abilities. However, eventually, we
> > > can expose new kfuncs to operate on the packet (or, rather, the actual
> > > descriptors, for performance's sake).
> >
> > dynptr?
> >
> > > --- UAPI ---
> > >
> > > The hooks are implemented in a HID-BPF style. Meaning they don't
> > > expose any UAPI and are implemented as tracing programs that call
> > > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
> > > programs. The series expands device-bound infrastructure to tracing
> > > programs.
> >
> > Not a fan of the "attach from BPF syscall program" thing. These are part
> > of the XDP data path API, and I think we should expose them as proper
> > bpf_link attachments from userspace with introspection etc. But I guess
> > the bpf_mprog thing will give us that?
> >
> > > --- skb vs xdp ---
> > >
> > > The hooks operate on a new light-weight devtx_frame which contains:
> > > - data
> > > - len
> > > - sinfo
> > >
> > > This should allow us to have a unified (from BPF POV) place at TX
> > > and not be super-taxing (we need to copy 2 pointers + len to the stack
> > > for each invocation).
> >
> > Not sure what I think about this one. At the very least I think we
> > should expose xdp->data_meta as well. I'm not sure what the use case for
> > accessing skbs is? If that *is* indeed useful, probably there will also
> > end up being a use case for accessing the full skb?
>
> I spent some time looking at data_meta story on AF_XDP TX and it
> doesn't look like it's supported (at least in a general way).
> You obviously get some data_meta when you do XDP_TX, but if you want
> to pass something to the bpf prog when doing TX via the AF_XDP ring,
> it gets complicated.

When we designed this some 5-6 years ago, we thought that there
would be an XDP for egress action in the "nearish" future that could
be used to interpret the metadata field in front of the packet.
Basically, the user would load an XDP egress program that would define
the metadata layout by the operations it would perform on the metadata
area. But since XDP on egress has not happened, you are right, there
is definitely something missing to be able to use metadata on Tx. Or
could your proposed hook points be used for something like this?

> In zerocopy mode, we can probably use XDP_UMEM_UNALIGNED_CHUNK_FLAG
> and pass something in the headroom.

This feature is mainly used to allow for multiple packets on the same
chunk (to save space) and also to be able to have packets spanning two
chunks. Even in aligned mode, you can start a packet at an arbitrary
address in the chunk as long as the whole packet fits into the chunk.
So no problem having headroom in any of the modes.
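
As a rough illustration of the aligned-mode rule above, here is a
minimal sketch in C. It is not the kernel's actual descriptor
validation code: fits_in_chunk(), place_packet() and struct tx_meta
are hypothetical names made up for this example, and the sketch assumes
a power-of-two chunk size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TX metadata layout; not a kernel or UAPI structure. */
struct tx_meta {
	uint64_t request_tx_timestamp;
};

/*
 * Returns true if the byte range [addr, addr + len) stays inside the
 * chunk that addr falls into. Assumes chunk_size is a power of two.
 */
static bool fits_in_chunk(uint64_t addr, uint32_t len, uint32_t chunk_size)
{
	assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);

	uint64_t offset = addr & (uint64_t)(chunk_size - 1);

	return offset + len <= chunk_size;
}

/*
 * Starts the packet 'headroom' bytes into the chunk, leaving room for
 * meta_len bytes of metadata directly in front of the packet data.
 * On success, stores the packet start address in *pkt_addr.
 */
static bool place_packet(uint64_t chunk_base, uint32_t headroom,
			 uint32_t meta_len, uint32_t pkt_len,
			 uint32_t chunk_size, uint64_t *pkt_addr)
{
	/* The chunk base must itself be chunk-aligned. */
	if (chunk_base & (uint64_t)(chunk_size - 1))
		return false;
	/* The metadata must fit in the headroom, inside the same chunk. */
	if (headroom >= chunk_size || meta_len > headroom)
		return false;
	/* The whole packet must fit between the start address and chunk end. */
	if (!fits_in_chunk(chunk_base + headroom, pkt_len, chunk_size))
		return false;

	*pkt_addr = chunk_base + headroom;
	return true;
}
```

With a 4096-byte chunk at address 8192 and 256 bytes of headroom, a
1500-byte packet would start at 8448, and the metadata would occupy the
bytes just below that address.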


> If copy-mode, there is no support to do skb_metadata_set.
>
> Probably makes sense to have something like tx_metalen on the xsk? And
> skb_metadata_set it in copy mode and skip it in zerocopy mode?
> Or maybe I'm missing something?
>
