Message-ID: <CAF=yD-Kf6wSc1JkgpNHEBVbyRiJ1pHqbw7SkkuHGAHatyS+eVg@mail.gmail.com>
Date: Wed, 12 Jul 2023 15:11:23 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Stanislav Fomichev <sdf@...gle.com>, bpf <bpf@...r.kernel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, 
	Yonghong Song <yhs@...com>, John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Jakub Kicinski <kuba@...nel.org>, 
	Toke Høiland-Jørgensen <toke@...nel.org>, 
	Willem de Bruijn <willemb@...gle.com>, David Ahern <dsahern@...nel.org>, 
	"Karlsson, Magnus" <magnus.karlsson@...el.com>, Björn Töpel <bjorn@...nel.org>, 
	"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	Network Development <netdev@...r.kernel.org>, xdp-hints@...-project.net
Subject: Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs

On Wed, Jul 12, 2023 at 3:03 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Wed, Jul 12, 2023 at 11:16:04AM -0400, Willem de Bruijn wrote:
> > On Wed, Jul 12, 2023 at 1:36 AM Stanislav Fomichev <sdf@...gle.com> wrote:
> > >
> > > On Tue, Jul 11, 2023 at 9:59 PM Alexei Starovoitov
> > > <alexei.starovoitov@...il.com> wrote:
> > > >
> > > > On Tue, Jul 11, 2023 at 8:29 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > >
> > > > >
> > > > > This will slow things down, but not to the point where it's on par
> > > > > with doing sw checksum. At least in theory.
> > > > > We can't stay at skb when using AF_XDP. AF_XDP would benefit from having
> > > > > the offloads.
> > > >
> > > > To clarify: yes, AF_XDP needs generalized HW offloads.
> > >
> > > Great! To reiterate, I'm mostly interested in af_xdp wrt tx
> > > timestamps. So if the consensus is not to mix xdp-tx and af_xdp-tx,
> > > I'm fine with switching to adding some fixed af_xdp descriptor format
> > > to enable offloads on tx.
>
> since af_xdp is a primary user let's figure out what is the best api for that.
> If any code can be salvaged for xdp tx, great, but let's not start with xdp tx
> as prerequisite.
>
> > >
> > > > I just don't see how xdp tx offloads are moving a needle in that direction.
> > >
> > > Let me try to explain how both might be similar, maybe I wasn't clear
> > > enough on that.
> > > For af_xdp tx packet, the userspace puts something in the af_xdp frame
> > > metadata area (headroom) which then gets executed/interpreted by the
> > > bpf program at devtx (which calls kfuncs to enable particular
> > > offloads).
> > > IOW, instead of defining some fixed layout for the tx offloads, the
> > > userspace and bpf program have some agreement on the layout (and bpf
> > > program "applies" the offloads by calling the kfuncs).
> > > Also (in theory) the same hooks can be used for xdp-tx.
> > > Does it make sense? But, again, happy to scratch that whole idea if
> > > we're fine with a fixed layout for af_xdp.
>
> So instead of defining csum offload format in xsk metadata we'll be
> defining it as a set of arguments to a kfunc and tx-side xsk prog
> will just copy the args from metadata into kfunc args?
> Seems like an unnecessary step. Such xsk prog won't be doing
> anything useful. Just copying from one place to another.
> It seems the only purpose of such bpf prog is to side step uapi exposure.
> bpf is not used to program anything. There won't be any control flow.
> Just odd intermediate copy step.
> Instead we can define a metadata struct for csum nic offload
> outside of uapi/linux/if_xdp.h with big 'this is not an uapi' warning.
> User space can request it via setsockopt.
> And probably feature query the nic via getsockopt.
>
> Error handling is critical here. With xsk tx prog the errors
> are messy. What to do when kfunc returns error? Store it back into
> packet metadata ? and then user space needs to check every single
> packet for errors? Not practical imo.
>
> Feature query via getsockopt would be done once instead and
> user space will fill in "csum offload struct" in packet metadata
> and won't check per-packet error. If the driver said the csum feature
> is available, it had better work for every packet.
> Notice mlx5e_txwqe_build_eseg_csum() returns void.
>
> >
> > Checksum offload is an important demonstrator too.
> >
> > It is admittedly a non-trivial one. Checksum offload has often been
> > discussed as a pain point ("protocol ossification").
> >
> > In general, drivers can accept every CHECKSUM_PARTIAL skb that
> > matches their advertised feature NETIF_F_[HW|IP|IPV6]_CSUM. I don't
> > see why this would be different for kfuncs for packets coming from
> > userspace.
> >
> > The problematic drivers are the ones that do not implement
> > CHECKSUM_PARTIAL as intended, but ignore this simple
> > protocol-independent hint in favor of parsing from scratch, possibly
> > zeroing the field, computing multiple layers, etc.
> >
> > All of which is unnecessary with LCO. An AF_XDP user can be expected
> > to apply LCO and only request checksum insertion for the innermost
> > checksum.
> >
> > The biggest problem with these devices that parse in hardware (and
> > possibly also in the driver to identify and fix up hardware
> > limitations) is that they will fail if they encounter an unknown
> > protocol. Which brings us to advertising limited typed support:
> > NETIF_F_HW_CSUM vs NETIF_F_IP_CSUM.
> >
> > The fact that some devices that deviate from industry best practices
> > cannot support more advanced packet formats is unfortunate, but not a
> > reason to hold others back. No different from the current kernel path.
> > The BPF program can fall back on software checksumming on these
> > devices, like the kernel path. Perhaps we do need to pass along with
> > csum_start and csum_off a csum_type that matches the existing
> > NETIF_F_[HW|IP|IPV6]_CSUM, to let drivers return -EOPNOTSUPP
> > quickly if they do not support the generic case.
> >
> > For implementation in essence it is just reordering driver code that
> > already exists for the skb case. I think the ice patch series to
> > support rx timestamping is a good indication of what it takes to
> > support XDP kfuncs: not so much new code, but reordering the driver
> > logic.
> >
> > Which also indicates to me that the driver *is* the right place to
> > implement this logic, rather than reimplement it in a BPF library. It
> > avoids both code duplication and dependency hell, if the library ships
> > independent from the driver.
>
> Agree with all of the above.
> I think defining a CHECKSUM_PARTIAL struct request for af_xdp is doable
> and won't require many changes in the drivers.
> If we do it for more than one driver from the start there is a chance it
> will work for other drivers too. imo ice+gve+mlx5 would be enough.

Basically, add to AF_XDP what we already have for its predecessor
AF_PACKET: setsockopt PACKET_VNET_HDR?

Possibly with a separate new struct, rather than virtio_net_hdr, as
that has dependencies on other drivers, notably virtio and its
specification process.
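
To make that concrete, here is a rough sketch of what such a separate
struct might look like. It borrows the csum_start / csum_offset pair from
virtio_net_hdr and adds the csum_type discussed above. Every name below
(struct xsk_tx_csum_req, the XSK_TX_CSUM_* values, both helpers) is
hypothetical and not an existing uapi or kernel definition; the software
path only illustrates CHECKSUM_PARTIAL semantics as a fallback.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags, loosely mirroring NETIF_F_[IP|IPV6|HW]_CSUM. */
enum xsk_tx_csum_type {
	XSK_TX_CSUM_IP,		/* TCP/UDP over IPv4 only */
	XSK_TX_CSUM_IPV6,	/* TCP/UDP over IPv6 only */
	XSK_TX_CSUM_HW,		/* protocol-independent start/offset */
	XSK_TX_CSUM_TYPE_MAX,
};

/* Hypothetical per-packet request placed in the AF_XDP tx metadata. */
struct xsk_tx_csum_req {
	uint16_t csum_start;	/* offset where checksumming begins */
	uint16_t csum_offset;	/* checksum field, relative to csum_start */
	uint8_t  csum_type;	/* enum xsk_tx_csum_type */
};

/* Reject requests the device cannot honor, before touching the packet. */
static int xsk_tx_csum_check(const struct xsk_tx_csum_req *req, size_t len,
			     uint32_t dev_types)
{
	size_t field = (size_t)req->csum_start + req->csum_offset;

	if (req->csum_type >= XSK_TX_CSUM_TYPE_MAX ||
	    !(dev_types & (1u << req->csum_type)))
		return -EOPNOTSUPP;
	if (field + 2 > len)
		return -EINVAL;
	return 0;
}

/* 16-bit one's complement sum over a buffer, returned complemented. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (; len > 1; p += 2, len -= 2)
		sum += ((uint32_t)p[0] << 8) | p[1];
	if (len)
		sum += (uint32_t)p[0] << 8;	/* odd trailing byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Software fallback. As with CHECKSUM_PARTIAL, the checksum field is
 * assumed to already hold the pseudo-header sum, so it is included in
 * the sum from csum_start to the end of the packet and then overwritten.
 * The caller must have validated the request with xsk_tx_csum_check().
 */
static void xsk_tx_csum_sw(uint8_t *frame, size_t len,
			   const struct xsk_tx_csum_req *req)
{
	size_t field = (size_t)req->csum_start + req->csum_offset;
	uint16_t c = csum16(frame + req->csum_start, len - req->csum_start);

	frame[field] = (uint8_t)(c >> 8);
	frame[field + 1] = (uint8_t)(c & 0xff);
}
```

With something along these lines, userspace would query the supported
types once, fill in the request per packet, and only take the software
path on devices that do not advertise the requested type.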
