netdev - Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs

Message-ID: <CAKH8qBtawUTjFQ=hhTzXa2zTBwOpxurjhduxZV+eUg8rnJUJVw@mail.gmail.com>
Date: Tue, 11 Jul 2023 16:24:54 -0700
From: Stanislav Fomichev <sdf@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net, 
	andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org, yhs@...com, 
	john.fastabend@...il.com, kpsingh@...nel.org, haoluo@...gle.com, 
	jolsa@...nel.org, kuba@...nel.org, toke@...nel.org, willemb@...gle.com, 
	dsahern@...nel.org, magnus.karlsson@...el.com, bjorn@...nel.org, 
	maciej.fijalkowski@...el.com, hawk@...nel.org, netdev@...r.kernel.org, 
	xdp-hints@...-project.net
Subject: Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs

On Tue, Jul 11, 2023 at 3:57 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Fri, Jul 07, 2023 at 12:30:01PM -0700, Stanislav Fomichev wrote:
> > +
> > +static int mlx5e_devtx_request_l4_checksum(const struct devtx_ctx *_ctx,
> > +                                        u16 csum_start, u16 csum_offset)
> > +{
> > +     const struct mlx5e_devtx_ctx *ctx = (void *)_ctx;
> > +     struct mlx5_wqe_eth_seg *eseg;
> > +
> > +     if (unlikely(!ctx->wqe))
> > +             return -ENODATA;
> > +
> > +     eseg = &ctx->wqe->eth;
> > +
> > +     switch (csum_offset) {
> > +     case sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(struct udphdr, check):
> > +     case sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + offsetof(struct udphdr, check):
> > +             /* Looks like HW/FW is doing parsing, so offsets are largely ignored. */
> > +             eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
> > +             break;
> > +     default:
> > +             return -EINVAL;
> > +     }
>
> I think this proves my point: csum is not generalizable even across veth and mlx5.
> Above is a square peg that tries to fit csum_start/offset api (that makes sense from SW pov)
> into HW that has different ideas about csum-ing.
>
> Here is what mlx5 does:
> mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>                             struct mlx5e_accel_tx_state *accel,
>                             struct mlx5_wqe_eth_seg *eseg)
> {
>         if (unlikely(mlx5e_ipsec_txwqe_build_eseg_csum(sq, skb, eseg)))
>                 return;
>
>         if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
>                 eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
>                 if (skb->encapsulation) {
>                         eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
>                                           MLX5_ETH_WQE_L4_INNER_CSUM;
>                         sq->stats->csum_partial_inner++;
>                 } else {
>                         eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
>                         sq->stats->csum_partial++;
>                 }
>
> How would you generalize that into csum api that will work across NICs ?
>
> My answer stands: you cannot.
>
> My proposal again:
> add driver specific kfuncs and hooks for things like csum.

I do see your point, but to give you my perspective: as a non-mlx
employee, I have no clue what those _CSUM tx bits do or what kind of
packets they support (the initial patch doesn't give any info).
We can definitely expose an mlx5-specific request_l4_checksum(bool encap)
that does something similar to mlx5e_txwqe_build_eseg_csum, but then,
what does it _actually_ do? It obviously can't checksum arbitrary
packet formats (because it has this inner/outer selection bit), so
there is really no way for me to provide a per-driver kfunc api. Maybe
the vendors can?

So a csum_start/csum_offset abstraction that works with fixed offsets
at least seems to set the right expectations for BPF program writers.
The vendors are already supposed to conform to this start/offset API for skb.
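
To make the start/offset contract concrete, here is a rough userspace
sketch of what a software fallback conceptually does for the skb case.
It is illustrative only: sw_complete_csum() and csum_fold() are
hypothetical names, not kernel functions, and I am assuming the checksum
field already holds whatever seed the stack put there (for example a
pseudo-header sum).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a complemented 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical software fallback (not a kernel API): sum the bytes in
 * [csum_start, len) as big-endian 16-bit words, including the current
 * contents of the checksum field, then store the folded result at
 * csum_start + csum_offset.
 * Returns 0 on success, -1 if the offsets fall outside the packet.
 */
static int sw_complete_csum(uint8_t *pkt, size_t len,
			    uint16_t csum_start, uint16_t csum_offset)
{
	size_t pos = (size_t)csum_start + csum_offset;
	uint32_t sum = 0;
	uint16_t csum;
	size_t i;

	assert(pkt != NULL);

	if (csum_start > len || pos + 2 > len)
		return -1;

	for (i = csum_start; i + 1 < len; i += 2)
		sum += (uint32_t)pkt[i] << 8 | pkt[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)pkt[i] << 8;

	csum = csum_fold(sum);
	pkt[pos] = csum >> 8;
	pkt[pos + 1] = csum & 0xff;
	return 0;
}
```

The point is that the caller only has to describe where summing starts
and where the result goes; nothing in this contract says anything about
inner vs outer headers.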

But back to your point: should we maybe try to meet somewhere in the middle?
1. We try to provide "generic" offload kfuncs; for mlx5, we'll have
this mlx5e_devtx_request_l4_checksum which works for fixed offsets.
2. We also let vendors do device-specific "extensions" where devices
deviate too much: bpf_request_RAW_mlx5e_l4_checksum(bool encap).
This can give BPF authors an opportunity to write somewhat portable
programs and also use vendor-specific apis when/if needed.
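
Roughly, the two tiers could look like the plain C model below. This is
a sketch under assumptions, not the proposed kernel code: every toy_*
name, the flag values and the descriptor struct are made up for
illustration, and only the overall shape (generic call with fixed
offsets, vendor call with an encap flag) comes from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; NOT the real MLX5_ETH_WQE_* values. */
enum {
	TOY_L3_CSUM       = 1 << 0,
	TOY_L4_CSUM       = 1 << 1,
	TOY_L3_INNER_CSUM = 1 << 2,
	TOY_L4_INNER_CSUM = 1 << 3,
};

/* Header sizes matching ethhdr, iphdr, ipv6hdr and offsetof(udphdr, check). */
#define TOY_ETH_HLEN		14
#define TOY_IPV4_HLEN		20
#define TOY_IPV6_HLEN		40
#define TOY_UDP_CHECK_OFF	6

/* Hypothetical stand-in for a TX descriptor. */
struct toy_tx_desc {
	uint8_t cs_flags;
};

/* Tier 1: "generic" request, only accepted for the fixed UDP offsets. */
static int toy_request_l4_checksum(struct toy_tx_desc *d,
				   uint16_t csum_start, uint16_t csum_offset)
{
	assert(d != NULL);
	(void)csum_start;	/* the device parses headers itself */

	switch (csum_offset) {
	case TOY_ETH_HLEN + TOY_IPV4_HLEN + TOY_UDP_CHECK_OFF:
	case TOY_ETH_HLEN + TOY_IPV6_HLEN + TOY_UDP_CHECK_OFF:
		d->cs_flags = TOY_L3_CSUM | TOY_L4_CSUM;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Tier 2: vendor-specific "raw" request exposing the inner/outer choice. */
static int toy_request_raw_l4_checksum(struct toy_tx_desc *d, bool encap)
{
	assert(d != NULL);
	d->cs_flags = TOY_L3_CSUM;
	d->cs_flags |= encap ? (TOY_L3_INNER_CSUM | TOY_L4_INNER_CSUM)
			     : TOY_L4_CSUM;
	return 0;
}

/* Caller side: try the portable call first, then the vendor extension. */
static int toy_request_csum(struct toy_tx_desc *d, uint16_t csum_start,
			    uint16_t csum_offset, bool encap)
{
	int err = toy_request_l4_checksum(d, csum_start, csum_offset);

	if (err == -EINVAL)
		err = toy_request_raw_l4_checksum(d, encap);
	return err;
}
```

A program that only needs plain UDP stays on the first call and remains
portable; one that needs encap support opts into the vendor call and
accepts being tied to that device.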

I think we had a similar idea for the rx side: have generic kfuncs, but
also let vendors experiment with custom kfuncs if they want to
differentiate.
WDYT? Can it give us the best of both sides?

> Kuba,
> since you nacked driver specific stuff please suggest a way to unblock this stalemate.