[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKH8qBtawUTjFQ=hhTzXa2zTBwOpxurjhduxZV+eUg8rnJUJVw@mail.gmail.com>
Date: Tue, 11 Jul 2023 16:24:54 -0700
From: Stanislav Fomichev <sdf@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org, yhs@...com,
john.fastabend@...il.com, kpsingh@...nel.org, haoluo@...gle.com,
jolsa@...nel.org, kuba@...nel.org, toke@...nel.org, willemb@...gle.com,
dsahern@...nel.org, magnus.karlsson@...el.com, bjorn@...nel.org,
maciej.fijalkowski@...el.com, hawk@...nel.org, netdev@...r.kernel.org,
xdp-hints@...-project.net
Subject: Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs
On Tue, Jul 11, 2023 at 3:57 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Fri, Jul 07, 2023 at 12:30:01PM -0700, Stanislav Fomichev wrote:
> > +
> > +static int mlx5e_devtx_request_l4_checksum(const struct devtx_ctx *_ctx,
> > + u16 csum_start, u16 csum_offset)
> > +{
> > + const struct mlx5e_devtx_ctx *ctx = (void *)_ctx;
> > + struct mlx5_wqe_eth_seg *eseg;
> > +
> > + if (unlikely(!ctx->wqe))
> > + return -ENODATA;
> > +
> > + eseg = &ctx->wqe->eth;
> > +
> > + switch (csum_offset) {
> > + case sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(struct udphdr, check):
> > + case sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + offsetof(struct udphdr, check):
> > + /* Looks like HW/FW is doing parsing, so offsets are largely ignored. */
> > + eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
>
> I think this proves my point: csum is not generalizable even across veth and mlx5.
> Above is a square peg that tries to fit csum_start/offset api (that makes sense from SW pov)
> into HW that has different ideas about csum-ing.
>
> Here is what mlx5 does:
> mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> struct mlx5e_accel_tx_state *accel,
> struct mlx5_wqe_eth_seg *eseg)
> {
> if (unlikely(mlx5e_ipsec_txwqe_build_eseg_csum(sq, skb, eseg)))
> return;
>
> if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
> eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
> if (skb->encapsulation) {
> eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
> MLX5_ETH_WQE_L4_INNER_CSUM;
> sq->stats->csum_partial_inner++;
> } else {
> eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
> sq->stats->csum_partial++;
> }
>
> How would you generalize that into csum api that will work across NICs ?
>
> My answer stands: you cannot.
>
> My proposal again:
> add driver specifc kfuncs and hooks for things like csum.
I do see your point, but to also give you my perspective: I have no
clue what those _CSUM tx bits do (as a non-mlx employee). And what
kind of packets they support (initial patch doesn't give any info).
We can definitely expose mlx5 specific request_l4_checksum(bool encap)
which does things similar to mlx5e_txwqe_build_eseg_csum, but then,
what does it _actually_ do? It obviously can't checksum arbitrary
packet formats (because it has this inner/outer selection bit), so
there is really no way for me to provide a per-driver kfunc api. Maybe
the vendors can?
So having csum_start/csum_offset abstraction which works with fixed
offsets seems like at least it correctly sets the expectation for BPF
program writers.
The vendors are already supposed to conform to this start/offset API for skb.
But back to your point: should we maybe try to meet somewhere in the middle?
1. We try to provide "generic" offload kfuncs; for mlx5, we'll have
this mlx5e_devtx_request_l4_checksum which works for fixed offsets
2. We also let vendors do device-specific "extensions" where devices
deviate too much: bpf_request_RAW_mlx5e_l4_checksum(bool encap)
This can give BPF authors opportunity to write somewhat portable
programs and also use vendor specific apis when/if needed.
I think we had a similar idea for rx side: have generic kfuncs, but
also let vendors experiment with custom kfuncs if they want to
differentiate.
WDYT? Can it give us the best things from both sides?
> Kuba,
> since you nacked driver specific stuff please suggest a way to unblock this stalemate.
Powered by blists - more mailing lists