Message-ID: <20250801133803.7570a6fd@kernel.org>
Date: Fri, 1 Aug 2025 13:38:03 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, Lorenzo Bianconi
<lorenzo@...nel.org>, Stanislav Fomichev <stfomichev@...il.com>,
bpf@...r.kernel.org, netdev@...r.kernel.org, Alexei Starovoitov
<ast@...nel.org>, Daniel Borkmann <borkmann@...earbox.net>, Eric Dumazet
<eric.dumazet@...il.com>, "David S. Miller" <davem@...emloft.net>, Paolo
Abeni <pabeni@...hat.com>, sdf@...ichev.me, kernel-team@...udflare.com,
arthur@...hurfabre.com, jakub@...udflare.com, Jesse Brandeburg
<jbrandeburg@...udflare.com>, Andrew Rzeznik <arzeznik@...udflare.com>
Subject: Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for
XDP_REDIRECTed packets
On Thu, 31 Jul 2025 18:27:07 +0200 Jesper Dangaard Brouer wrote:
> > iirc, an xdp prog can be attached to a cpumap. The skb can be created by
> > that xdp prog running on the remote cpu. It would be like an xdp prog
> > returning XDP_PASS plus an optional skb. The xdp prog could set some
> > fields in the skb. Beyond setting fields in the skb, other things may
> > also become possible in the future, e.g. looking up a sk, earlier
> > demux, etc.
>
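For concreteness, the attach pattern being discussed, sketched below.
struct bpf_cpumap_val and the libbpf "xdp/cpumap" section are the
existing UAPI/libbpf pieces; the skb-mutation part Martin describes is
still hypothetical, so this only shows today's shape (error handling
omitted; "prog" and "cpu_map" would come from a loaded skeleton, with
<linux/bpf.h> and <bpf/libbpf.h> on the userspace side):

	/* Userspace: install an XDP prog into a cpumap slot so it runs
	 * on the remote CPU for every frame redirected there. */
	struct bpf_cpumap_val val = {
		.qsize = 2048,
		.bpf_prog.fd = bpf_program__fd(prog),
	};
	__u32 cpu = 2;	/* remote CPU the frames are redirected to */
	bpf_map_update_elem(bpf_map__fd(cpu_map), &cpu, &val, 0);

	/* BPF side: runs on the remote CPU, today before any skb exists. */
	SEC("xdp/cpumap")
	int xdp_on_remote_cpu(struct xdp_md *ctx)
	{
		return XDP_PASS;	/* pass the frame on to skb creation */
	}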
> I have strong reservations about having the BPF program itself trigger
> the SKB allocation. I believe this would fundamentally break the
> performance model that makes cpumap redirect so effective.
See, I have similar concerns about growing struct xdp_frame.
That's why my guiding principle would be to make sure that the
features we add beyond "classic XDP" (what the DDoS use case needs)
are entirely optional. And if moving skb allocation out of the driver
becomes one of the reasons to grow the xdp_frame, drivers will sooner
or later populate the xdp_frame unconditionally, degrading the
performance of "classic XDP".
> The key to XDP's high performance is processing xdp_frames in batches
> in a tight loop to amortize per-packet costs. The existing cpumap code
> on the remote CPU is already highly optimized for this: it performs bulk
> allocation of SKBs and uses careful prefetching to hide the memory
> latency. Allowing a BPF program to sometimes trigger a heavyweight SKB
> alloc+init (4 cache-line misses) would bypass all these existing
> optimizations. It would introduce significant jitter into the pipeline
> and disrupt the entire bulk-processing model we rely on for performance.
>
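The loop in question, heavily abridged from the cpumap kthread in
kernel/bpf/cpumap.c (names approximate, error paths and list setup
omitted; a sketch of the pattern, not the verbatim code):

	/* Drain up to CPUMAP_BATCH frames from the per-CPU ptr_ring. */
	n = __ptr_ring_consume_batched(rcpu->queue, frames, CPUMAP_BATCH);

	/* Prefetch each frame's page up front to hide cache-line misses. */
	for (i = 0; i < n; i++)
		prefetchw(virt_to_page(frames[i]));

	/* One bulk slab call instead of n individual skb allocations. */
	kmem_cache_alloc_bulk(skbuff_cache, gfp, n, skbs);

	for (i = 0; i < n; i++) {
		skb = __xdp_build_skb_from_frame(frames[i], skbs[i],
						 frames[i]->dev_rx);
		list_add_tail(&skb->list, &list);
	}
	netif_receive_skb_list(&list);

An skb allocation triggered from inside the BPF prog would land in the
middle of this, outside the bulk-alloc and prefetch pattern.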
> This performance is not just theoretical;
Somewhat off-topic for the architecture discussion, I think, but do
you happen to have any real-life data for that? IIRC "listification"
was a moderate success for the skb path... Or am I misreading, and
you have other benefits of a tight processing loop in mind?