Message-ID: <21f4ee22-84f0-4d5e-8630-9a889ca11e31@kernel.org>
Date: Thu, 31 Jul 2025 18:27:07 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Martin KaFai Lau <martin.lau@...ux.dev>, Jakub Kicinski
<kuba@...nel.org>, Lorenzo Bianconi <lorenzo@...nel.org>
Cc: Stanislav Fomichev <stfomichev@...il.com>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <borkmann@...earbox.net>,
Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
<davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, sdf@...ichev.me,
kernel-team@...udflare.com, arthur@...hurfabre.com, jakub@...udflare.com,
Jesse Brandeburg <jbrandeburg@...udflare.com>,
Andrew Rzeznik <arzeznik@...udflare.com>
Subject: Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for
 XDP_REDIRECTed packets

On 29/07/2025 21.47, Martin KaFai Lau wrote:
> On 7/29/25 4:15 AM, Jesper Dangaard Brouer wrote:
>> That idea has been considered before, but it unfortunately doesn't work
>> from a performance angle. The performance model of XDP_REDIRECT into
>> CPUMAP relies on moving the expensive SKB allocation+init to a remote
>> CPU. This keeps the ingress CPU free to process packets at near line
>> rate (our DDoS use-case). If we allocate the SKB on the ingress-CPU
>> before the redirect, we destroy this load-balancing model and create the
>> exact bottleneck we designed CPUMAP to avoid.
>
> iirc, an xdp prog can be attached to a cpumap. The skb can be created
> by that xdp prog running on the remote cpu. It should be like an xdp
> prog returning XDP_PASS + an optional skb. The xdp prog can set some
> fields in the skb. Other than setting fields in the skb, something
> else may also be possible in the future, e.g. look up sk, earlier
> demux, etc.
>

I have strong reservations about having the BPF program itself trigger
the SKB allocation. I believe this would fundamentally break the
performance model that makes cpumap redirect so effective.

The key to XDP's high performance lies in processing batches of
xdp_frames in a tight loop to amortize per-packet costs. The existing
cpumap code
on the remote CPU is already highly optimized for this: it performs bulk
allocation of SKBs and uses careful prefetching to hide the memory
latency. Allowing a BPF program to sometimes trigger a heavyweight SKB
alloc+init (4 cache-line misses) would bypass all these existing
optimizations. It would introduce significant jitter into the pipeline
and disrupt the entire bulk-processing model we rely on for performance.
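
To make the cost model concrete, here is a simplified, hedged sketch of
the bulk pattern in the cpumap kthread (loosely based on
kernel/bpf/cpumap.c; function and cache names are approximate, and
error handling is elided):

  #define CPUMAP_BATCH 8

  static void cpu_map_process_batch(struct bpf_cpu_map_entry *rcpu)
  {
          struct xdp_frame *frames[CPUMAP_BATCH];
          void *skbs[CPUMAP_BATCH];
          gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
          LIST_HEAD(list);
          int i, n;

          /* Dequeue a whole batch from the ptr_ring in one go */
          n = __ptr_ring_consume_batched(rcpu->queue, (void **)frames,
                                         CPUMAP_BATCH);

          /* One bulk allocation covers every SKB in the batch,
           * instead of one kmem_cache_alloc() per packet.
           */
          if (kmem_cache_alloc_bulk(skbuff_cache, gfp, n, skbs) != n)
                  return;

          for (i = 0; i < n; i++) {
                  struct xdp_frame *xdpf = frames[i];
                  struct sk_buff *skb;

                  /* Prefetch the next frame to hide cache-miss latency */
                  if (i + 1 < n)
                          prefetch(frames[i + 1]);

                  skb = __xdp_build_skb_from_frame(xdpf, skbs[i],
                                                   xdpf->dev_rx);
                  list_add_tail(&skb->list, &list);
          }

          /* Hand the whole batch to the stack in one call */
          netif_receive_skb_list(&list);
  }

An skb allocation triggered from inside the BPF program would land in
the middle of this loop, one packet at a time, defeating both the bulk
allocation and the prefetch pipeline.
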
This performance is not just theoretical; we rely on it for DDoS
protection. For example, our plan is to use the XDP program on the
cpumap hook to run secondary DDoS mitigation rules that currently
live in iptables (amusingly, many of those rules are already BPF
snippets today).

Architecturally, there is a clean separation today: the BPF program
makes a decision, and the highly-optimized cpumap or core kernel code
acts on it (build_skb, napi_gro_receive, etc.). Your proposal blurs that
line significantly. Our patch, in contrast, preserves this model: it
simply provides the necessary data (the hash, VLAN tag, and timestamp)
to the existing cpumap/veth skb path via the xdp_frame.
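
For reference, a rough sketch of the consuming side; the xdp_frame
field and flag names below are illustrative placeholders only (the
actual layout is whatever this series defines), while the skb helpers
are the existing ones the cpumap/veth path already calls:

  /* Hypothetical names: rx_meta and XDP_FLAGS_META_* are placeholders,
   * not the series' actual layout.
   */
  static void xdp_frame_apply_rx_hints(const struct xdp_frame *xdpf,
                                       struct sk_buff *skb)
  {
          /* RX hash recorded by the BPF prog before the redirect */
          if (xdpf->flags & XDP_FLAGS_META_RX_HASH)
                  skb_set_hash(skb, xdpf->rx_meta.hash, PKT_HASH_TYPE_L4);

          /* VLAN tag, so the stack need not re-parse the header */
          if (xdpf->flags & XDP_FLAGS_META_VLAN)
                  __vlan_hwaccel_put_tag(skb, xdpf->rx_meta.vlan_proto,
                                         xdpf->rx_meta.vlan_tci);

          /* Hardware RX timestamp */
          if (xdpf->flags & XDP_FLAGS_META_RX_TS)
                  skb_hwtstamps(skb)->hwtstamp =
                          ns_to_ktime(xdpf->rx_meta.rx_timestamp);
  }

The decision (which hints to record) stays in BPF; applying them to the
skb stays in the kernel's already-optimized build path.
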
While more advanced capabilities are an interesting topic for the
future, my goal here is to solve the immediate, concrete problem of
transferring metadata cleanly, without disrupting the performance
architecture we rely on for use cases like DDoS mitigation.

--Jesper