Message-ID: <175146824674.1421237.18351246421763677468.stgit@firesoul>
Date: Wed, 02 Jul 2025 16:58:12 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: bpf@...r.kernel.org, netdev@...r.kernel.org,
Jakub Kicinski <kuba@...nel.org>, lorenzo@...nel.org
Cc: Jesper Dangaard Brouer <hawk@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <borkmann@...earbox.net>,
Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
sdf@...ichev.me, kernel-team@...udflare.com, arthur@...hurfabre.com,
jakub@...udflare.com
Subject: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for
XDP_REDIRECTed packets

This patch series introduces a mechanism for an XDP program to store RX
metadata hints - specifically rx_hash, rx_vlan_tag, and rx_timestamp -
into the xdp_frame. These stored hints are then used to populate the
corresponding fields in the SKB that is created from the xdp_frame
following an XDP_REDIRECT.

The chosen RX metadata hints intentionally map to the existing NIC
hardware metadata that can be read via kfuncs [1]. While this design
allows a BPF program to read and propagate existing hardware hints, our
primary motivation is to enable setting custom values. This is important
for use cases where the hardware-provided information is insufficient or
needs to be calculated based on packet contents unavailable to the
hardware.

The main use case driving this feature is scalable load
balancing of encapsulated tunnel traffic at the XDP layer. When tunnelled
packets (e.g., IPsec, GRE) are redirected via cpumap or to a veth device,
the networking stack later calculates a software hash based on the outer
headers. For a single tunnel, these outer headers are often identical,
causing all packets to be assigned the same hash. This collapses all
traffic onto a single RX queue, creating a performance bottleneck and
defeating receive-side scaling (RSS).
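The collapse described above can be illustrated with a small userspace
sketch. It is not kernel code: FNV-1a stands in for the kernel's flow hash,
the modulo stands in for the real queue selection, and struct flow_key,
struct tunnel_pkt, and queues_used() are hypothetical names invented for
this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a: a stand-in only; the kernel uses its own flow hash. */
static uint32_t fnv1a(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical, simplified 4-tuple (12 bytes, no padding). */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical tunnelled packet: outer headers plus the inner flow. */
struct tunnel_pkt {
	struct flow_key outer;
	struct flow_key inner;
};

/*
 * Count how many distinct queues a batch of packets lands on when the
 * hash is taken over the outer (use_inner == 0) or inner headers.
 * Supports at most 32 queues; returns 0 for an unsupported count.
 */
static unsigned int queues_used(const struct tunnel_pkt *pkts, size_t n,
				unsigned int nqueues, int use_inner)
{
	uint32_t seen = 0;
	unsigned int count = 0;

	if (nqueues == 0 || nqueues > 32)
		return 0;

	for (size_t i = 0; i < n; i++) {
		const struct flow_key *k = use_inner ? &pkts[i].inner
						     : &pkts[i].outer;
		/* Simplification: plain modulo picks the queue. */
		unsigned int q = fnv1a(k, sizeof(*k)) % nqueues;

		if (!(seen & (1u << q))) {
			seen |= 1u << q;
			count++;
		}
	}
	return count;
}
```

With many inner flows carried in one tunnel, hashing the outer key puts
every packet on a single queue, while hashing the inner key spreads them
across queues.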
Our immediate use case involves load balancing IPsec traffic. For such
tunnelled traffic, any hardware-provided RX hash is calculated on the
outer headers and is therefore incorrect for distributing inner flows.
There is no reason to read the existing value, as it must be recalculated.
In our XDP program, we perform a partial decryption to access the inner
headers and calculate a new load-balancing hash, which provides better
flow distribution. However, without this patch set, there is no way to
persist this new hash for the network stack to use post-redirect.

This series solves the problem by introducing new BPF kfuncs that allow an
XDP program to write values such as the hash into the xdp_frame. The
__xdp_build_skb_from_frame() function is modified to use this stored value
to set skb->hash on the newly created SKB. As a result, the veth driver's
queue selection logic uses the BPF-supplied hash, achieving proper
traffic distribution across multiple CPU cores. This also ensures that
downstream consumers of the hash, like the GRO engine, can operate
effectively.
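The store-then-apply flow described above can be modelled in plain
userspace C. This is only a sketch under stated assumptions: struct
frame_model, struct skb_model, META_HASH, frame_store_rx_hash(), and
skb_from_frame() are hypothetical names for illustration and do not
reflect the actual structures or kfunc signatures in the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: set when the XDP program stored a hash. */
enum { META_HASH = 1u << 0 };

/* Hypothetical stand-in for the metadata carried in the frame. */
struct frame_model {
	uint32_t meta_flags;
	uint32_t rx_hash;
};

/* Hypothetical stand-in for the relevant SKB fields. */
struct skb_model {
	uint32_t hash;
	bool hash_valid;
};

/* Models the XDP program storing a hash before the redirect. */
static void frame_store_rx_hash(struct frame_model *f, uint32_t hash)
{
	f->rx_hash = hash;
	f->meta_flags |= META_HASH;
}

/*
 * Models SKB construction after the redirect: the stored hash is
 * applied only if the program actually set one; otherwise the SKB
 * is left without a hash and the stack would compute its own later.
 */
static void skb_from_frame(struct skb_model *skb,
			   const struct frame_model *f)
{
	skb->hash = 0;
	skb->hash_valid = false;

	if (f->meta_flags & META_HASH) {
		skb->hash = f->rx_hash;
		skb->hash_valid = true;
	}
}
```

The presence flag matters in this model: a frame whose program stored
nothing must not hand a stale or zero hash to the stack as if it were
valid.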
We considered XDP traits as an alternative to adding static members to
struct xdp_frame. Given the immediate need for this functionality and the
current development status of traits, we believe this approach is a
pragmatic solution. We are open to migrating to a traits-based
implementation if and when they become a generally accepted mechanism for
such extensions.

[1] https://docs.kernel.org/networking/xdp-rx-metadata.html
---
V1: https://lore.kernel.org/all/174897271826.1677018.9096866882347745168.stgit@firesoul/
Jesper Dangaard Brouer (2):
selftests/bpf: Adjust test for maximum packet size in xdp_do_redirect
net: xdp: update documentation for xdp-rx-metadata.rst
Lorenzo Bianconi (5):
net: xdp: Add xdp_rx_meta structure
net: xdp: Add kfuncs to store hw metadata in xdp_buff
net: xdp: Set skb hw metadata from xdp_frame
net: veth: Read xdp metadata from rx_meta struct if available
bpf: selftests: Add rx_meta store kfuncs selftest
Documentation/networking/xdp-rx-metadata.rst | 77 ++++++--
drivers/net/veth.c | 12 ++
include/net/xdp.h | 134 ++++++++++++--
net/core/xdp.c | 107 ++++++++++-
net/xdp/xsk_buff_pool.c | 4 +-
.../bpf/prog_tests/xdp_do_redirect.c | 6 +-
.../selftests/bpf/prog_tests/xdp_rxmeta.c | 166 ++++++++++++++++++
.../selftests/bpf/progs/xdp_rxmeta_receiver.c | 44 +++++
.../selftests/bpf/progs/xdp_rxmeta_redirect.c | 43 +++++
9 files changed, 558 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_rxmeta.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_rxmeta_receiver.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_rxmeta_redirect.c
--