Message-Id: <20250422-afabre-traits-010-rfc2-v2-0-92bcc6b146c9@arthurfabre.com>
Date: Tue, 22 Apr 2025 15:23:29 +0200
From: Arthur Fabre <arthur@...hurfabre.com>
To: netdev@...r.kernel.org, bpf@...r.kernel.org
Cc: jakub@...udflare.com, hawk@...nel.org, yan@...udflare.com,
jbrandeburg@...udflare.com, thoiland@...hat.com, lbiancon@...hat.com,
ast@...nel.org, kuba@...nel.org, edumazet@...gle.com,
Arthur Fabre <arthur@...hurfabre.com>
Subject: [PATCH RFC bpf-next v2 00/17] traits: Per packet metadata KV store
The only way to attach information to a sk_buff that travels
through the network stack is with the mark. This field can be
read in firewall rules, drive routing decisions, and be
accessed by BPF programs.
However, its small size creates competition for bits, restricting
its practical use.
We propose using part of the packet headroom to store metadata.
This would allow:
- Tracing packets through the network stack and across the kernel-user
  space boundary, by assigning them a unique ID.
- Metadata-driven packet redirection, routing, and socket steering with
  early classification in XDP.
- Extracting information from encapsulation headers and sharing it with
  user space or vice versa.
- Exposing XDP RX Metadata, like the timestamp, to the rest of the
  network stack.
We originally proposed extending XDP metadata - binary blob
storage also in the headroom - to expose it throughout the network
stack. However, based on feedback at LPC 2024 [1]:
- sharing a binary blob amongst different applications is hard.
- exposing a binary blob to userspace is awkward.
we've shifted to a limited KV store in the headroom.
To differentiate this from the overloaded "metadata" term, it's
tentatively called "packet traits".
Traits are currently stored at the start of the headroom:
| xdp_frame | traits | headroom | XDP metadata | data / packet |
This makes adding encap headers to a packet easier: the traits don't
have to be moved out of the way first.
But to let us change this in the future, XDP metadata and traits
aren't allowed to be used together.
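To illustrate why this layout makes encap cheap, here is a minimal
userspace sketch. It is not code from this series: struct buf_layout,
layout_headroom_start() and layout_push_header() are hypothetical names,
and XDP metadata is left out because it cannot be combined with traits.
Pushing a header only moves the data offset towards the traits; the
traits themselves stay where they are.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical offsets into one packet buffer, measured from its start. */
struct buf_layout {
	size_t frame_len;  /* space used by the xdp_frame at the start */
	size_t traits_len; /* trait storage, directly after the xdp_frame */
	size_t data_off;   /* where the packet data currently begins */
};

/* First byte that new headers may use: just past the traits. */
static size_t layout_headroom_start(const struct buf_layout *l)
{
	return l->frame_len + l->traits_len;
}

/*
 * Prepend hdr_len bytes of encap header by moving data_off towards the
 * start of the buffer. Only data_off changes; the traits are not moved.
 * Returns 0 on success, -ENOSPC if the remaining headroom is too small.
 */
static int layout_push_header(struct buf_layout *l, size_t hdr_len)
{
	size_t start;

	assert(l);
	start = layout_headroom_start(l);
	if (l->data_off < start || l->data_off - start < hdr_len)
		return -ENOSPC;

	l->data_off -= hdr_len;
	return 0;
}
```

Had the traits been placed directly in front of the packet data instead,
every header push would first have to relocate them.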
A get() / set() / delete() API is exposed to BPF to store and
retrieve traits.
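As a rough picture of the semantics, the sketch below models a
get() / set() / delete() store in plain userspace C. It is an
illustration only, not the API or the storage format of this series:
struct trait_store, trait_set(), trait_get(), trait_del(), the 64 key
limit and the return codes are all assumptions made for the example.
The accepted value sizes (0 for flags, 4, and 8 bytes) follow the v2
changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAIT_KEYS 64 /* hypothetical key space, chosen for the example */

/* One fixed 8 byte slot per key; simple, but wasteful compared to a packed layout. */
struct trait_store {
	bool present[TRAIT_KEYS];
	uint8_t len[TRAIT_KEYS]; /* 0, 4 or 8 */
	uint8_t val[TRAIT_KEYS][8];
};

static bool trait_len_ok(size_t len)
{
	return len == 0 || len == 4 || len == 8;
}

/* Store len bytes under key, replacing any previous value. */
static int trait_set(struct trait_store *s, unsigned int key,
		     const void *val, size_t len)
{
	assert(s);
	if (key >= TRAIT_KEYS || !trait_len_ok(len))
		return -EINVAL;
	if (len)
		memcpy(s->val[key], val, len);
	s->len[key] = (uint8_t)len;
	s->present[key] = true;
	return 0;
}

/* Copy the value for key into val; returns its length or a negative errno. */
static int trait_get(const struct trait_store *s, unsigned int key,
		     void *val, size_t cap)
{
	assert(s);
	if (key >= TRAIT_KEYS)
		return -EINVAL;
	if (!s->present[key])
		return -ENOENT;
	if (cap < s->len[key])
		return -ENOSPC;
	if (s->len[key])
		memcpy(val, s->val[key], s->len[key]);
	return s->len[key];
}

/* Remove key; -ENOENT if it was not set. */
static int trait_del(struct trait_store *s, unsigned int key)
{
	assert(s);
	if (key >= TRAIT_KEYS)
		return -EINVAL;
	if (!s->present[key])
		return -ENOENT;
	s->present[key] = false;
	return 0;
}
```

A zero length trait acts as a flag: trait_set(&s, key, NULL, 0) marks
the key as present, and trait_get() then returns 0 rather than -ENOENT.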
Initial benchmarks in XDP are promising, with get() / set() comparable
to an indirect function call. See patch 7: "trait: Replace memmove calls
with inline move" for full results.
We imagine adding first-class support for this in netfilter (setting
/ checking traits in rules) and routing (selecting routing tables
based on traits) in follow-up work.
We also envisage a first-class userspace API for storing and
retrieving traits in the future.
Like XDP metadata, this relies on there being sufficient headroom
available. Piggybacking on top of that work, traits are currently
only supported:
- On ingress.
- By NIC drivers that support XDP metadata.
- When an XDP program is attached.
This limits the applicability of traits. But future work
guaranteeing sufficient headroom through other means should allow
these restrictions to be lifted.
[1] https://lpc.events/event/18/contributions/1935/
---
Changes in v2:
- Support sizes 0 (for flags), 4, and 8. 16 will be supported in the
  future with a batch API, to set two consecutive 8 byte KVs at once.
- Prevent traits and XDP metadata from being used at the same time.
  This will let us move trait storage where XDP metadata is today if
  we want to.
- Use SKB extensions to store the traits in skbs.
- Drop registration API.
- Link to v1: https://lore.kernel.org/r/20250305-afabre-traits-010-rfc2-v1-0-d0ecfb869797@cloudflare.com
---
Arthur Fabre (16):
      trait: limited KV store for packet metadata
      xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions
      trait: XDP support
      trait: XDP selftest
      trait: XDP benchmark
      trait: Replace memcpy calls with inline copies
      trait: Replace memmove calls with inline move
      skb: Extension header in packet headroom
      trait: Store traits in sk_buff extension
      bnxt: Propagate trait presence to skb
      ice: Propagate trait presence to skb
      veth: Propagate trait presence to skb
      virtio_net: Propagate trait presence to skb
      mlx5: Propagate trait presence to skb
      xdp generic: Propagate trait presence to skb
      trait: Allow socket filters to access traits
Jesper Dangaard Brouer (1):
      mlx5: move xdp_buff scope one level up
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 5 -
drivers/net/ethernet/intel/ice/ice_txrx.c | 4 +
drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 6 +-
.../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 6 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 114 ++++----
drivers/net/veth.c | 4 +
drivers/net/virtio_net.c | 8 +-
include/linux/skbuff.h | 42 +++
include/net/trait.h | 302 +++++++++++++++++++++
include/net/xdp.h | 56 +++-
net/core/dev.c | 1 +
net/core/filter.c | 10 +-
net/core/skbuff.c | 231 ++++++++++++++--
net/core/xdp.c | 69 ++++-
net/xdp/xsk.c | 11 +-
tools/testing/selftests/bpf/Makefile | 2 +
tools/testing/selftests/bpf/bench.c | 8 +
.../selftests/bpf/benchs/bench_xdp_traits.c | 160 +++++++++++
.../testing/selftests/bpf/prog_tests/xdp_traits.c | 33 +++
.../testing/selftests/bpf/progs/bench_xdp_traits.c | 128 +++++++++
.../testing/selftests/bpf/progs/test_xdp_traits.c | 206 ++++++++++++++
24 files changed, 1319 insertions(+), 99 deletions(-)
---
base-commit: 5709be4c35ba760b001733939e20069de033a697
change-id: 20250305-afabre-traits-010-rfc2-a8e4de0c490b
Best regards,
--
Arthur Fabre <arthur@...hurfabre.com>