Message-Id: <20250305-afabre-traits-010-rfc2-v1-0-d0ecfb869797@cloudflare.com>
Date: Wed, 05 Mar 2025 15:31:57 +0100
From: arthur@...hurfabre.com
To: netdev@...r.kernel.org, bpf@...r.kernel.org
Cc: jakub@...udflare.com, hawk@...nel.org, yan@...udflare.com,
jbrandeburg@...udflare.com, thoiland@...hat.com, lbiancon@...hat.com,
Arthur Fabre <afabre@...udflare.com>
Subject: [PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store
Currently, the only way to attach information to a sk_buff that travels
through the network stack is by using the mark field. This 32-bit field
is highly versatile - it can be read in firewall rules, drive routing
decisions, and be accessed by BPF programs.
However, its limited capacity creates competition for bits, restricting
its practical use.
To remedy this, we propose using part of the packet headroom to store
metadata. This would allow:
- Tracing packets through the network stack and across the kernel-user
space boundary, by assigning them a unique ID.
- Metadata-driven packet redirection, routing, and socket steering with
early classification in XDP.
- Extracting information from encapsulation headers and sharing it with
user space or vice versa.
- Exposing XDP RX Metadata, like the timestamp, to the rest of the
network stack.
We originally proposed extending XDP metadata - binary blob
storage also in the headroom - to expose it throughout the network
stack. However, based on feedback at LPC 2024 [1]:
- sharing a binary blob amongst different applications is hard.
- exposing a binary blob to userspace is awkward.
we've shifted to a limited KV store in the headroom.
To differentiate this from the overloaded "metadata" term, it's
tentatively called "packet traits".
A get() / set() / delete() API is exposed to BPF to store and
retrieve traits.
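To make the shape of such an API concrete, here is a minimal user-space
sketch of a bounded KV store with get / set / delete semantics. It is
not the implementation from this series: the toy_* names, the slot
layout, the capacity limits and the error codes are all assumptions
chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model only: names, sizes and layout are assumptions. */
#define TOY_SLOTS   8	/* arbitrary capacity for the sketch */
#define TOY_VAL_MAX 8	/* arbitrary maximum value size in bytes */

struct toy_trait {
	uint8_t used;
	uint8_t key;
	uint8_t len;
	uint8_t val[TOY_VAL_MAX];
};

struct toy_traits {
	struct toy_trait slot[TOY_SLOTS];
};

/* Return the slot index holding key, or -1 if the key is not set. */
static int toy_find(const struct toy_traits *t, uint8_t key)
{
	for (int i = 0; i < TOY_SLOTS; i++)
		if (t->slot[i].used && t->slot[i].key == key)
			return i;
	return -1;
}

/* Store len bytes under key, overwriting any previous value. */
int toy_trait_set(struct toy_traits *t, uint8_t key, const void *val, size_t len)
{
	int i;

	assert(t && val);
	if (len == 0 || len > TOY_VAL_MAX)
		return -EINVAL;

	/* Reuse the key's slot if it exists, else take the first free one. */
	i = toy_find(t, key);
	for (int j = 0; i < 0 && j < TOY_SLOTS; j++)
		if (!t->slot[j].used)
			i = j;
	if (i < 0)
		return -ENOSPC;

	t->slot[i].used = 1;
	t->slot[i].key = key;
	t->slot[i].len = (uint8_t)len;
	memcpy(t->slot[i].val, val, len);
	return 0;
}

/* Copy the value for key into val; return its length or a negative errno. */
int toy_trait_get(const struct toy_traits *t, uint8_t key, void *val, size_t cap)
{
	int i;

	assert(t && val);
	i = toy_find(t, key);
	if (i < 0)
		return -ENOENT;
	if (cap < t->slot[i].len)
		return -ENOSPC;
	memcpy(val, t->slot[i].val, t->slot[i].len);
	return t->slot[i].len;
}

/* Remove key; return 0 on success or -ENOENT if it was not set. */
int toy_trait_del(struct toy_traits *t, uint8_t key)
{
	int i = toy_find(t, key);

	if (i < 0)
		return -ENOENT;
	t->slot[i].used = 0;
	return 0;
}
```

In this model a caller zeroes a struct toy_traits, sets a value such as
a 32-bit packet ID under a small integer key, and a later consumer
retrieves it by the same key. A fixed, small capacity keeps every
operation bounded, which is the property a per-packet store needs.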
Initial benchmarks in XDP are promising, with the cost of get() / set()
comparable to that of an indirect function call. See patch 6: "trait:
Replace memmove calls with inline move" for full results.
We imagine adding first-class support for this in netfilter (setting
/ checking traits in rules) and routing (selecting routing tables
based on traits) in follow-up work.
We also envisage a first-class userspace API for storing and
retrieving traits in the future.
To co-exist with the existing XDP metadata area, traits are stored at
the start of the headroom:
| xdp_frame | traits | headroom | XDP metadata | data / packet |
Traits and XDP metadata are not allowed to overlap.
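The no-overlap rule can be stated as a simple bounds check. The sketch
below models the layout in the diagram with plain offsets from the start
of the buffer; struct toy_buf_layout and the toy_* helpers are
hypothetical and are not taken from the patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: all offsets are from the start of the buffer. */
struct toy_buf_layout {
	size_t frame_len;	/* bytes reserved for xdp_frame at the start */
	size_t traits_len;	/* bytes used by traits, right after xdp_frame */
	size_t meta_off;	/* offset where XDP metadata begins */
	size_t data_off;	/* offset where packet data begins */
};

/* Offset of the first byte past the traits area. */
size_t toy_traits_end(const struct toy_buf_layout *l)
{
	return l->frame_len + l->traits_len;
}

/* True if traits end at or before XDP metadata, which precedes the data. */
bool toy_layout_valid(const struct toy_buf_layout *l)
{
	return l->meta_off <= l->data_off && toy_traits_end(l) <= l->meta_off;
}

/* Unused headroom left between the traits and the XDP metadata. */
size_t toy_free_headroom(const struct toy_buf_layout *l)
{
	return toy_layout_valid(l) ? l->meta_off - toy_traits_end(l) : 0;
}
```

Because traits grow up from the start of the headroom and XDP metadata
grows down from the packet data, both compete for the same free region;
in this model, whichever side grows would have to run a check of this
kind first.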
Like XDP metadata, this relies on there being sufficient headroom
available. Piggybacking on top of that work, traits are currently
only supported:
- On ingress.
- By NIC drivers that support XDP metadata.
- When an XDP program is attached.
This limits the applicability of traits, but future work
guaranteeing sufficient headroom through other means should allow
these restrictions to be lifted.
There are still a number of open questions:
- What sizes of values should be allowed? See patch 1 "trait: limited KV
store for packet metadata".
- How should we handle skb clones? See patch 16 "trait: Support sk_buffs".
- How should trait keys be allocated? See patch 18 "trait: registration
API".
- How should traits work with GRO? Could an API let us specify policies
for how traits should be merged? See patch 18 "trait: registration
API".
[1] https://lpc.events/event/18/contributions/1935/
Cc: jakub@...udflare.com
Cc: hawk@...nel.org
Cc: yan@...udflare.com
Cc: jbrandeburg@...udflare.com
Cc: thoiland@...hat.com
Cc: lbiancon@...hat.com
To: netdev@...r.kernel.org
To: bpf@...r.kernel.org
Signed-off-by: Arthur Fabre <afabre@...udflare.com>
---
Arthur Fabre (19):
trait: limited KV store for packet metadata
trait: XDP support
trait: basic XDP selftest
trait: basic XDP benchmark
trait: Replace memcpy calls with inline copies
trait: Replace memmove calls with inline move
xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions
trait: Propagate presence of traits to sk_buff
bnxt: Propagate trait presence to skb
ice: Propagate trait presence to skb
veth: Propagate trait presence to skb
virtio_net: Propagate trait presence to skb
mlx5: Propagate trait presence to skb
xdp generic: Propagate trait presence to skb
trait: Support sk_buffs
trait: Allow socket filters to access traits
trait: registration API
trait: Sync linux/bpf.h to tools/ for trait registration
trait: register traits in benchmarks and tests
Jesper Dangaard Brouer (1):
mlx5: move xdp_buff scope one level up
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +
drivers/net/ethernet/intel/ice/ice_txrx.c | 4 +
drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 6 +-
.../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 6 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 114 ++++----
drivers/net/veth.c | 4 +
drivers/net/virtio_net.c | 8 +-
include/linux/bpf-netns.h | 12 +
include/linux/skbuff.h | 33 ++-
include/net/net_namespace.h | 6 +
include/net/netns/trait.h | 22 ++
include/net/trait.h | 288 +++++++++++++++++++++
include/net/xdp.h | 42 ++-
include/uapi/linux/bpf.h | 26 ++
kernel/bpf/net_namespace.c | 54 ++++
kernel/bpf/syscall.c | 26 ++
kernel/bpf/verifier.c | 39 ++-
net/core/dev.c | 1 +
net/core/filter.c | 43 ++-
net/core/skbuff.c | 25 +-
net/core/xdp.c | 50 ++++
tools/include/uapi/linux/bpf.h | 26 ++
tools/testing/selftests/bpf/Makefile | 2 +
tools/testing/selftests/bpf/bench.c | 11 +
tools/testing/selftests/bpf/bench.h | 1 +
.../selftests/bpf/benchs/bench_xdp_traits.c | 191 ++++++++++++++
.../testing/selftests/bpf/prog_tests/xdp_traits.c | 51 ++++
.../testing/selftests/bpf/progs/bench_xdp_traits.c | 131 ++++++++++
.../testing/selftests/bpf/progs/test_xdp_traits.c | 94 +++++++
31 files changed, 1259 insertions(+), 69 deletions(-)
---
base-commit: 42ba8a49d085e0c2ad50fb9a8ec954c9762b6e01
change-id: 20250305-afabre-traits-010-rfc2-a8e4de0c490b
Best regards,
--
Arthur Fabre <afabre@...udflare.com>