Message-ID: <20241220195619.2022866-1-amery.hung@gmail.com>
Date: Fri, 20 Dec 2024 11:55:26 -0800
From: Amery Hung <ameryhung@...il.com>
To: netdev@...r.kernel.org
Cc: bpf@...r.kernel.org,
daniel@...earbox.net,
andrii@...nel.org,
alexei.starovoitov@...il.com,
martin.lau@...nel.org,
sinquersw@...il.com,
toke@...hat.com,
jhs@...atatu.com,
jiri@...nulli.us,
stfomichev@...il.com,
ekarani.silvestre@....ufcg.edu.br,
yangpeihao@...u.edu.cn,
xiyou.wangcong@...il.com,
yepeilin.cs@...il.com,
ameryhung@...il.com,
amery.hung@...edance.com
Subject: [PATCH bpf-next v2 00/14] bpf qdisc
Hi all,
This patchset adds support for implementing qdiscs using bpf struct_ops.
This version takes a step back and implements only the minimum support
for bpf qdisc. Two features are deferred to future patchsets: 1) adding
skbs to bpf_list and bpf_rbtree directly and 2) classful qdiscs.
* Overview *
This series supports implementing qdisc using bpf struct_ops. bpf qdisc
aims to be a flexible and easy-to-use infrastructure that allows users to
quickly experiment with different scheduling algorithms/policies. It only
requires users to implement core qdisc logic using bpf and implements the
mundane part for them. In addition, the ability to easily communicate
between qdisc and other components will also bring new opportunities for
new applications and optimizations.
* struct_ops changes *
To make struct_ops work better with bpf qdisc, two changes are
introduced to bpf specifically for struct_ops programs. First, we
introduce the "__ref" suffix for arguments in stub functions in patches
1-2. It allows Qdisc_ops->enqueue to acquire a unique referenced kptr to
the skb argument. Through the reference object tracking mechanism in
the verifier, we can make sure that the acquired skb will be either
enqueued or dropped. In addition, no duplicate references can be acquired.
Second, we allow a referenced kptr to be returned from struct_ops programs
so that we can return an skb naturally. This is done and tested in patches
3 and 4.
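As a rough illustration of the "enqueued or dropped" rule above, the
sketch below is a user-space model only. All toy_* names are hypothetical
and are not kernel or bpf APIs. The verifier proves the property
statically at load time, while this model merely checks it with run-time
assertions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum toy_ref_state { TOY_REF_HELD, TOY_REF_ENQUEUED, TOY_REF_DROPPED };

struct toy_skb {
	enum toy_ref_state state;
};

#define TOY_LIMIT 2

struct toy_fifo {
	struct toy_skb *slots[TOY_LIMIT];
	size_t qlen;
};

/* Consume the reference by handing the skb to the queue. */
static void toy_queue_skb(struct toy_fifo *q, struct toy_skb *skb)
{
	assert(skb->state == TOY_REF_HELD); /* a consumed ref cannot be reused */
	assert(q->qlen < TOY_LIMIT);
	q->slots[q->qlen++] = skb;
	skb->state = TOY_REF_ENQUEUED;
}

/* Consume the reference by dropping the skb. */
static void toy_drop_skb(struct toy_skb *skb)
{
	assert(skb->state == TOY_REF_HELD);
	skb->state = TOY_REF_DROPPED;
}

/*
 * Model of an enqueue op: the skb argument arrives as a held reference
 * and every return path must have consumed it exactly once.
 */
static bool toy_enqueue(struct toy_fifo *q, struct toy_skb *skb)
{
	bool queued = q->qlen < TOY_LIMIT;

	if (queued)
		toy_queue_skb(q, skb);
	else
		toy_drop_skb(skb);

	/* No path may leave the function with the reference still held. */
	assert(skb->state != TOY_REF_HELD);
	return queued;
}
```

With TOY_LIMIT set to 2, the first two calls to toy_enqueue() queue the
skb and a third call drops it. In every case the reference ends up
consumed.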
* Performance of bpf qdisc *
We tested several bpf qdiscs included in the selftests and their in-tree
counterparts to give you a sense of the performance of qdisc implemented
in bpf.
The implementation of bpf_fq is fairly complex and slightly different from
fq, so the analysis below only compares the two fifo qdiscs. bpf_fq
implements the same fair queueing algorithm as fq, but without flow hash
collision avoidance and garbage collection of inactive flows. bpf_fifo uses
a single bpf_list as a queue instead of the three queues for different
priorities in pfifo_fast. The time complexity of the two fifo qdiscs should
nevertheless be similar since the queue selection time is negligible.
Test setup:

    client -> qdisc ------------->  server
    ~~~~~~~~~~~~~~~                 ~~~~~~
    nested VM1 @ DC1               VM2 @ DC2

Throughput: iperf3 -t 600, 5 times
Qdisc Average (GBits/sec)
---------- -------------------
pfifo_fast 12.52 ± 0.26
bpf_fifo 11.72 ± 0.32
fq 10.24 ± 0.13
bpf_fq 11.92 ± 0.64
Latency: sockperf pp --tcp -t 600, 5 times
Qdisc Average (usec)
---------- --------------
pfifo_fast 244.58 ± 7.93
bpf_fifo 244.92 ± 15.22
fq 234.30 ± 19.25
bpf_fq 221.34 ± 10.76
Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
implementation is consistent with the previous observation (v8 throughput
test on a loopback device). This could be mitigated in the future by
supporting adding skbs to bpf_list or bpf_rbtree directly.
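For reference, the 6.4% figure follows from the average throughput of
pfifo_fast (12.52) and bpf_fifo (11.72) reported above. The helper below
is a minimal sketch of that arithmetic. The function name is made up for
illustration and is not part of the patchset.

```c
#include <assert.h>

/*
 * Relative drop of `measured` with respect to `baseline`, in percent.
 * A positive result means `measured` is lower than `baseline`.
 */
static double relative_drop_pct(double baseline, double measured)
{
	assert(baseline > 0.0);
	return (baseline - measured) / baseline * 100.0;
}
```

relative_drop_pct(12.52, 11.72) evaluates to roughly 6.39, which rounds
to the 6.4% quoted above.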
* Clean up skb in bpf qdisc during reset *
The current implementation relies on bpf qdisc implementors to correctly
release skbs in queues (bpf graphs or maps) in .reset, which might not be
a safe thing to do. The solution, as Martin has suggested, would be
supporting private data in struct_ops. This can also help simplify the
implementation of qdiscs that work with mq. For example, qdiscs in the
selftests mostly use global data. Therefore, even if a user adds multiple
qdisc instances under mq, they would still share the same queue.
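The queue-sharing problem can be pictured with a small user-space model.
This is only a sketch under stated assumptions: the toy_* types and both
enqueue helpers are hypothetical and do not reflect how struct_ops private
data would actually be exposed. Pattern A keeps the queue in global data,
as the selftests mostly do, and pattern B gives each instance its own
queue.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 8

/* Hypothetical stand-in for a packet queue; it stores packet ids only. */
struct toy_queue {
	int pkts[TOY_QUEUE_CAP];
	size_t len;
};

/* Pattern A: a single queue kept in global data. */
static struct toy_queue global_queue;

/* Pattern B: each qdisc instance carries its own private queue. */
struct toy_qdisc {
	struct toy_queue priv;
};

static void toy_push(struct toy_queue *q, int pkt)
{
	assert(q->len < TOY_QUEUE_CAP);
	q->pkts[q->len++] = pkt;
}

/* Enqueue through an instance, but land in the shared global queue. */
static void enqueue_global(struct toy_qdisc *sch, int pkt)
{
	(void)sch; /* the instance is ignored: all instances share one queue */
	toy_push(&global_queue, pkt);
}

/* Enqueue into the private queue of the given instance. */
static void enqueue_private(struct toy_qdisc *sch, int pkt)
{
	toy_push(&sch->priv, pkt);
}
```

If two instances stand in for two mq children, packets sent through
enqueue_global() all pile up in global_queue, while enqueue_private()
keeps the packets of each instance separate.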
---
v2: Rebase to bpf-next/master
    Patch 1-4
      Remove the use of ctx_arg_info->ref_obj_id when acquiring
      referenced kptr from struct_ops arg
      Improve type comparison when checking kptr return from struct_ops
      Simplify selftests with test_loader and nomerge attribute
    Patch 5
      Remove redundant checks in qdisc_init
      Disallow tail_call
    Patch 6
      Improve kfunc ops availability filter by
        i) Checking struct_ops->type
        ii) Defining op-specific kfunc set
    Patch 7
      Search and add bpf_kfunc_desc after gen_prologue/epilogue
    Patch 8
      Use gen_prologue/epilogue to init/cancel watchdog timer
    Patch 12
      Mark read-only func arg and struct member const in libbpf
v1:
Fix struct_ops referenced kptr acquire/return mechanisms
Allow creating dynptr from skb
Add bpf qdisc kfunc filter
Support updating bstats and qstats
Update qdiscs in selftest to update stats
Add gc, handle hash collision and fix bugs in fq_bpf
Link: https://lore.kernel.org/bpf/20241213232958.2388301-1-amery.hung@bytedance.com/
past RFCs
v9: Drop classful qdisc operations and kfuncs
Drop support of enqueuing skb directly to bpf_rbtree/list
Link: https://lore.kernel.org/bpf/20240714175130.4051012-1-amery.hung@bytedance.com/
v8: Implement support of bpf qdisc using struct_ops
Allow struct_ops to acquire referenced kptr via argument
Allow struct_ops to release and return referenced kptr
Support enqueuing sk_buff to bpf_rbtree/list
Move examples from samples to selftests
Add a classful qdisc selftest
Link: https://lore.kernel.org/netdev/20240510192412.3297104-15-amery.hung@bytedance.com/
v7: Reference skb using kptr to sk_buff instead of __sk_buff
Use the new bpf rbtree/list for skb queues
Add reset and init programs
Add a bpf fq qdisc sample
Add a bpf netem qdisc sample
Link: https://lore.kernel.org/netdev/cover.1705432850.git.amery.hung@bytedance.com/
v6: switch to kptr based approach
v5: mv kernel/bpf/skb_map.c net/core/skb_map.c
implement flow map as map-in-map
rename bpf_skb_tc_classify() and move it to net/sched/cls_api.c
clean up eBPF qdisc program context
v4: get rid of PIFO, use rbtree directly
v3: move priority queue from sch_bpf to skb map
introduce skb map and its helpers
introduce bpf_skb_classify()
use netdevice notifier to reset skb's
Rebase on latest bpf-next
v2: Rebase on latest net-next
Make the code more complete (but still incomplete)
Amery Hung (14):
bpf: Support getting referenced kptr from struct_ops argument
selftests/bpf: Test referenced kptr arguments of struct_ops programs
bpf: Allow struct_ops prog to return referenced kptr
selftests/bpf: Test returning referenced kptr from struct_ops programs
bpf: net_sched: Support implementation of Qdisc_ops in bpf
bpf: net_sched: Add basic bpf qdisc kfuncs
bpf: Search and add kfuncs in struct_ops prologue and epilogue
bpf: net_sched: Add a qdisc watchdog timer
bpf: net_sched: Support updating bstats
bpf: net_sched: Support updating qstats
bpf: net_sched: Allow writing to more Qdisc members
libbpf: Support creating and destroying qdisc
selftests: Add a basic fifo qdisc test
selftests: Add a bpf fq qdisc to selftest
include/linux/bpf.h | 3 +
include/linux/btf.h | 1 +
include/linux/filter.h | 10 +
kernel/bpf/bpf_struct_ops.c | 40 +-
kernel/bpf/btf.c | 7 +-
kernel/bpf/verifier.c | 98 ++-
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/bpf_qdisc.c | 443 +++++++++++
net/sched/sch_api.c | 7 +-
net/sched/sch_generic.c | 3 +-
tools/lib/bpf/libbpf.h | 5 +-
tools/lib/bpf/netlink.c | 20 +-
tools/testing/selftests/bpf/config | 1 +
.../selftests/bpf/prog_tests/bpf_qdisc.c | 185 +++++
.../prog_tests/test_struct_ops_kptr_return.c | 16 +
.../prog_tests/test_struct_ops_refcounted.c | 12 +
.../selftests/bpf/progs/bpf_qdisc_common.h | 27 +
.../selftests/bpf/progs/bpf_qdisc_fifo.c | 117 +++
.../selftests/bpf/progs/bpf_qdisc_fq.c | 726 ++++++++++++++++++
.../bpf/progs/struct_ops_kptr_return.c | 30 +
...uct_ops_kptr_return_fail__invalid_scalar.c | 26 +
.../struct_ops_kptr_return_fail__local_kptr.c | 34 +
...uct_ops_kptr_return_fail__nonzero_offset.c | 25 +
.../struct_ops_kptr_return_fail__wrong_type.c | 30 +
.../bpf/progs/struct_ops_refcounted.c | 31 +
...ruct_ops_refcounted_fail__global_subprog.c | 37 +
.../struct_ops_refcounted_fail__ref_leak.c | 22 +
.../selftests/bpf/test_kmods/bpf_testmod.c | 15 +
.../selftests/bpf/test_kmods/bpf_testmod.h | 6 +
30 files changed, 1964 insertions(+), 26 deletions(-)
create mode 100644 net/sched/bpf_qdisc.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c
--
2.47.0