Message-ID: <874j2h86p6.fsf@toke.dk>
Date: Thu, 02 Jan 2025 18:29:57 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Amery Hung <ameryhung@...il.com>, netdev@...r.kernel.org
Cc: bpf@...r.kernel.org, daniel@...earbox.net, andrii@...nel.org,
alexei.starovoitov@...il.com, martin.lau@...nel.org, sinquersw@...il.com,
jhs@...atatu.com, jiri@...nulli.us, stfomichev@...il.com,
ekarani.silvestre@....ufcg.edu.br, yangpeihao@...u.edu.cn,
xiyou.wangcong@...il.com, yepeilin.cs@...il.com, ameryhung@...il.com,
amery.hung@...edance.com
Subject: Re: [PATCH bpf-next v2 00/14] bpf qdisc
Amery Hung <ameryhung@...il.com> writes:
> Hi all,
>
> This patchset aims to support implementing qdiscs using bpf struct_ops.
> This version takes a step back and only implements the minimum support
> for bpf qdisc; 1) support for adding an skb to a bpf_list or bpf_rbtree
> directly and 2) classful qdisc support are deferred to future patchsets.
>
> * Overview *
>
> This series supports implementing a qdisc using bpf struct_ops. bpf qdisc
> aims to be a flexible and easy-to-use infrastructure that allows users to
> quickly experiment with different scheduling algorithms/policies. It only
> requires users to implement the core qdisc logic in bpf and takes care of
> the mundane parts for them. In addition, the ability to easily communicate
> between the qdisc and other components will also open up opportunities for
> new applications and optimizations.
This is very cool, and I'm thrilled to see this work getting closer to
being merged! :)
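
For readers who haven't gone through the selftests yet, my reading of the
programming model is roughly the sketch below. The struct_ops type name
(Qdisc_ops), the section names, the struct bpf_sk_buff_ptr type and the
bpf_qdisc_skb_drop() kfunc are my guesses from the cover letter and the
selftests in this series, so treat the details as illustrative rather than
the final API:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define NET_XMIT_DROP 0x01 /* from include/linux/netdevice.h */

/* Assumed kfunc for handing an skb to the qdisc drop path. */
void bpf_qdisc_skb_drop(struct sk_buff *skb,
                        struct bpf_sk_buff_ptr *to_free) __ksym;

char _license[] SEC("license") = "GPL";

SEC("struct_ops/sketch_enqueue")
int BPF_PROG(sketch_enqueue, struct sk_buff *skb, struct Qdisc *sch,
             struct bpf_sk_buff_ptr *to_free)
{
        /* The skb argument is a referenced kptr: it has to be either
         * stored somewhere or dropped before returning. This sketch
         * simply drops everything. */
        bpf_qdisc_skb_drop(skb, to_free);
        return NET_XMIT_DROP;
}

SEC("struct_ops/sketch_dequeue")
struct sk_buff *BPF_PROG(sketch_dequeue, struct Qdisc *sch)
{
        /* A referenced skb kptr (or NULL) is handed back to the kernel;
         * this is what patches 3-4 enable. */
        return NULL;
}

SEC(".struct_ops.link")
struct Qdisc_ops sketch = {
        .enqueue = (void *)sketch_enqueue,
        .dequeue = (void *)sketch_dequeue,
        .id      = "bpf_sketch",
};

The appeal, as the cover letter says, is that only the core
enqueue/dequeue logic lives in bpf while the rest of the qdisc plumbing
stays in the kernel.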
> * struct_ops changes *
>
> To make struct_ops work better with bpf qdisc, two new changes are
> introduced to bpf specifically for struct_ops programs. First, we
> introduce the "__ref" postfix for arguments in stub functions in patches
> 1-2. It allows Qdisc_ops->enqueue to acquire a unique referenced kptr to
> the skb argument. Through the reference object tracking mechanism in
> the verifier, we can make sure that the acquired skb will be either
> enqueued or dropped. In addition, no duplicate references can be
> acquired. Second, we allow a referenced kptr to be returned from
> struct_ops programs so that we can return an skb naturally. This is done
> and tested in patches 3 and 4.
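
If I understand the mechanism correctly, the kernel-side stub annotation
from patches 1-2 looks something like the below (the function, argument
and type names are my own illustration, not copied from the patches):

/* CFI stub for the struct_ops member; the "__ref" suffix on the skb
 * argument marks it as a referenced kptr for the bpf program, by
 * analogy with existing argument suffixes such as "__nullable". */
static int Qdisc_ops__enqueue(struct sk_buff *skb__ref, struct Qdisc *sch,
                              struct sk_buff **to_free)
{
        return 0;
}

The verifier's reference tracking then forces the program to release that
reference exactly once (stash it in a graph/map or pass it to a drop/free
kfunc); leaking it, or releasing it twice, fails verification.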
>
> * Performance of bpf qdisc *
>
> We tested several bpf qdiscs included in the selftests and their in-tree
> counterparts to give you a sense of the performance of qdisc implemented
> in bpf.
>
> The implementation of bpf_fq is fairly complex and differs slightly from
> fq, so below we only compare the two fifo qdiscs in detail. bpf_fq
> implements the same fair queueing algorithm as fq, but without flow hash
> collision avoidance and garbage collection of inactive flows. bpf_fifo
> uses a single bpf_list as a queue instead of the three priority queues
> used in pfifo_fast. The time complexity of the two fifos should
> nevertheless be similar, since the queue selection time is negligible.
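
For reference, the single-bpf_list fifo described here has roughly the
following shape (abridged; it reuses the includes, the NET_XMIT_DROP
define and the assumed bpf_qdisc_skb_drop() declaration from the sketch
earlier in this mail, plus bpf_obj_new()/bpf_obj_drop(),
bpf_list_push_back()/bpf_list_pop_front() and __contains() from the
selftests' bpf_experimental.h -- so, again, a sketch rather than the
exact selftest code):

#define NET_XMIT_SUCCESS 0x00 /* from include/linux/netdevice.h */

/* Global-data graph roots, following the selftests' convention. */
#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))

struct skb_node {
        struct sk_buff __kptr *skb;
        struct bpf_list_node node;
};

private(FIFO) struct bpf_spin_lock q_lock;
private(FIFO) struct bpf_list_head q __contains(skb_node, node);

SEC("struct_ops/fifo_enqueue")
int BPF_PROG(fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
             struct bpf_sk_buff_ptr *to_free)
{
        struct skb_node *n;

        n = bpf_obj_new(typeof(*n));
        if (!n) {
                bpf_qdisc_skb_drop(skb, to_free);
                return NET_XMIT_DROP;
        }

        /* Move the referenced skb into the freshly allocated node; the
         * old value is expected to be NULL, the check just satisfies
         * the verifier's reference tracking. */
        skb = bpf_kptr_xchg(&n->skb, skb);
        if (skb)
                bpf_qdisc_skb_drop(skb, to_free);

        bpf_spin_lock(&q_lock);
        bpf_list_push_back(&q, &n->node);
        bpf_spin_unlock(&q_lock);
        return NET_XMIT_SUCCESS;
}

SEC("struct_ops/fifo_dequeue")
struct sk_buff *BPF_PROG(fifo_dequeue, struct Qdisc *sch)
{
        struct sk_buff *skb = NULL;
        struct bpf_list_node *node;
        struct skb_node *n;

        bpf_spin_lock(&q_lock);
        node = bpf_list_pop_front(&q);
        bpf_spin_unlock(&q_lock);
        if (!node)
                return NULL;

        /* Take the skb back out of the node and free the node. */
        n = container_of(node, struct skb_node, node);
        skb = bpf_kptr_xchg(&n->skb, skb);
        bpf_obj_drop(n);
        return skb;
}

The extra bpf_obj_new()/bpf_kptr_xchg() round trip per packet is exactly
the overhead that direct skb support in bpf_list/bpf_rbtree should remove.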
>
> Test setup:
>
> client -> qdisc -------------> server
> ~~~~~~~~~~~~~~~                ~~~~~~
> nested VM1 @ DC1               VM2 @ DC2
>
> Throughput: iperf3 -t 600, 5 times
>
> Qdisc       Average (GBits/sec)
> ----------  -------------------
> pfifo_fast  12.52 ± 0.26
> bpf_fifo    11.72 ± 0.32
> fq          10.24 ± 0.13
> bpf_fq      11.92 ± 0.64
>
> Latency: sockperf pp --tcp -t 600, 5 times
>
> Qdisc       Average (usec)
> ----------  --------------
> pfifo_fast  244.58 ± 7.93
> bpf_fifo    244.92 ± 15.22
> fq          234.30 ± 19.25
> bpf_fq      221.34 ± 10.76
>
> Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
> implementation is consistent with previous observations (the v8
> throughput test on a loopback device). This could be mitigated in the
> future by supporting adding an skb to a bpf_list or bpf_rbtree directly.
This looks pretty decent!
> * Clean up skb in bpf qdisc during reset *
>
> The current implementation relies on bpf qdisc implementors to correctly
> release the skbs held in queues (bpf graphs or maps) in .reset, which
> might not be a safe thing to rely on. The solution, as Martin has
> suggested, would be to support private data in struct_ops. This would
> also help simplify the implementation of qdiscs that work with mq. For
> example, the qdiscs in the selftests mostly use global data, so even if
> users add multiple qdisc instances under mq, they would still share the
> same queue.
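
Just to spell the sharing problem out for other readers: if you create an
mq root and attach a separate instance of the bpf fifo to each of its
children, every per-txq instance still pushes to and pops from the same
single list, because the queue root lives in the program's global data
rather than in per-instance state:

/* One root shared by every attached instance (as in the fifo sketch
 * earlier in this mail); nothing here is keyed by the qdisc instance. */
private(FIFO) struct bpf_spin_lock q_lock;
private(FIFO) struct bpf_list_head q __contains(skb_node, node);

Per-instance private data in struct_ops (or some other way of keying the
state by qdisc instance) would give each child its own queue.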
So is the plan to fix this before merging this series? Seems like a bit
of a footgun, otherwise?
-Toke