Message-ID: <CAKH8qBtdnku7StcQ-SamadvAF==DRuLLZO94yOR1WJ9Bg=uX1w@mail.gmail.com>
Date: Wed, 13 Jul 2022 11:36:17 -0700
From: Stanislav Fomichev <sdf@...gle.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
Mykola Lysenko <mykolal@...com>,
Kumar Kartikeya Dwivedi <memxor@...il.com>,
netdev@...r.kernel.org, bpf@...r.kernel.org,
Freysteinn Alfredsson <freysteinn.alfredsson@....se>,
Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities

On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Packet forwarding is an important use case for XDP, which offers
> significant performance improvements compared to forwarding using the
> regular networking stack. However, XDP currently offers no mechanism to
> delay, queue or schedule packets, which limits the practical uses for
> XDP-based forwarding to those where the capacity of input and output links
> always match each other (i.e., no rate transitions or many-to-one
> forwarding). It also prevents an XDP-based router from doing any kind of
> traffic shaping or reordering to enforce policy.
>
> This series represents a first RFC of our attempt to remedy this lack. The
> code in these patches is functional, but needs additional testing and
> polishing before being considered for merging. I'm posting it here as an
> RFC to get some early feedback on the API and overall design of the
> feature.
>
> DESIGN
>
> The design consists of three components: A new map type for storing XDP
> frames, a new 'dequeue' program type that will run in the TX softirq to
> provide the stack with packets to transmit, and a set of helpers to dequeue
> packets from the map, optionally drop them, and to schedule an interface
> for transmission.
>
> The new map type is modelled on the PIFO data structure proposed in the
> literature[0][1]. It represents a priority queue where packets can be
> enqueued at any priority, but are always dequeued from the head. From the
> XDP side, the map is simply used as a target for the bpf_redirect_map()
> helper, where the target index is the desired priority.

I have the same question I asked on the series from Cong:
Have you considered the existing carousel/EDT-like models?
Can we make the map flexible enough to implement different qdisc policies?

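
To make the question about policies concrete, here is a minimal userspace sketch of a PIFO: entries are pushed in at any rank and only the head is ever removed. It is a plain binary heap, which is an assumption made for brevity and not the layout used in kernel/bpf/pifomap.c. All names (struct pifo, pifo_enqueue(), pifo_dequeue_due()) are hypothetical. The last function shows how an EDT-like policy might reuse the same queue by treating the rank as a departure time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PIFO_CAP 64

struct pifo_entry {
	uint64_t rank;	/* priority, or departure time for EDT-style use */
	uint64_t seq;	/* insertion order; keeps equal ranks FIFO */
	int pkt_id;	/* stand-in for a packet pointer */
};

struct pifo {
	struct pifo_entry heap[PIFO_CAP];
	size_t len;
	uint64_t next_seq;
};

/* Order by rank first, then by arrival order. */
static bool entry_before(const struct pifo_entry *a, const struct pifo_entry *b)
{
	if (a->rank != b->rank)
		return a->rank < b->rank;
	return a->seq < b->seq;
}

static void entry_swap(struct pifo_entry *a, struct pifo_entry *b)
{
	struct pifo_entry tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Push in: insert at an arbitrary rank. Returns false when full. */
static bool pifo_enqueue(struct pifo *q, uint64_t rank, int pkt_id)
{
	size_t i;

	if (q->len == PIFO_CAP)
		return false;
	i = q->len++;
	q->heap[i] = (struct pifo_entry){
		.rank = rank, .seq = q->next_seq++, .pkt_id = pkt_id,
	};
	/* Sift the new entry up until its parent sorts before it. */
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (!entry_before(&q->heap[i], &q->heap[parent]))
			break;
		entry_swap(&q->heap[i], &q->heap[parent]);
		i = parent;
	}
	return true;
}

/* First out: always remove the head. Returns false when empty. */
static bool pifo_dequeue(struct pifo *q, struct pifo_entry *out)
{
	size_t i = 0;

	assert(q->len <= PIFO_CAP);
	if (q->len == 0)
		return false;
	*out = q->heap[0];
	q->heap[0] = q->heap[--q->len];
	/* Sift the moved entry down to restore heap order. */
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, min = i;

		if (l < q->len && entry_before(&q->heap[l], &q->heap[min]))
			min = l;
		if (r < q->len && entry_before(&q->heap[r], &q->heap[min]))
			min = r;
		if (min == i)
			break;
		entry_swap(&q->heap[i], &q->heap[min]);
		i = min;
	}
	return true;
}

/* EDT-style use: release the head only once its departure time has passed. */
static bool pifo_dequeue_due(struct pifo *q, uint64_t now, struct pifo_entry *out)
{
	if (q->len == 0 || q->heap[0].rank > now)
		return false;
	return pifo_dequeue(q, out);
}
```

With rank as a strict priority, pifo_dequeue() alone gives priority scheduling; with rank as a timestamp, pifo_dequeue_due() gives a simple pacing policy on the same structure.
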
> The dequeue program type is a new BPF program type that is attached to an
> interface; when an interface is scheduled for transmission, the stack will
> execute the attached dequeue program and, if it returns a packet to
> transmit, that packet will be transmitted using the existing ndo_xdp_xmit()
> driver function.
>
> The dequeue program can obtain packets by pulling them out of a PIFO map
> using the new bpf_packet_dequeue() helper. This returns a pointer to an
> xdp_md structure, which can be dereferenced to obtain packet data and
> data_meta pointers like in an XDP program. The returned packets are also
> reference counted, meaning the verifier enforces that the dequeue program
> either drops the packet (with the bpf_packet_drop() helper), or returns it
> for transmission. Finally, a helper is added that can be used to actually
> schedule an interface for transmission using the dequeue program type; this
> helper can be called from both XDP and dequeue programs.
>
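
The "either drop it or return it" rule described above can be modelled in userspace roughly as follows. This is only a mock: the mock_* names are hypothetical and are not the helpers from the series, and a runtime counter stands in for what the verifier would check statically at load time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_SLOTS 8

struct mock_pkt {
	int len;
	bool held;	/* true while a program holds a reference */
};

struct mock_queue {
	struct mock_pkt *slots[MOCK_SLOTS];
	size_t head, tail;
	int outstanding;	/* references handed out and not yet released */
};

static bool mock_enqueue(struct mock_queue *q, struct mock_pkt *p)
{
	if (q->tail == MOCK_SLOTS)
		return false;
	q->slots[q->tail++] = p;
	return true;
}

/* Dequeue hands out a reference that the caller must release. */
static struct mock_pkt *mock_packet_dequeue(struct mock_queue *q)
{
	struct mock_pkt *p;

	if (q->head == q->tail)
		return NULL;
	p = q->slots[q->head++];
	p->held = true;
	q->outstanding++;
	return p;
}

static void mock_release(struct mock_queue *q, struct mock_pkt *p)
{
	assert(p->held);
	p->held = false;
	q->outstanding--;
}

/* Dropping a packet is one of the two ways to release the reference. */
static void mock_packet_drop(struct mock_queue *q, struct mock_pkt *p)
{
	mock_release(q, p);
}

/*
 * Mock dequeue program: drop packets shorter than min_len and return the
 * first acceptable one for transmission. Every dequeued packet leaves this
 * function either dropped or returned, never still held.
 */
static struct mock_pkt *mock_dequeue_prog(struct mock_queue *q, int min_len)
{
	struct mock_pkt *p;

	while ((p = mock_packet_dequeue(q)) != NULL) {
		if (p->len >= min_len) {
			mock_release(q, p);	/* ownership passes to the TX path */
			return p;
		}
		mock_packet_drop(q, p);
	}
	return NULL;
}
```

After mock_dequeue_prog() returns, q->outstanding should be zero on every path; a path that leaves it non-zero corresponds to a program the verifier would be expected to reject.
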
> PERFORMANCE
>
> Preliminary performance tests indicate about 50ns overhead of adding
> queueing to the xdp_fwd example (last patch), which translates to a 20% PPS
> overhead (but still 2x the forwarding performance of the netstack):
>
> xdp_fwd   : 4.7 Mpps  (213 ns /pkt)
> xdp_fwd -Q: 3.8 Mpps  (263 ns /pkt)
> netstack  : 2   Mpps  (500 ns /pkt)
>
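
The per-packet figures above follow from ns/pkt = 1000 / Mpps. A small sketch of that arithmetic, with illustrative function names that are not part of the series:

```c
#include <assert.h>

/* Per-packet cost in nanoseconds for a rate given in Mpps. */
static double ns_per_pkt(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;
}

/* Extra nanoseconds per packet when the rate falls from base to with_q. */
static double added_ns(double base_mpps, double with_q_mpps)
{
	return ns_per_pkt(with_q_mpps) - ns_per_pkt(base_mpps);
}

/* Throughput lost, as a percentage of the baseline rate. */
static double pps_overhead_pct(double base_mpps, double with_q_mpps)
{
	return 100.0 * (base_mpps - with_q_mpps) / base_mpps;
}
```

For the quoted rates, added_ns(4.7, 3.8) comes to roughly 50 ns and pps_overhead_pct(4.7, 3.8) to roughly 19%, in line with the "about 50ns" and "20%" in the text; 3.8 Mpps against 2 Mpps is a factor of 1.9.
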
> RELATION TO BPF QDISC
>
> Cong Wang's BPF qdisc patches[2] share some aspects of this series, in
> particular the use of a map to store packets. This is no accident, as we've
> had ongoing discussions for a while now. I have no great hope that we can
> completely converge the two efforts into a single BPF-based queueing
> API (as has been discussed before[3], consolidating the SKB and XDP paths
> is challenging). Rather, I'm hoping that we can converge the designs enough
> that we can share BPF code between XDP and qdisc layers using common
> functions, like it's possible to do with XDP and TC-BPF today. This would
> imply agreeing on the map type and API, and possibly on the set of helpers
> available to the BPF programs.

Apart from the obvious things like locking and refcounting, what would
be the big difference for the map between storing xdp_frame and sk_buff?

> PATCH STRUCTURE
>
> This series consists of a total of 17 patches, as follows:
>
> Patches 1-3 are smaller preparatory refactoring patches used by subsequent
> patches.

Seems like these could go in separately, without waiting for the rest
of the series?

> Patches 4-5 introduce the PIFO map type, and patch 6 introduces the dequeue
> program type.
[...]
> Patches 7-10 add the dequeue helpers and the verifier features needed to
> recognise packet pointers, reference count them, and allow dereferencing
> them to obtain packet data pointers.

Have you considered using kfuncs for these instead of introducing new
hooks/contexts/etc?

> Patches 11 and 12 add the dequeue program hook to the TX path, and the
> helpers to schedule an interface.
>
> Patches 13-16 add libbpf support for the new types, and selftests for the
> new features.
>
> Finally, patch 17 adds queueing support to the xdp_fwd program in
> samples/bpf to provide an easy-to-use way of testing the feature; this is
> for illustrative purposes for the RFC only, and will not be included in the
> final submission.
>
> SUPPLEMENTARY MATERIAL
>
> A (WiP) test harness for implementing and unit-testing scheduling
> algorithms using this framework (and the bpf_prog_run() hook) is available
> as part of the bpf-examples repository[4]. We plan to expand this with more
> test algorithms to smoke-test the API, and also add ready-to-use queueing
> algorithms for use for forwarding (to replace the xdp_fwd patch included as
> part of this RFC submission).
>
> The work represented in this series was done in collaboration with several
> people. Thanks to Kumar Kartikeya Dwivedi for writing the verifier
> enhancements in this series, to Frey Alfredsson for his work on the testing
> harness in [4], and to Jesper Brouer, Per Hurtig and Anna Brunstrom for
> their valuable input on the design of the queueing APIs.
>
> This series is also available as a git tree on git.kernel.org[5].
>
> NOTES
>
> [0] http://web.mit.edu/pifo/
> [1] https://arxiv.org/abs/1810.03060
> [2] https://lore.kernel.org/r/20220602041028.95124-1-xiyou.wangcong@gmail.com
> [3] https://lore.kernel.org/r/b4ff6a2b-1478-89f8-ea9f-added498c59f@gmail.com
> [4] https://github.com/xdp-project/bpf-examples/pull/40
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-queueing-06
>
> Kumar Kartikeya Dwivedi (5):
>   bpf: Use 64-bit return value for bpf_prog_run
>   bpf: Teach the verifier about referenced packets returned from dequeue
>     programs
>   bpf: Introduce pkt_uid member for PTR_TO_PACKET
>   bpf: Implement direct packet access in dequeue progs
>   selftests/bpf: Add verifier tests for dequeue prog
>
> Toke Høiland-Jørgensen (12):
>   dev: Move received_rps counter next to RPS members in softnet data
>   bpf: Expand map key argument of bpf_redirect_map to u64
>   bpf: Add a PIFO priority queue map type
>   pifomap: Add queue rotation for continuously increasing rank mode
>   xdp: Add dequeue program type for getting packets from a PIFO
>   bpf: Add helpers to dequeue from a PIFO map
>   dev: Add XDP dequeue hook
>   bpf: Add helper to schedule an interface for TX dequeue
>   libbpf: Add support for dequeue program type and PIFO map type
>   libbpf: Add support for querying dequeue programs
>   selftests/bpf: Add test for XDP queueing through PIFO maps
>   samples/bpf: Add queueing support to xdp_fwd sample
>
> include/linux/bpf-cgroup.h | 12 +-
> include/linux/bpf.h | 64 +-
> include/linux/bpf_types.h | 4 +
> include/linux/bpf_verifier.h | 14 +-
> include/linux/filter.h | 63 +-
> include/linux/netdevice.h | 8 +-
> include/net/xdp.h | 16 +-
> include/uapi/linux/bpf.h | 50 +-
> include/uapi/linux/if_link.h | 4 +-
> kernel/bpf/Makefile | 2 +-
> kernel/bpf/cgroup.c | 12 +-
> kernel/bpf/core.c | 14 +-
> kernel/bpf/cpumap.c | 4 +-
> kernel/bpf/devmap.c | 92 ++-
> kernel/bpf/offload.c | 4 +-
> kernel/bpf/pifomap.c | 635 ++++++++++++++++++
> kernel/bpf/syscall.c | 3 +
> kernel/bpf/verifier.c | 148 +++-
> net/bpf/test_run.c | 54 +-
> net/core/dev.c | 109 +++
> net/core/dev.h | 2 +
> net/core/filter.c | 307 ++++++++-
> net/core/rtnetlink.c | 30 +-
> net/packet/af_packet.c | 7 +-
> net/xdp/xskmap.c | 4 +-
> samples/bpf/xdp_fwd_kern.c | 65 +-
> samples/bpf/xdp_fwd_user.c | 200 ++++--
> tools/include/uapi/linux/bpf.h | 48 ++
> tools/include/uapi/linux/if_link.h | 4 +-
> tools/lib/bpf/libbpf.c | 1 +
> tools/lib/bpf/libbpf.h | 1 +
> tools/lib/bpf/libbpf_probes.c | 5 +
> tools/lib/bpf/netlink.c | 8 +
> .../selftests/bpf/prog_tests/pifo_map.c | 125 ++++
> .../bpf/prog_tests/xdp_pifo_test_run.c | 154 +++++
> tools/testing/selftests/bpf/progs/pifo_map.c | 54 ++
> .../selftests/bpf/progs/test_xdp_pifo.c | 110 +++
> tools/testing/selftests/bpf/test_verifier.c | 29 +-
> .../testing/selftests/bpf/verifier/dequeue.c | 160 +++++
> 39 files changed, 2426 insertions(+), 200 deletions(-)
> create mode 100644 kernel/bpf/pifomap.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/pifo_map.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_pifo_test_run.c
> create mode 100644 tools/testing/selftests/bpf/progs/pifo_map.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_pifo.c
> create mode 100644 tools/testing/selftests/bpf/verifier/dequeue.c
>
> --
> 2.37.0
>