lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoM=Pz_EWHsWzVZkZfojoRyUgLPVhGRHq6aGVhdcLC2YvHw@mail.gmail.com>
Date:   Thu, 14 Jul 2022 10:05:09 -0400
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Mykola Lysenko <mykolal@...com>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>,
        Freysteinn Alfredsson <freysteinn.alfredsson@....se>,
        Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities

I think what would be really interesting is to see the performance numbers when
you have multiple producers/consumers(translation multiple
threads/softirqs) in play
targeting the same queues. Does PIFO alleviate the synchronization challenge
when you have multiple concurrent readers/writers? Or maybe for your use case
this would not be a common occurrence or not something you care about?

As I mentioned previously, I think this is what Cong's approach gets for free.

cheers,
jamal


cheers,
jamal

On Wed, Jul 13, 2022 at 7:14 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Packet forwarding is an important use case for XDP, which offers
> significant performance improvements compared to forwarding using the
> regular networking stack. However, XDP currently offers no mechanism to
> delay, queue or schedule packets, which limits the practical uses for
> XDP-based forwarding to those where the capacity of input and output links
> always match each other (i.e., no rate transitions or many-to-one
> forwarding). It also prevents an XDP-based router from doing any kind of
> traffic shaping or reordering to enforce policy.
>
> This series represents a first RFC of our attempt to remedy this lack. The
> code in these patches is functional, but needs additional testing and
> polishing before being considered for merging. I'm posting it here as an
> RFC to get some early feedback on the API and overall design of the
> feature.
>
> DESIGN
>
> The design consists of three components: A new map type for storing XDP
> frames, a new 'dequeue' program type that will run in the TX softirq to
> provide the stack with packets to transmit, and a set of helpers to dequeue
> packets from the map, optionally drop them, and to schedule an interface
> for transmission.
>
> The new map type is modelled on the PIFO data structure proposed in the
> literature[0][1]. It represents a priority queue where packets can be
> enqueued in any priority, but is always dequeued from the head. From the
> XDP side, the map is simply used as a target for the bpf_redirect_map()
> helper, where the target index is the desired priority.
>
> The dequeue program type is a new BPF program type that is attached to an
> interface; when an interface is scheduled for transmission, the stack will
> execute the attached dequeue program and, if it returns a packet to
> transmit, that packet will be transmitted using the existing ndo_xdp_xmit()
> driver function.
>
> The dequeue program can obtain packets by pulling them out of a PIFO map
> using the new bpf_packet_dequeue() helper. This returns a pointer to an
> xdp_md structure, which can be dereferenced to obtain packet data and
> data_meta pointers like in an XDP program. The returned packets are also
> reference counted, meaning the verifier enforces that the dequeue program
> either drops the packet (with the bpf_packet_drop() helper), or returns it
> for transmission. Finally, a helper is added that can be used to actually
> schedule an interface for transmission using the dequeue program type; this
> helper can be called from both XDP and dequeue programs.
>
> PERFORMANCE
>
> Preliminary performance tests indicate about 50ns overhead of adding
> queueing to the xdp_fwd example (last patch), which translates to a 20% PPS
> overhead (but still 2x the forwarding performance of the netstack):
>
> xdp_fwd :     4.7 Mpps  (213 ns /pkt)
> xdp_fwd -Q:   3.8 Mpps  (263 ns /pkt)
> netstack:       2 Mpps  (500 ns /pkt)
>
> RELATION TO BPF QDISC
>
> Cong Wang's BPF qdisc patches[2] share some aspects of this series, in
> particular the use of a map to store packets. This is no accident, as we've
> had ongoing discussions for a while now. I have no great hope that we can
> completely converge the two efforts into a single BPF-based queueing
> API (as has been discussed before[3], consolidating the SKB and XDP paths
> is challenging). Rather, I'm hoping that we can converge the designs enough
> that we can share BPF code between XDP and qdisc layers using common
> functions, like it's possible to do with XDP and TC-BPF today. This would
> imply agreeing on the map type and API, and possibly on the set of helpers
> available to the BPF programs.
>
> PATCH STRUCTURE
>
> This series consists of a total of 17 patches, as follows:
>
> Patches 1-3 are smaller preparatory refactoring patches used by subsequent
> patches.
>
> Patches 4-5 introduce the PIFO map type, and patch 6 introduces the dequeue
> program type.
>
> Patches 7-10 adds the dequeue helpers and the verifier features needed to
> recognise packet pointers, reference count them, and allow dereferencing
> them to obtain packet data pointers.
>
> Patches 11 and 12 add the dequeue program hook to the TX path, and the
> helpers to schedule an interface.
>
> Patches 13-16 add libbpf support for the new types, and selftests for the
> new features.
>
> Finally, patch 17 adds queueing support to the xdp_fwd program in
> samples/bpf to provide an easy-to-use way of testing the feature; this is
> for illustrative purposes for the RFC only, and will not be included in the
> final submission.
>
> SUPPLEMENTARY MATERIAL
>
> A (WiP) test harness for implementing and unit-testing scheduling
> algorithms using this framework (and the bpf_prog_run() hook) is available
> as part of the bpf-examples repository[4]. We plan to expand this with more
> test algorithms to smoke-test the API, and also add ready-to-use queueing
> algorithms for use for forwarding (to replace the xdp_fwd patch included as
> part of this RFC submission).
>
> The work represented in this series was done in collaboration with several
> people. Thanks to Kumar Kartikeya Dwivedi for writing the verifier
> enhancements in this series, to Frey Alfredsson for his work on the testing
> harness in [4], and to Jesper Brouer, Per Hurtig and Anna Brunstrom for
> their valuable input on the design of the queueing APIs.
>
> This series is also available as a git tree on git.kernel.org[5].
>
> NOTES
>
> [0] http://web.mit.edu/pifo/
> [1] https://arxiv.org/abs/1810.03060
> [2] https://lore.kernel.org/r/20220602041028.95124-1-xiyou.wangcong@gmail.com
> [3] https://lore.kernel.org/r/b4ff6a2b-1478-89f8-ea9f-added498c59f@gmail.com
> [4] https://github.com/xdp-project/bpf-examples/pull/40
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-queueing-06
>
> Kumar Kartikeya Dwivedi (5):
>   bpf: Use 64-bit return value for bpf_prog_run
>   bpf: Teach the verifier about referenced packets returned from dequeue
>     programs
>   bpf: Introduce pkt_uid member for PTR_TO_PACKET
>   bpf: Implement direct packet access in dequeue progs
>   selftests/bpf: Add verifier tests for dequeue prog
>
> Toke Høiland-Jørgensen (12):
>   dev: Move received_rps counter next to RPS members in softnet data
>   bpf: Expand map key argument of bpf_redirect_map to u64
>   bpf: Add a PIFO priority queue map type
>   pifomap: Add queue rotation for continuously increasing rank mode
>   xdp: Add dequeue program type for getting packets from a PIFO
>   bpf: Add helpers to dequeue from a PIFO map
>   dev: Add XDP dequeue hook
>   bpf: Add helper to schedule an interface for TX dequeue
>   libbpf: Add support for dequeue program type and PIFO map type
>   libbpf: Add support for querying dequeue programs
>   selftests/bpf: Add test for XDP queueing through PIFO maps
>   samples/bpf: Add queueing support to xdp_fwd sample
>
>  include/linux/bpf-cgroup.h                    |  12 +-
>  include/linux/bpf.h                           |  64 +-
>  include/linux/bpf_types.h                     |   4 +
>  include/linux/bpf_verifier.h                  |  14 +-
>  include/linux/filter.h                        |  63 +-
>  include/linux/netdevice.h                     |   8 +-
>  include/net/xdp.h                             |  16 +-
>  include/uapi/linux/bpf.h                      |  50 +-
>  include/uapi/linux/if_link.h                  |   4 +-
>  kernel/bpf/Makefile                           |   2 +-
>  kernel/bpf/cgroup.c                           |  12 +-
>  kernel/bpf/core.c                             |  14 +-
>  kernel/bpf/cpumap.c                           |   4 +-
>  kernel/bpf/devmap.c                           |  92 ++-
>  kernel/bpf/offload.c                          |   4 +-
>  kernel/bpf/pifomap.c                          | 635 ++++++++++++++++++
>  kernel/bpf/syscall.c                          |   3 +
>  kernel/bpf/verifier.c                         | 148 +++-
>  net/bpf/test_run.c                            |  54 +-
>  net/core/dev.c                                | 109 +++
>  net/core/dev.h                                |   2 +
>  net/core/filter.c                             | 307 ++++++++-
>  net/core/rtnetlink.c                          |  30 +-
>  net/packet/af_packet.c                        |   7 +-
>  net/xdp/xskmap.c                              |   4 +-
>  samples/bpf/xdp_fwd_kern.c                    |  65 +-
>  samples/bpf/xdp_fwd_user.c                    | 200 ++++--
>  tools/include/uapi/linux/bpf.h                |  48 ++
>  tools/include/uapi/linux/if_link.h            |   4 +-
>  tools/lib/bpf/libbpf.c                        |   1 +
>  tools/lib/bpf/libbpf.h                        |   1 +
>  tools/lib/bpf/libbpf_probes.c                 |   5 +
>  tools/lib/bpf/netlink.c                       |   8 +
>  .../selftests/bpf/prog_tests/pifo_map.c       | 125 ++++
>  .../bpf/prog_tests/xdp_pifo_test_run.c        | 154 +++++
>  tools/testing/selftests/bpf/progs/pifo_map.c  |  54 ++
>  .../selftests/bpf/progs/test_xdp_pifo.c       | 110 +++
>  tools/testing/selftests/bpf/test_verifier.c   |  29 +-
>  .../testing/selftests/bpf/verifier/dequeue.c  | 160 +++++
>  39 files changed, 2426 insertions(+), 200 deletions(-)
>  create mode 100644 kernel/bpf/pifomap.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/pifo_map.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_pifo_test_run.c
>  create mode 100644 tools/testing/selftests/bpf/progs/pifo_map.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_pifo.c
>  create mode 100644 tools/testing/selftests/bpf/verifier/dequeue.c
>
> --
> 2.37.0
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ