lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 19 Sep 2019 17:05:14 -0700
From:   Matt Cover <werekraken@...il.com>
To:     davem@...emloft.net, ast@...nel.org, daniel@...earbox.net,
        kafai@...com, songliubraving@...com, yhs@...com,
        nikolay@...ulusnetworks.com, sd@...asysnail.net,
        sbrivio@...hat.com, vincent@...nat.ch, kda@...ux-powerpc.org,
        Matthew Cover <matthew.cover@...ckpath.com>, jiri@...lanox.com,
        Eric Dumazet <edumazet@...gle.com>, pabeni@...hat.com,
        idosch@...lanox.com, petrm@...lanox.com, f.fainelli@...il.com,
        stephen@...workplumber.org, dsahern@...il.com,
        christian@...uner.io, jakub.kicinski@...ronome.com,
        Roopa Prabhu <roopa@...ulusnetworks.com>,
        johannes.berg@...el.com, mkubecek@...e.cz, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
        Jason Wang <jasowang@...hat.com>
Subject: Re: [RFC {net,iproute2}-next 0/2] Introduce an eBPF hookpoint for tx
 queue selection in the XPS (Transmit Packet Steering) code.

On Thu, Sep 19, 2019 at 3:45 PM Matthew Cover <werekraken@...il.com> wrote:
>
> WORK IN PROGRESS:
>   * bpf program loading works!
>   * txq steering via bpf program return code works!
>   * bpf program unloading not working.
>   * bpf program attached query not working.
>
> This patch set provides a bpf hookpoint with goals similar to, but a more
> generic implementation than, TUNSETSTEERINGEBPF; userspace supplied tx queue
> selection policy.
>
> TUNSETSTEERINGEBPF is a useful bpf hookpoint, but has some drawbacks.
>
> First, it only works on tun/tap devices.
>
> Second, there is no way in the current TUNSETSTEERINGEBPF implementation
> to bail out or load a noop bpf prog and fallback to the no prog tx queue
> selection method.
>
> Third, the TUNSETSTEERINGEBPF interface seems to require possession of existing
> or creation of new queues/fds.
>
> This most naturally fits in the "wire" implementation since possession of fds
> is ensured. However, it also means the various "wire" implementations (e.g.
> qemu) have to all be made aware of TUNSETSTEERINGEBPF and expose an interface
> to load/unload a bpf prog (or provide a mechanism to pass an fd to another
> program).
>
> Alternatively, you can spin up an extra queue and immediately disable via
> IFF_DETACH_QUEUE, but this seems unsafe; packets could be enqueued to this
> extra file descriptor which is part of our bpf prog loader, not our "wire".
>
> Placing this in the XPS code and leveraging iproute2 and rtnetlink to provide
> our bpf prog loader in a similar manner to xdp gives us a nice way to separate
> the tap "wire" and the loading of tx queue selection policy. It also lets us
> use this hookpoint for any device traversing XPS.
>
> This patch only introduces the new hookpoint to the XPS code and will not yet
> be used by tun/tap devices using the intree tun.ko (which implements an
> .ndo_select_queue and does not traverse the XPS code).
>
> In a future patch set, we can optionally refactor tun.ko to traverse this call
> to bpf_prog_run_clear_cb() and bpf prog storage. tun/tap devices could then
> leverage iproute2 as a generic loader. The TUNSETSTEERINGEBPF interface could
> at this point be optionally deprecated/removed.
>
> Both patches in this set have been tested using a rebuilt tun.ko with no
> .ndo_select_queue.
>
>   sed -i '/\.ndo_select_queue.*=/d' drivers/net/tun.c
>
> The tap device was instantiated using tap_mq_pong.c, supporting scripts, and
> wrapping service found here:
>
>   https://github.com/stackpath/rxtxcpu/tree/v1.2.6/helpers
>
> The bpf prog source and test scripts can be found here:
>
>   https://github.com/werekraken/xps_ebpf
>
> In nstxq, netsniff-ng using PACKET_FANOUT_QM is leveraged to check the
> queue_mapping.
>
> With no prog loaded, the tx queue selection is adhering our xps_cpus
> configuration.
>
>   [vagrant@...alhost ~]$ grep . /sys/class/net/tap0/queues/tx-*/xps_cpus; ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe;
>   /sys/class/net/tap0/queues/tx-0/xps_cpus:1
>   /sys/class/net/tap0/queues/tx-1/xps_cpus:2
>   cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.146 ms
>   cpu0: qm0:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>   cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.121 ms
>   cpu1: qm1:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>
> With a return 0 bpg prog, our tx queue is 0 (despite xps_cpus).
>
>   [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello0.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
>   cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.160 ms
>   cpu0: qm0:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>   cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.124 ms
>   cpu1: qm0:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>               ping-4852  [000] ....  2691.633260: 0: xps (RET 0): Hello, World!
>               ping-4869  [001] ....  2695.753588: 0: xps (RET 0): Hello, World!
>
> With a return 1 bpg prog, our tx queue is 1.
>
>   [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello1.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
>   cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.193 ms
>   cpu0: qm1:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>   cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.135 ms
>   cpu1: qm1:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>               ping-4894  [000] ....  2710.652080: 0: xps (RET 1): Hello, World!
>               ping-4911  [001] ....  2714.774608: 0: xps (RET 1): Hello, World!
>
> With a return 2 bpg prog, our tx queue is 0 (we only have 2 tx queues).
>
>   [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello2.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
>   cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=1.20 ms
>   cpu0: qm0:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>   cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.986 ms
>   cpu1: qm0:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>               ping-4936  [000] ....  2729.442668: 0: xps (RET 2): Hello, World!
>               ping-4953  [001] ....  2733.614558: 0: xps (RET 2): Hello, World!
>
> With a return -1 bpf prog, our tx queue selection is once again determined by
> xps_cpus. Any negative return should work the same and provides a nice
> mechanism to bail out or have a noop bpf prog at this hookpoint.
>
>   [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello_neg1.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
>   cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.628 ms
>   cpu0: qm0:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>   cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.322 ms
>   cpu1: qm1:  > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>               ping-4981  [000] ....  2763.510760: 0: xps (RET -1): Hello, World!
>               ping-4998  [001] ....  2767.632583: 0: xps (RET -1): Hello, World!
>
> bpf prog unloading is not yet working and neither does `ip link show` report
> when an "xps" bpf prog is attached. This is my first time touching iproute2 or
> rtnetlink, so it may be something obvious to those more familiar.

Adding Jason... sorry for missing that the first time.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ