Message-ID: <CAGyo_hrkUPxBAOttpo+xqcUHr89i2N5BOV4_rYFh=dRPkYVPGA@mail.gmail.com>
Date: Thu, 19 Sep 2019 17:05:14 -0700
From: Matt Cover <werekraken@...il.com>
To: davem@...emloft.net, ast@...nel.org, daniel@...earbox.net,
kafai@...com, songliubraving@...com, yhs@...com,
nikolay@...ulusnetworks.com, sd@...asysnail.net,
sbrivio@...hat.com, vincent@...nat.ch, kda@...ux-powerpc.org,
Matthew Cover <matthew.cover@...ckpath.com>, jiri@...lanox.com,
Eric Dumazet <edumazet@...gle.com>, pabeni@...hat.com,
idosch@...lanox.com, petrm@...lanox.com, f.fainelli@...il.com,
stephen@...workplumber.org, dsahern@...il.com,
christian@...uner.io, jakub.kicinski@...ronome.com,
Roopa Prabhu <roopa@...ulusnetworks.com>,
johannes.berg@...el.com, mkubecek@...e.cz, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
Jason Wang <jasowang@...hat.com>
Subject: Re: [RFC {net,iproute2}-next 0/2] Introduce an eBPF hookpoint for tx
queue selection in the XPS (Transmit Packet Steering) code.
On Thu, Sep 19, 2019 at 3:45 PM Matthew Cover <werekraken@...il.com> wrote:
>
> WORK IN PROGRESS:
> * bpf program loading works!
> * txq steering via bpf program return code works!
> * bpf program unloading not working.
> * querying for an attached bpf program not working.
>
> This patch set provides a bpf hookpoint with goals similar to, but a more
> generic implementation than, TUNSETSTEERINGEBPF: userspace-supplied tx queue
> selection policy.
>
> TUNSETSTEERINGEBPF is a useful bpf hookpoint, but has some drawbacks.
>
> First, it only works on tun/tap devices.
>
> Second, there is no way in the current TUNSETSTEERINGEBPF implementation
> to bail out or load a noop bpf prog and fall back to the no-prog tx queue
> selection method.
>
> Third, the TUNSETSTEERINGEBPF interface seems to require possession of
> existing queues/fds or creation of new ones.
>
> This requirement most naturally fits the "wire" implementation, since
> possession of fds is ensured there. However, it also means the various
> "wire" implementations (e.g. qemu) must all be made aware of
> TUNSETSTEERINGEBPF and expose an interface to load/unload a bpf prog (or
> provide a mechanism to pass an fd to another program).
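>
> For reference, the existing interface is an ioctl against a tun queue fd.
> A minimal sketch (assuming tun_fd is an open, attached tun/tap queue fd
> and prog_fd refers to a loaded BPF_PROG_TYPE_SOCKET_FILTER prog):
>
>   #include <sys/ioctl.h>
>   #include <linux/if_tun.h>
>
>   /* Attach a steering prog to a tun/tap device. The caller must already
>    * hold a queue fd -- the fd-possession requirement noted above. */
>   int attach_steering(int tun_fd, int prog_fd)
>   {
>       return ioctl(tun_fd, TUNSETSTEERINGEBPF, &prog_fd);
>   }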
>
> Alternatively, you can spin up an extra queue and immediately disable it via
> IFF_DETACH_QUEUE, but this seems unsafe; packets could be enqueued to this
> extra file descriptor, which is part of our bpf prog loader, not our "wire".
>
> Placing this in the XPS code and leveraging iproute2 and rtnetlink to provide
> our bpf prog loader in a similar manner to xdp gives us a nice way to separate
> the tap "wire" and the loading of tx queue selection policy. It also lets us
> use this hookpoint for any device traversing XPS.
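>
> As a rough sketch only (the names below are illustrative, not the actual
> patch), the hookpoint amounts to running a device-attached prog in the tx
> queue selection path and falling back to the existing XPS logic on a
> negative return:
>
>   /* Sketch: run an attached steering prog, if any. A negative return
>    * means "no opinion"; fall back to xps_cpus-based selection. */
>   static int xps_run_steering_prog(struct net_device *dev,
>                                    struct sk_buff *skb)
>   {
>       struct bpf_prog *prog;
>       int ret = -1;
>
>       rcu_read_lock();
>       prog = rcu_dereference(dev->xps_prog);  /* hypothetical field */
>       if (prog)
>           ret = (int)bpf_prog_run_clear_cb(prog, skb);
>       rcu_read_unlock();
>
>       if (ret < 0)
>           return -1;
>
>       return ret % dev->real_num_tx_queues;   /* cf. RET 2 -> queue 0 below */
>   }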
>
> This patch only introduces the new hookpoint to the XPS code and will not yet
> be used by tun/tap devices running the in-tree tun.ko (which implements an
> .ndo_select_queue and does not traverse the XPS code).
>
> In a future patch set, we can optionally refactor tun.ko to traverse this
> call to bpf_prog_run_clear_cb() and use this bpf prog storage. tun/tap
> devices could then leverage iproute2 as a generic loader. At that point, the
> TUNSETSTEERINGEBPF interface could optionally be deprecated/removed.
>
> Both patches in this set have been tested using a rebuilt tun.ko with no
> .ndo_select_queue.
>
> sed -i '/\.ndo_select_queue.*=/d' drivers/net/tun.c
>
> The tap device was instantiated using tap_mq_pong.c, supporting scripts, and
> wrapping service found here:
>
> https://github.com/stackpath/rxtxcpu/tree/v1.2.6/helpers
>
> The bpf prog source and test scripts can be found here:
>
> https://github.com/werekraken/xps_ebpf
>
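> As a hypothetical reconstruction (the actual source is in the repo above;
> hello1.o, hello2.o, and hello_neg1.o presumably differ only in return
> value and format string), hello0.o may look roughly like:
>
>   /* hello0.c -- e.g. clang -O2 -target bpf -c hello0.c -o hello0.o;
>    * the bpf_helpers.h path varies by kernel tree/distro. */
>   #include <linux/bpf.h>
>   #include <bpf/bpf_helpers.h>
>
>   SEC("hello")
>   int select_queue(struct __sk_buff *skb)
>   {
>       const char fmt[] = "xps (RET 0): Hello, World!\n";
>
>       /* shows up in trace_pipe, as in the logs below */
>       bpf_trace_printk(fmt, sizeof(fmt));
>       return 0;   /* steer to tx queue 0 */
>   }
>
>   char _license[] SEC("license") = "GPL";
>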
> In nstxq, netsniff-ng with PACKET_FANOUT_QM is leveraged to check the
> queue_mapping.
>
> With no prog loaded, tx queue selection adheres to our xps_cpus
> configuration.
>
> [vagrant@...alhost ~]$ grep . /sys/class/net/tap0/queues/tx-*/xps_cpus; ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe;
> /sys/class/net/tap0/queues/tx-0/xps_cpus:1
> /sys/class/net/tap0/queues/tx-1/xps_cpus:2
> cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.146 ms
> cpu0: qm0: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.121 ms
> cpu1: qm1: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
>
> With a return 0 bpf prog, our tx queue is 0 (despite xps_cpus).
>
> [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello0.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
> cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.160 ms
> cpu0: qm0: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.124 ms
> cpu1: qm0: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> ping-4852 [000] .... 2691.633260: 0: xps (RET 0): Hello, World!
> ping-4869 [001] .... 2695.753588: 0: xps (RET 0): Hello, World!
>
> With a return 1 bpf prog, our tx queue is 1.
>
> [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello1.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
> cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.193 ms
> cpu0: qm1: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.135 ms
> cpu1: qm1: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> ping-4894 [000] .... 2710.652080: 0: xps (RET 1): Hello, World!
> ping-4911 [001] .... 2714.774608: 0: xps (RET 1): Hello, World!
>
> With a return 2 bpf prog, our tx queue is 0 (we only have 2 tx queues, so
> the returned index apparently wraps: 2 % 2 == 0).
>
> [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello2.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
> cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=1.20 ms
> cpu0: qm0: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.986 ms
> cpu1: qm0: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> ping-4936 [000] .... 2729.442668: 0: xps (RET 2): Hello, World!
> ping-4953 [001] .... 2733.614558: 0: xps (RET 2): Hello, World!
>
> With a return -1 bpf prog, our tx queue selection is once again determined
> by xps_cpus. Any negative return should work the same, providing a nice
> mechanism to bail out or to run a noop bpf prog at this hookpoint.
>
> [vagrant@...alhost ~]$ sudo ip link set dev tap0 xps obj hello_neg1.o sec hello && { ./nstxq; sudo timeout 1 cat /sys/kernel/debug/tracing/trace_pipe; }
> cpu0: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.628 ms
> cpu0: qm0: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> cpu1: ping: 64 bytes from 169.254.254.1: icmp_seq=1 ttl=64 time=0.322 ms
> cpu1: qm1: > tap0 98 Unknown => Unknown IPv4 169.254.254.2/169.254.254.1 Len 84 Type 8 Code 0
> ping-4981 [000] .... 2763.510760: 0: xps (RET -1): Hello, World!
> ping-4998 [001] .... 2767.632583: 0: xps (RET -1): Hello, World!
>
> bpf prog unloading is not yet working, nor does `ip link show` report when
> an "xps" bpf prog is attached. This is my first time touching iproute2 or
> rtnetlink, so the problem may be something obvious to those more familiar.
Adding Jason... sorry for missing that the first time.