Message-ID: <20180515151314.tpflos2pxlvfc4dg@ast-mbp>
Date: Tue, 15 May 2018 08:13:15 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
Christoph Hellwig <hch@...radead.org>,
Björn Töpel <bjorn.topel@...el.com>,
Magnus Karlsson <magnus.karlsson@...el.com>,
makita.toshiaki@....ntt.co.jp
Subject: Re: [bpf-next V3 PATCH 4/4] xdp: change ndo_xdp_xmit API to support
bulking
On Tue, May 15, 2018 at 02:13:50PM +0200, Jesper Dangaard Brouer wrote:
> This patch changes the API for ndo_xdp_xmit to support bulking
> xdp_frames.
>
> When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
> Most of the slowdown is caused by DMA API indirect function calls, but
> also the net_device->ndo_xdp_xmit() call.
>
> Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
> single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
> performance improved:
> for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
> for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
>
> With frames available as a bulk inside the driver ndo_xdp_xmit call,
> further optimizations are possible, like bulk DMA-mapping for TX.
>
> Testing without CONFIG_RETPOLINE shows the same performance for
> physical NIC drivers.
>
> The virtual NIC driver tun sees a huge performance boost, as it can
> avoid doing per frame producer locking, but instead amortize the
> locking cost over the bulk.
>
> V2: Fix compile errors reported by kbuild test robot <lkp@...el.com>
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 +++++++---
> drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 -
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 21 ++++++--
> drivers/net/tun.c | 37 +++++++++-----
> drivers/net/virtio_net.c | 66 +++++++++++++++++++------
> include/linux/netdevice.h | 14 +++--
> include/net/page_pool.h | 5 +-
> include/net/xdp.h | 1
> include/trace/events/xdp.h | 10 ++--
> kernel/bpf/devmap.c | 33 ++++++++-----
> net/core/filter.c | 4 +-
> net/core/xdp.c | 20 ++++++--
> samples/bpf/xdp_monitor_kern.c | 10 ++++
> samples/bpf/xdp_monitor_user.c | 35 +++++++++++--
> 14 files changed, 206 insertions(+), 78 deletions(-)
This patch has to be split into at least five:
- bpf and net core piece
- intel driver changes
- tun/virtio changes
- addition of tracepoints
- addition to samples
Putting changes from all these areas into one patch makes it harder to
review, bisect, ack, and test, and more likely to hit merge conflicts.
Same issue with 3/4 as well. Please split it into two (core and samples).
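For readers following the thread, the locking argument in the quoted
description (take the producer lock once per bulk instead of once per
frame) can be illustrated with a small userspace sketch. This is not the
code from the patch: struct frame, struct tx_ring, xmit_one() and
xmit_bulk() are hypothetical names, the lock is a toy C11 spinlock, and
the return convention is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 64                 /* hypothetical ring capacity */

struct frame { int id; };            /* stand-in for a real frame type */

struct tx_ring {
    atomic_flag lock;                /* toy spinlock standing in for a producer lock */
    struct frame *slots[RING_SIZE];
    size_t used;
    unsigned long lock_acquisitions; /* how many times the lock was taken */
};

static void ring_init(struct tx_ring *r)
{
    atomic_flag_clear(&r->lock);
    r->used = 0;
    r->lock_acquisitions = 0;
}

static void ring_lock(struct tx_ring *r)
{
    while (atomic_flag_test_and_set(&r->lock))
        ;                            /* spin until the flag was clear */
    r->lock_acquisitions++;          /* updated while holding the lock */
}

static void ring_unlock(struct tx_ring *r)
{
    atomic_flag_clear(&r->lock);
}

/* Per-frame transmit: one lock round-trip for every frame.
 * Returns 0 on success, -1 if the ring is full. */
static int xmit_one(struct tx_ring *r, struct frame *f)
{
    int ret = -1;

    ring_lock(r);
    if (r->used < RING_SIZE) {
        r->slots[r->used++] = f;
        ret = 0;
    }
    ring_unlock(r);
    return ret;
}

/* Bulk transmit: one lock round-trip for up to n frames.
 * Returns the number of frames actually queued. */
static int xmit_bulk(struct tx_ring *r, struct frame **frames, int n)
{
    int sent = 0;

    assert(n >= 0);
    ring_lock(r);
    while (sent < n && r->used < RING_SIZE)
        r->slots[r->used++] = frames[sent++];
    ring_unlock(r);
    return sent;
}
```

Queuing 16 frames through xmit_one() takes the lock 16 times, while a
single xmit_bulk() call with the same 16 frames takes it once. That is
the amortization the description refers to; any real driver has more to
handle (error paths, freeing frames that were not sent) than this sketch
shows.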