Message-ID: <20180515151314.tpflos2pxlvfc4dg@ast-mbp>
Date:   Tue, 15 May 2018 08:13:15 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
        Christoph Hellwig <hch@...radead.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        makita.toshiaki@....ntt.co.jp
Subject: Re: [bpf-next V3 PATCH 4/4] xdp: change ndo_xdp_xmit API to support
 bulking

On Tue, May 15, 2018 at 02:13:50PM +0200, Jesper Dangaard Brouer wrote:
> This patch changes the API for ndo_xdp_xmit to support bulking
> xdp_frames.
> 
> When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
> Most of the slowdown is caused by DMA API indirect function calls, but
> also the net_device->ndo_xdp_xmit() call.
> 
> Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
> single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
> performance improved:
>  for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
>  for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
> 
> With frames available as a bulk inside the driver ndo_xdp_xmit call,
> further optimizations are possible, like bulk DMA-mapping for TX.
> 
> Testing without CONFIG_RETPOLINE shows the same performance for
> physical NIC drivers.
> 
> The virtual NIC driver tun sees a huge performance boost, as it can
> avoid doing per frame producer locking, but instead amortize the
> locking cost over the bulk.
> 
> V2: Fix compile errors reported by kbuild test robot <lkp@...el.com>
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 +++++++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 ++++++--
>  drivers/net/tun.c                             |   37 +++++++++-----
>  drivers/net/virtio_net.c                      |   66 +++++++++++++++++++------
>  include/linux/netdevice.h                     |   14 +++--
>  include/net/page_pool.h                       |    5 +-
>  include/net/xdp.h                             |    1 
>  include/trace/events/xdp.h                    |   10 ++--
>  kernel/bpf/devmap.c                           |   33 ++++++++-----
>  net/core/filter.c                             |    4 +-
>  net/core/xdp.c                                |   20 ++++++--
>  samples/bpf/xdp_monitor_kern.c                |   10 ++++
>  samples/bpf/xdp_monitor_user.c                |   35 +++++++++++--
>  14 files changed, 206 insertions(+), 78 deletions(-)

This patch has to be split into at least five:
- bpf and net core piece
- intel driver changes
- tun/virtio changes
- addition of tracepoints
- addition to samples
Putting changes from all these areas into one patch makes it harder
to review, bisect, ack, and test, and harder to resolve merge conflicts.

Same issue with 3/4 as well. Please split it into two (core and samples).
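
For reference, the bulking idea from the quoted description can be
sketched in plain user-space C. This is only an illustration, not code
from the patch: struct frame, struct tx_queue, xmit_one() and
xmit_bulk() are hypothetical stand-ins, and the "lock" is a counter
that records how often a real producer lock would have been taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's xdp_frame or any driver ring. */
struct frame {
    int len;
};

struct tx_queue {
    unsigned long lock_acquisitions; /* simulated producer-lock round trips */
    unsigned long frames_sent;
};

/* Simulated lock: only counts how often it would have been taken. */
static void queue_lock(struct tx_queue *q)
{
    q->lock_acquisitions++;
}

static void queue_unlock(struct tx_queue *q)
{
    (void)q;
}

/* Per-frame style: one call and one lock round trip for every frame. */
int xmit_one(struct tx_queue *q, const struct frame *f)
{
    (void)f;
    queue_lock(q);
    q->frames_sent++;
    queue_unlock(q);
    return 0;
}

/*
 * Bulk style: one call receives n frames, so the call overhead and the
 * lock cost are paid once per bulk. Returns the number of frames sent.
 */
int xmit_bulk(struct tx_queue *q, const struct frame *frames, int n)
{
    int i;

    assert(n >= 0);
    queue_lock(q);
    for (i = 0; i < n; i++) {
        (void)frames[i];
        q->frames_sent++;
    }
    queue_unlock(q);
    return n;
}
```

Sending the same 16 frames through both paths takes the simulated lock
16 times with xmit_one() and once with xmit_bulk(). The bulk size of 16
is an arbitrary choice for the illustration.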
