Date:   Mon, 20 Feb 2017 11:13:43 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        David Miller <davem@...emloft.net>
Cc:     brouer@...hat.com, Saeed Mahameed <saeedm@...lanox.com>,
        Tom Herbert <tom@...bertland.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Brenden Blanco <bblanco@...il.com>
Subject: Focusing the XDP project


The first thing to put in order for the XDP project:

  RX batching is missing.

I don't want to discuss packet page-sizes or multi-port forwarding
before we have established the most fundamental principle that all
other solutions use: RX batching.

Without building in RX batching from the beginning (i.e. now), the XDP
architecture has lost.  Adding features and capabilities will just
lead us back to the exact same performance problems as before!


Today we already have the 64-packet NAPI budget, but we are not
taking advantage of it. For XDP, as long as the eBPF program always
returns XDP_DROP or XDP_TX, we (falsely) experience the effect of
bulking (as the code fits within the icache) and see huge perf boosts.

The initial principle is bulking/batching packets to amortize
per-packet costs.  The next step is just as important: lookup table
sizes (FIB) kill performance again. The solution is implementing a
smart table lookup scheme that prefetches hash table key-cells and
afterwards prefetches data-cells, based on the RX batch of packets.
Notice that VPP revolves around similar tricks, which is why it beats
DPDK and why it scales to 1 million routes.
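
To make the lookup scheme concrete, here is a minimal userspace sketch
of a batched hash lookup that first prefetches the key-cells for the
whole batch and afterwards prefetches the data-cells. Everything in it
is an assumption for illustration: the struct names, the table size,
the hash function and the linear probing are hypothetical, and this is
neither kernel code nor how VPP implements it. It relies on the
GCC/Clang builtin __builtin_prefetch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX  64    /* same size as the NAPI budget */
#define TABLE_SIZE 1024  /* hypothetical table size, must be a power of two */

struct data_cell {       /* hypothetical per-route data */
    uint32_t next_hop;
};

struct key_cell {        /* hypothetical hash bucket */
    uint32_t key;
    int used;
    struct data_cell *data;
};

static struct key_cell table[TABLE_SIZE];

static inline uint32_t hash_key(uint32_t key)
{
    /* Multiplicative hash, masked down to a table index. */
    return (key * 2654435761u) & (TABLE_SIZE - 1);
}

/* Insert with linear probing; returns 0 on success, -1 if the table is full. */
int table_insert(uint32_t key, struct data_cell *data)
{
    uint32_t idx = hash_key(key);

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct key_cell *c = &table[(idx + i) & (TABLE_SIZE - 1)];

        if (!c->used || c->key == key) {
            c->key = key;
            c->used = 1;
            c->data = data;
            return 0;
        }
    }
    return -1;
}

/* Probe for a key; returns NULL on a miss. */
static struct key_cell *find_cell(uint32_t key)
{
    uint32_t idx = hash_key(key);

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct key_cell *c = &table[(idx + i) & (TABLE_SIZE - 1)];

        if (!c->used)
            return NULL;
        if (c->key == key)
            return c;
    }
    return NULL;
}

/* Look up a whole batch of keys; out[i] is NULL on a miss. */
void lookup_batch(const uint32_t *keys, size_t n, struct data_cell **out)
{
    struct key_cell *cells[BATCH_MAX];

    assert(n <= BATCH_MAX);

    /* Stage 1: issue prefetches for the key-cell of every packet. */
    for (size_t i = 0; i < n; i++)
        __builtin_prefetch(&table[hash_key(keys[i])]);

    /* Stage 2: resolve the key-cells and prefetch their data-cells. */
    for (size_t i = 0; i < n; i++) {
        cells[i] = find_cell(keys[i]);
        if (cells[i])
            __builtin_prefetch(cells[i]->data);
    }

    /* Stage 3: hand out the data-cells, which the prefetches requested earlier. */
    for (size_t i = 0; i < n; i++)
        out[i] = cells[i] ? cells[i]->data : NULL;
}
```

The point of splitting the loop is that the memory latency of one
packet's lookup overlaps with the work done for the other packets in
the batch, instead of every packet stalling on its own cache miss.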

I hope I've made it very clear where the focus for XDP should be.
This involves implementing what I call RX-stages in the drivers. While
doing that we can figure out the optimal data structure for packet
batching.
I know Saeed is already working on RX-stages for mlx5, and I've tested
the initial version of his patch, and the results are excellent.
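
As a rough illustration of the idea, a staged RX loop could look
something like the userspace sketch below. This is only my guess at
the general shape, under stated assumptions: the struct names, the
three stage functions and the verdict enum (a stand-in for
XDP_DROP/XDP_TX/XDP_PASS) are all hypothetical, and it is not the
mlx5 patch nor any existing driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUDGET 64  /* same size as the NAPI budget */

/* Hypothetical stand-ins for XDP_DROP / XDP_TX / XDP_PASS. */
enum verdict { V_DROP, V_TX, V_PASS };

struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Hypothetical batch structure filled in stage by stage. */
struct rx_batch {
    struct pkt pkts[RX_BUDGET];
    enum verdict verdicts[RX_BUDGET];
    size_t n;
};

struct rx_stats {
    size_t dropped, tx, passed;
};

typedef enum verdict (*prog_fn)(const struct pkt *);

/* Stage 1: pull up to RX_BUDGET packets off the (simulated) RX ring. */
size_t stage_collect(struct rx_batch *b, const struct pkt *ring, size_t avail)
{
    b->n = avail < RX_BUDGET ? avail : RX_BUDGET;
    for (size_t i = 0; i < b->n; i++) {
        b->pkts[i] = ring[i];
        /* Start pulling packet data into cache before the program runs. */
        __builtin_prefetch(ring[i].data);
    }
    return b->n;
}

/* Stage 2: run the program back to back over the whole batch. */
void stage_run(struct rx_batch *b, prog_fn prog)
{
    for (size_t i = 0; i < b->n; i++)
        b->verdicts[i] = prog(&b->pkts[i]);
}

/* Stage 3: act on the verdicts; here we only count them. */
void stage_complete(const struct rx_batch *b, struct rx_stats *st)
{
    for (size_t i = 0; i < b->n; i++) {
        switch (b->verdicts[i]) {
        case V_DROP: st->dropped++; break;
        case V_TX:   st->tx++;      break;
        case V_PASS: st->passed++;  break;
        }
    }
}

/* Hypothetical program: drop packets shorter than 60 bytes, pass the rest. */
enum verdict example_prog(const struct pkt *p)
{
    return p->len < 60 ? V_DROP : V_PASS;
}
```

Each stage is a tight loop over the batch, so the code of one stage
can stay in the icache while it works through all the packets, even
when later stages do more than drop or retransmit.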

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
