[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <152638638695.9477.13781600009169577949.stgit@firesoul>
Date: Tue, 15 May 2018 14:13:30 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Christoph Hellwig <hch@...radead.org>,
BjörnTöpel <bjorn.topel@...el.com>,
Magnus Karlsson <magnus.karlsson@...el.com>,
makita.toshiaki@....ntt.co.jp
Subject: [bpf-next V3 PATCH 0/4] xdp: introduce bulking for ndo_xdp_xmit API
This patchset change ndo_xdp_xmit API to take a bulk of xdp frames.
When kernel is compiled with CONFIG_RETPOLINE, every indirect function
pointer (branch) call hurts performance. For XDP this have a huge
negative performance impact.
This patchset reduce the needed (indirect) calls to ndo_xdp_xmit, but
also prepares for further optimizations. The DMA APIs use of indirect
function pointer calls is the primary source the regression. It is
left for a followup patchset, to use bulking calls towards the DMA API
(via the scatter-gatter calls).
The other advantage of this API change is that drivers can easier
amortize the cost of any sync/locking scheme, over the bulk of
packets. The assumption of the current API is that the driver
implemementing the NDO will also allocate a dedicated XDP TX queue for
every CPU in the system. Which is not always possible or practical to
configure. E.g. ixgbe cannot load an XDP program on a machine with
more than 96 CPUs, due to limited hardware TX queues. E.g. virtio_net
is hard to configure as it requires manually increasing the
queues. E.g. tun driver chooses to use a per XDP frame producer lock
modulo smp_processor_id over avail queues.
I'm considered adding 'flags' to ndo_xdp_xmit, but it's not part of
this patchset. This will be a followup patchset, once we know if this
will be needed (e.g. for non-map xdp_redirect flush-flag, and if
AF_XDP chooses to use ndo_xdp_xmit for TX).
---
Jesper Dangaard Brouer (4):
bpf: devmap introduce dev_map_enqueue
bpf: devmap prepare xdp frames for bulking
xdp: add tracepoint for devmap like cpumap have
xdp: change ndo_xdp_xmit API to support bulking
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 ++++-
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 21 +++-
drivers/net/tun.c | 37 ++++---
drivers/net/virtio_net.c | 66 +++++++++---
include/linux/bpf.h | 16 ++-
include/linux/netdevice.h | 14 ++-
include/net/page_pool.h | 5 +
include/net/xdp.h | 1
include/trace/events/xdp.h | 50 +++++++++
kernel/bpf/devmap.c | 134 ++++++++++++++++++++++++-
net/core/filter.c | 19 +---
net/core/xdp.c | 20 +++-
samples/bpf/xdp_monitor_kern.c | 49 +++++++++
samples/bpf/xdp_monitor_user.c | 69 +++++++++++++
15 files changed, 446 insertions(+), 83 deletions(-)
--
Powered by blists - more mailing lists