Message-Id: <1556786363-28743-1-git-send-email-magnus.karlsson@intel.com>
Date: Thu, 2 May 2019 10:39:16 +0200
From: Magnus Karlsson <magnus.karlsson@...el.com>
To: magnus.karlsson@...el.com, bjorn.topel@...el.com, ast@...nel.org,
daniel@...earbox.net, netdev@...r.kernel.org, brouer@...hat.com
Cc: bpf@...r.kernel.org, bruce.richardson@...el.com,
ciara.loftus@...el.com, jakub.kicinski@...ronome.com,
xiaolong.ye@...el.com, qi.z.zhang@...el.com, maximmi@...lanox.com,
sridhar.samudrala@...el.com, kevin.laatz@...el.com
Subject: [RFC bpf-next 0/7] busy poll support for AF_XDP sockets
This RFC proposes to add busy-poll support to AF_XDP sockets. With
busy-poll, the driver is executed in process context by calling the
poll() syscall. The main advantage of this is that all processing
occurs on a single core. This eliminates the core-to-core cache
transfers between the application and the softirqd processing on
another core that occur without busy-poll. From a systems point of
view, it also provides the advantage that we do not have to provision
extra cores in the system to handle ksoftirqd/softirq processing, as
all processing is done on the single core that executes the
application. The drawback of busy-poll is that the max throughput seen
from a single application will be lower (due to the syscall), but on a
per-core basis it will often be higher, as the normal mode runs on two
cores and busy-poll on a single one.
The semantics of busy-poll from the application point of view are the
following:
* The application is required to call poll() to drive rx and tx
processing. There is no guarantee that softirq and interrupts will
do this for you. This is in contrast with the current
implementations of busy-poll that are opportunistic in the sense
that packets might be received/transmitted by busy-poll or
softirqd. (In this patch set, softirq/ksoftirqd will kick in at high
loads, just as in the current opportunistic implementations, but I would
like to get to a point where this is not the case for busy-poll
enabled XDP sockets, as this slows down performance considerably and
starts to use one more core for the softirq processing. The end goal
is for only poll() to drive the napi loop when busy-poll is enabled
on an AF_XDP socket. More about this later.)
* It should be enabled on a per socket basis. No global enablement, i.e.
the XDP socket busy-poll will not care about the current
/proc/sys/net/core/busy_poll and busy_read global enablement
mechanisms.
* The batch size (how many packets are processed every time the
napi function in the driver is called, i.e. the weight parameter)
should be configurable. Currently, the busy-poll batch size of AF_INET
sockets is set to 8, but for AF_XDP sockets this is too small, as the
amount of processing per packet is much smaller with AF_XDP. This
should be configurable on a per-socket basis.
* If you put multiple AF_XDP busy-poll enabled sockets into a poll()
call, the napi contexts of all of them should be executed. This is in
contrast to the AF_INET busy-poll that quits after the first one that
finds any packets. We need all napi contexts to be executed due to
the first requirement in this list. The behaviour we want is much more
like regular sockets in that all of them are checked in the poll
call.
* Should be possible to mix AF_XDP busy-poll sockets with any other
sockets including busy-poll AF_INET ones in a single poll() call
without any change to semantics or the behaviour of any of those
socket types.
* As suggested by Maxim Mikityanskiy, poll() will in the busy-poll
mode return POLLERR if the fill ring is empty or the completion
queue is full.
Busy-poll support is enabled by calling a new setsockopt called
XDP_BUSY_POLL_BATCH_SIZE that takes the batch size as its argument. A
value between 1 and NAPI_WEIGHT (64) turns it on, 0 turns it off, and
any other value returns an error.
A typical packet processing rxdrop loop with busy-poll will look something
like this:
	for (i = 0; i < num_socks; i++) {
		fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
		fds[i].events = POLLIN;
	}

	for (;;) {
		ret = poll(fds, num_socks, 0);
		if (ret <= 0)
			continue;

		for (i = 0; i < num_socks; i++)
			rx_drop(xsks[i], fds); /* The actual application */
	}
Need some advice around this issue please:
In this patch set, softirq/ksoftirqd will kick in at high loads and
render the busy poll support useless as the execution is now happening
in the same way as without busy-poll support. Everything works from an
application perspective but this defeats the purpose of the support
and also consumes an extra core. What I would like to accomplish when
XDP socket busy-poll is enabled is that softirq/ksoftirqd is never
invoked for the traffic that goes to this socket. This way, we would
get better performance on a per core basis and also get the same
behaviour independent of load.
To accomplish this, the driver would need to create a napi context
containing the busy-poll enabled XDP socket queues (not shared with
any other traffic) that does not have an interrupt associated with
it.
Does this sound like an avenue worth pursuing and if so, should it be
part of this RFC/PATCH or separate?
Epoll() is not supported at this point in time. Since it was designed
for handling a very large number of sockets, I thought it was better
to leave this for later if the need arises.
To do:
* Move over all drivers to the new xdp_[rt]xq_info functions. If you
think this is the right way to go forward, I will move over
Mellanox, Netronome, etc for the proper patch.
* Performance measurements
* Test SW drivers like virtio, veth and tap. Actually, test anything
that is not i40e.
* Test multiple sockets of different types in the same poll() call
* Check bisectability of each patch
* Versioning of the libbpf interface since we add an entry to the
socket configuration struct
This RFC has been applied against commit 2b5bc3c8ebce ("r8169: remove manual autoneg restart workaround")
Structure of the patch set:
Patch 1: Makes the busy poll batch size configurable inside the kernel
Patch 2: Adds napi id to struct xdp_rxq_info, adds this to the
xdp_rxq_reg_info function and changes the interface and
implementation so that we only need a single copy of the
xdp_rxq_info struct that resides in _rx. Previously there was
another one in the driver, but now the driver uses the one in
_rx. All XDP enabled drivers are converted to these new
functions.
Patch 3: Adds a struct xdp_txq_info with corresponding functions to
xdp_rxq_info that is used to store information about the tx
queue. The use of these are added to all AF_XDP enabled drivers.
Patch 4: Introduce a new setsockopt/getsockopt in the uapi for
enabling busy_poll.
Patch 5: Implements busy poll in the xsk code.
Patch 6: Add busy-poll support to libbpf.
Patch 7: Add busy-poll support to the xdpsock sample application.
Thanks: Magnus
Magnus Karlsson (7):
net: fs: make busy poll budget configurable in napi_busy_loop
net: i40e: ixgbe: tun: veth: virtio-net: centralize xdp_rxq_info and
add napi id
net: i40e: ixgbe: add xdp_txq_info structure
netdevice: introduce busy-poll setsockopt for AF_XDP
net: add busy-poll support for XDP sockets
libbpf: add busy-poll support to XDP sockets
samples/bpf: add busy-poll support to xdpsock sample
drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 -
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 37 ++++-
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +-
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 48 ++++--
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
drivers/net/tun.c | 14 +-
drivers/net/veth.c | 10 +-
drivers/net/virtio_net.c | 8 +-
fs/eventpoll.c | 5 +-
include/linux/netdevice.h | 1 +
include/net/busy_poll.h | 7 +-
include/net/xdp.h | 23 ++-
include/net/xdp_sock.h | 3 +
include/uapi/linux/if_xdp.h | 1 +
net/core/dev.c | 43 ++----
net/core/xdp.c | 103 ++++++++++---
net/xdp/Kconfig | 1 +
net/xdp/xsk.c | 122 ++++++++++++++-
net/xdp/xsk_queue.h | 18 ++-
samples/bpf/xdpsock_user.c | 203 +++++++++++++++----------
tools/include/uapi/linux/if_xdp.h | 1 +
tools/lib/bpf/xsk.c | 23 +--
tools/lib/bpf/xsk.h | 1 +
26 files changed, 495 insertions(+), 195 deletions(-)
--
2.7.4