Message-ID: <fa956c59-0715-c9ea-ec50-2b1bee210567@intel.com>
Date: Thu, 2 May 2019 17:23:18 -0700
From: "Samudrala, Sridhar" <sridhar.samudrala@...el.com>
To: Magnus Karlsson <magnus.karlsson@...el.com>, bjorn.topel@...el.com,
ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
brouer@...hat.com
Cc: bpf@...r.kernel.org, bruce.richardson@...el.com,
ciara.loftus@...el.com, jakub.kicinski@...ronome.com,
xiaolong.ye@...el.com, qi.z.zhang@...el.com, maximmi@...lanox.com,
kevin.laatz@...el.com
Subject: Re: [RFC bpf-next 5/7] net: add busy-poll support for XDP sockets
On 5/2/2019 1:39 AM, Magnus Karlsson wrote:
> This patch adds busy-poll support for XDP sockets (AF_XDP). With
> busy-poll, the driver is executed in process context by calling the
> poll() syscall. The main advantage of this is that all processing
> occurs on a single core. This eliminates the core-to-core cache
> transfers that, without busy-poll, occur between the application and
> the ksoftirqd/softirq processing on another core. From a systems
> point of view, it also provides the advantage that we do not
> have to provision extra cores in the system to handle
> ksoftirqd/softirq processing, as all processing is done on the single
> core that executes the application. The drawback of busy-poll is that
> max throughput seen from a single application will be lower (due to
> the syscall), but on a per-core basis it will often be higher as the
> normal mode runs on two cores and busy-poll on a single one.
>
> The semantics of busy-poll from the application point of view are the
> following:
>
> * The application is required to call poll() to drive rx and tx
> processing. There is no guarantee that softirq and interrupts will
> do this for you.
>
> * It should be enabled on a per-socket basis. There is no global
> enablement, i.e. XDP socket busy-poll does not care about the current
> /proc/sys/net/core/busy_poll and busy_read global enablement
> mechanisms.
>
> * The batch size (how many packets are processed every time the
> napi function in the driver is called, i.e. the weight parameter)
> should be configurable. Currently, the busy-poll batch size of AF_INET
> sockets is set to 8, but for AF_XDP sockets this is too small as the
> amount of processing per packet is much smaller with AF_XDP. This
> should be configurable on a per-socket basis.
>
> * If you put multiple AF_XDP busy-poll enabled sockets into a poll()
> call, the napi contexts of all of them should be executed. This is in
> contrast to AF_INET busy-poll, which quits after the first socket that
> finds any packets. We need all napi contexts to be executed due to
> the first requirement in this list. The behaviour we want is much more
> like regular sockets, in that all of them are checked in the poll
> call.
>
> * It should be possible to mix AF_XDP busy-poll sockets with any other
> sockets, including busy-poll AF_INET ones, in a single poll() call
> without any change to the semantics or behaviour of any of those
> socket types.
>
> * As suggested by Maxim Mikityanskiy, poll() will in the busy-poll
> mode return POLLERR if the fill ring is empty or the completion
> queue is full.
>
> Busy-poll support is enabled by calling a new setsockopt called
> XDP_BUSY_POLL_BATCH_SIZE that takes the batch size as an argument. A
> value between 1 and NAPI_WEIGHT (64) will turn it on, 0 will turn it
> off, and any other value will return an error.
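
For reference, enabling this from user space would presumably look
something like the snippet below (xsk_fd and the chosen batch size are
placeholders, and I am assuming an int-sized option value here):

	int batch_size = 16;	/* 0 disables, 1..NAPI_WEIGHT (64) enables */

	if (setsockopt(xsk_fd, SOL_XDP, XDP_BUSY_POLL_BATCH_SIZE,
		       &batch_size, sizeof(batch_size)))
		perror("setsockopt(XDP_BUSY_POLL_BATCH_SIZE)");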
>
> A typical packet processing rxdrop loop with busy-poll will look something
> like this:
>
> for (i = 0; i < num_socks; i++) {
> 	fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
> 	fds[i].events = POLLIN;
> }
>
> for (;;) {
> 	ret = poll(fds, num_socks, 0);
> 	if (ret <= 0)
> 		continue;
>
> 	for (i = 0; i < num_socks; i++)
> 		rx_drop(xsks[i], fds); /* The actual application */
> }
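
Given the POLLERR semantics above, the application loop presumably also
needs to look at revents and replenish or drain the rings itself. Roughly
like this, with refill_fill_ring() and drain_comp_ring() as hypothetical
application helpers:

	for (i = 0; i < num_socks; i++) {
		if (fds[i].revents & POLLERR) {
			/* Fill ring empty or completion ring full:
			 * user space has to act before the kernel can
			 * make further progress on this socket.
			 */
			refill_fill_ring(xsks[i]);	/* hypothetical */
			drain_comp_ring(xsks[i]);	/* hypothetical */
		}
	}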
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
> ---
> include/net/xdp_sock.h | 3 ++
> net/xdp/Kconfig | 1 +
> net/xdp/xsk.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++-
> net/xdp/xsk_queue.h | 18 ++++++--
> 4 files changed, 138 insertions(+), 6 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index d074b6d..2e956b37 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -57,7 +57,10 @@ struct xdp_sock {
>  	struct net_device *dev;
>  	struct xdp_umem *umem;
>  	struct list_head flush_node;
> +	unsigned int napi_id_rx;
> +	unsigned int napi_id_tx;
>  	u16 queue_id;
> +	u16 bp_batch_size;
>  	struct xsk_queue *tx ____cacheline_aligned_in_smp;
>  	struct list_head list;
>  	bool zc;
> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> index 0255b33..219baaa 100644
> --- a/net/xdp/Kconfig
> +++ b/net/xdp/Kconfig
> @@ -1,6 +1,7 @@
>  config XDP_SOCKETS
>  	bool "XDP sockets"
>  	depends on BPF_SYSCALL
> +	select NET_RX_BUSY_POLL
>  	default n
>  	help
>  	  XDP sockets allows a channel between XDP programs and
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index a14e886..bd3d0fe 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -22,6 +22,7 @@
>  #include <linux/net.h>
>  #include <linux/netdevice.h>
>  #include <linux/rculist.h>
> +#include <net/busy_poll.h>
>  #include <net/xdp_sock.h>
>  #include <net/xdp.h>
>
> @@ -302,16 +303,107 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>  	return (xs->zc) ? xsk_zc_xmit(sk) : xsk_generic_xmit(sk, m, total_len);
>  }
>
> +static unsigned int xsk_check_rx_poll_err(struct xdp_sock *xs)
> +{
> +	return xskq_consumer_empty(xs->umem->fq) ? POLLERR : 0;
> +}
> +
> +static unsigned int xsk_check_tx_poll_err(struct xdp_sock *xs)
> +{
> +	return xskq_producer_full(xs->umem->cq) ? POLLERR : 0;
> +}
> +
> +static bool xsk_busy_loop_end(void *p, unsigned long start_time)
> +{
> +	return true;
This function should be updated to return true on busy-poll timeout, or
it should check xskq_full_desc() on TX and xskq_empty_desc() on RX,
rather than unconditionally returning true.
> +}
> +
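
Something along these lines, perhaps (just an untested sketch; it assumes
p is the struct xdp_sock * handed to the busy loop, reuses
sk_busy_loop_timeout() from include/net/busy_poll.h and the
xskq_empty_desc() helper from xsk_queue.h, and leaves the exact tx-side
condition open):

	static bool xsk_busy_loop_end(void *p, unsigned long start_time)
	{
		struct xdp_sock *xs = p;

		/* End the busy loop on timeout, or once there are rx
		 * descriptors to hand to the application.  A tx-side
		 * condition (e.g. based on xskq_full_desc(xs->tx) or the
		 * completion ring state) would be needed here as well.
		 */
		return sk_busy_loop_timeout(&xs->sk, start_time) ||
		       (xs->rx && !xskq_empty_desc(xs->rx));
	}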