lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <fa956c59-0715-c9ea-ec50-2b1bee210567@intel.com> Date: Thu, 2 May 2019 17:23:18 -0700 From: "Samudrala, Sridhar" <sridhar.samudrala@...el.com> To: Magnus Karlsson <magnus.karlsson@...el.com>, bjorn.topel@...el.com, ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org, brouer@...hat.com Cc: bpf@...r.kernel.org, bruce.richardson@...el.com, ciara.loftus@...el.com, jakub.kicinski@...ronome.com, xiaolong.ye@...el.com, qi.z.zhang@...el.com, maximmi@...lanox.com, kevin.laatz@...el.com Subject: Re: [RFC bpf-next 5/7] net: add busy-poll support for XDP sockets On 5/2/2019 1:39 AM, Magnus Karlsson wrote: > This patch adds busy-poll support for XDP sockets (AF_XDP). With > busy-poll, the driver is executed in process context by calling the > poll() syscall. The main advantage with this is that all processing > occurs on a single core. This eliminates the core-to-core cache > transfers that occur between the application and the softirqd > processing on another core, that occurs without busy-poll. From a > systems point of view, it also provides an advatage that we do not > have to provision extra cores in the system to handle > ksoftirqd/softirq processing, as all processing is done on the single > core that executes the application. The drawback of busy-poll is that > max throughput seen from a single application will be lower (due to > the syscall), but on a per core basis it will often be higher as the > normal mode runs on two cores and busy-poll on a single one. > > The semantics of busy-poll from the application point of view are the > following: > > * The application is required to call poll() to drive rx and tx > processing. There is no guarantee that softirq and interrupts will > do this for you. > > * It should be enabled on a per socket basis. No global enablement, i.e. > the XDP socket busy-poll will not care about the current > /proc/sys/net/core/busy_poll and busy_read global enablement > mechanisms. > > * The batch size (how many packets that are processed every time the > napi function in the driver is called, i.e. the weight parameter) > should be configurable. Currently, the busy-poll size of AF_INET > sockets is set to 8, but for AF_XDP sockets this is too small as the > amount of processing per packet is much smaller with AF_XDP. This > should be configurable on a per socket basis. > > * If you put multiple AF_XDP busy-poll enabled sockets into a poll() > call the napi contexts of all of them should be executed. This is in > contrast to the AF_INET busy-poll that quits after the fist one that > finds any packets. We need all napi contexts to be executed due to > the first requirement in this list. The behaviour we want is much more > like regular sockets in that all of them are checked in the poll > call. > > * Should be possible to mix AF_XDP busy-poll sockets with any other > sockets including busy-poll AF_INET ones in a single poll() call > without any change to semantics or the behaviour of any of those > socket types. > > * As suggested by Maxim Mikityanskiy, poll() will in the busy-poll > mode return POLLERR if the fill ring is empty or the completion > queue is full. > > Busy-poll support is enabled by calling a new setsockopt called > XDP_BUSY_POLL_BATCH_SIZE that takes batch size as an argument. A value > between 1 and NAPI_WEIGHT (64) will turn it on, 0 will turn it off and > any other value will return an error. > > A typical packet processing rxdrop loop with busy-poll will look something > like this: > > for (i = 0; i < num_socks; i++) { > fds[i].fd = xsk_socket__fd(xsks[i]->xsk); > fds[i].events = POLLIN; > } > > for (;;) { > ret = poll(fds, num_socks, 0); > if (ret <= 0) > continue; > > for (i = 0; i < num_socks; i++) > rx_drop(xsks[i], fds); /* The actual application */ > } > > Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com> > --- > include/net/xdp_sock.h | 3 ++ > net/xdp/Kconfig | 1 + > net/xdp/xsk.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++- > net/xdp/xsk_queue.h | 18 ++++++-- > 4 files changed, 138 insertions(+), 6 deletions(-) > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > index d074b6d..2e956b37 100644 > --- a/include/net/xdp_sock.h > +++ b/include/net/xdp_sock.h > @@ -57,7 +57,10 @@ struct xdp_sock { > struct net_device *dev; > struct xdp_umem *umem; > struct list_head flush_node; > + unsigned int napi_id_rx; > + unsigned int napi_id_tx; > u16 queue_id; > + u16 bp_batch_size; > struct xsk_queue *tx ____cacheline_aligned_in_smp; > struct list_head list; > bool zc; > diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig > index 0255b33..219baaa 100644 > --- a/net/xdp/Kconfig > +++ b/net/xdp/Kconfig > @@ -1,6 +1,7 @@ > config XDP_SOCKETS > bool "XDP sockets" > depends on BPF_SYSCALL > + select NET_RX_BUSY_POLL > default n > help > XDP sockets allows a channel between XDP programs and > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index a14e886..bd3d0fe 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -22,6 +22,7 @@ > #include <linux/net.h> > #include <linux/netdevice.h> > #include <linux/rculist.h> > +#include <net/busy_poll.h> > #include <net/xdp_sock.h> > #include <net/xdp.h> > > @@ -302,16 +303,107 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) > return (xs->zc) ? xsk_zc_xmit(sk) : xsk_generic_xmit(sk, m, total_len); > } > > +static unsigned int xsk_check_rx_poll_err(struct xdp_sock *xs) > +{ > + return xskq_consumer_empty(xs->umem->fq) ? POLLERR : 0; > +} > + > +static unsigned int xsk_check_tx_poll_err(struct xdp_sock *xs) > +{ > + return xskq_producer_full(xs->umem->cq) ? POLLERR : 0; > +} > + > +static bool xsk_busy_loop_end(void *p, unsigned long start_time) > +{ > + return true; This function should be updated to return TRUE on busy poll timeout OR it should check for xskq_full_desc on TX and xskq_empty_desc on RX > +} > +
Powered by blists - more mailing lists