[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <HE1PR0501MB2570A5C385E2EA5A79CC1700D1180@HE1PR0501MB2570.eurprd05.prod.outlook.com>
Date: Tue, 11 Feb 2020 09:53:44 +0000
From: Maxim Mikityanskiy <maximmi@...lanox.com>
To: Magnus Karlsson <magnus.karlsson@...el.com>
CC: "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"ast@...nel.org" <ast@...nel.org>,
"bjorn.topel@...el.com" <bjorn.topel@...el.com>,
"rgoodfel@....edu" <rgoodfel@....edu>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"maciejromanfijalkowski@...il.com" <maciejromanfijalkowski@...il.com>
Subject: RE: [PATCH bpf] xsk: publish global consumer pointers when NAPI is
finished
On 2020-02-10 17:27, Magnus Karlsson wrote:
> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> introduced a much more lazy way of updating the global consumer
> pointers from the kernel side, by only doing so when running out of
> entries in the fill or Tx rings (the rings consumed by the
> kernel). This can result in a deadlock with the user application if
> the kernel requires more than one entry to proceed and the application
> cannot put these entries in the fill ring because the kernel has not
> updated the global consumer pointer since the ring is not empty.
>
> Fix this by publishing the local kernel side consumer pointer whenever
> we have completed Rx or Tx processing in the kernel. This way, user
> space will have an up-to-date view of the consumer pointers whenever it
> gets to execute in the one core case (application and driver on the
> same core), or after a certain number of packets have been processed
> in the two core case (application and driver on different cores).
>
> A side effect of this patch is that the one core case gets better
> performance, but the two core case gets worse. The reason that the one
> core case improves is that updating the global consumer pointer is
> relatively cheap since the application by definition is not running
> when the kernel is (they are on the same core) and it is beneficial
> for the application, once it gets to run, to have pointers that are
> as up to date as possible since it then can operate on more packets
> and buffers. In the two core case, the most important performance
> aspect is to minimize the number of accesses to the global pointers
> since they are shared between two cores and bounces between the caches
> of those cores. This patch results in more updates to global state,
> which means lower performance in the two core case.
>
> Fixes: 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> Reported-by: Ryan Goodfellow <rgoodfel@....edu>
> Reported-by: Maxim Mikityanskiy <maximmi@...lanox.com>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
Acked-by: Maxim Mikityanskiy <maximmi@...lanox.com>
> ---
> net/xdp/xsk.c | 2 ++
> net/xdp/xsk_queue.h | 3 ++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index df60048..356f90e 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -217,6 +217,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> static void xsk_flush(struct xdp_sock *xs)
> {
> xskq_prod_submit(xs->rx);
> + __xskq_cons_release(xs->umem->fq);
> sock_def_readable(&xs->sk);
> }
>
> @@ -304,6 +305,7 @@ void xsk_umem_consume_tx_done(struct xdp_umem *umem)
>
> rcu_read_lock();
> list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> + __xskq_cons_release(xs->tx);
> xs->sk.sk_write_space(&xs->sk);
> }
> rcu_read_unlock();
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index bec2af1..89a01ac 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -271,7 +271,8 @@ static inline void xskq_cons_release(struct xsk_queue *q)
> {
> /* To improve performance, only update local state here.
> * Reflect this to global state when we get new entries
> - * from the ring in xskq_cons_get_entries().
> + * from the ring in xskq_cons_get_entries() and whenever
> + * Rx or Tx processing are completed in the NAPI loop.
> */
> q->cached_cons++;
> }
>
Powered by blists - more mailing lists