Message-ID: <CAADnVQKOjetBFuCVRWPEzephJTeZ7AYaHv+pKfJKia0F8vk=ww@mail.gmail.com>
Date: Thu, 10 Dec 2020 13:52:52 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Magnus Karlsson <magnus.karlsson@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Björn Töpel <bjorn.topel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
Maciej Fijalkowski <maciejromanfijalkowski@...il.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: Re: [PATCH bpf] xsk: fix race in SKB mode transmit with shared cq

On Thu, Dec 10, 2020 at 7:36 AM Magnus Karlsson
<magnus.karlsson@...il.com> wrote:
>
> From: Magnus Karlsson <magnus.karlsson@...el.com>
>
> Fix a race when multiple sockets are simultaneously calling sendto()
> when the completion ring is shared in the SKB case. This is the case
> when you share the same netdev and queue id through the
> XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
> be in xsk_generic_xmit() and call the backpressure mechanism in
> xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
> specific scenario, a race might occur since the rings are
> single-producer single-consumer.
>
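To make the race concrete: reserving a slot in these rings amounts to a
plain, non-atomic bump of a cached producer index, which is only correct
with a single producer. Below is a minimal userspace model of two sockets
sharing one completion ring, with and without a shared lock; everything in
it (struct cq_model, the producer loop, the file name) is made up for
illustration and is not the kernel code, but the lost reservations it
prints are the same failure the patch is fixing.

/* Minimal userspace model of the shared-cq race (NOT kernel code).
 * Two "sockets" reserve completion-ring slots by bumping a cached
 * producer index, the single-producer pattern the reserve path relies on.
 * Compile without optimisation so each increment stays a real
 * read-modify-write:  gcc -pthread cq_race.c -o cq_race
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

#define RESERVATIONS 1000000

struct cq_model {
	unsigned int cached_prod;	/* non-atomic: single-producer only */
	pthread_spinlock_t lock;	/* stands in for pool->cq_lock */
	int use_lock;			/* 0 = racy sharing, 1 = patched behaviour */
};

static struct cq_model cq;

/* One XDP_SHARED_UMEM socket doing sendto() in a loop. */
static void *producer(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < RESERVATIONS; i++) {
		if (cq.use_lock)
			pthread_spin_lock(&cq.lock);
		cq.cached_prod++;	/* models the xskq_prod_reserve() step */
		if (cq.use_lock)
			pthread_spin_unlock(&cq.lock);
	}
	return NULL;
}

static unsigned int run(int use_lock)
{
	pthread_t a, b;

	cq.cached_prod = 0;
	cq.use_lock = use_lock;
	pthread_create(&a, NULL, producer, NULL);
	pthread_create(&b, NULL, producer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return cq.cached_prod;
}

int main(void)
{
	pthread_spin_init(&cq.lock, PTHREAD_PROCESS_PRIVATE);
	printf("expected : %u\n", 2 * RESERVATIONS);
	printf("unlocked : %u  (lost reservations => slots handed out twice)\n", run(0));
	printf("locked   : %u\n", run(1));
	return 0;
}

The unlocked run typically reports fewer than the expected number of
reservations, i.e. two producers were handed the same completion-ring slot,
which is what serializing on a lock shared at the pool level prevents.
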
> Fix this by moving the tx_completion_lock from the socket to the pool
> as the pool is shared between the sockets that share the completion
> ring. (The pool is not shared when this is not the case.) And then
> protect the accesses to xskq_prod_reserve() with this lock. The
> tx_completion_lock is renamed cq_lock to better reflect that it
> protects accesses to the potentially shared completion ring.
>
> Fixes: 35fcde7f8deb ("xsk: support for Tx")
> Fixes: a9744f7ca200 ("xsk: fix potential race in SKB TX completion code")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
> Reported-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> ---
> include/net/xdp_sock.h | 4 ----
> include/net/xsk_buff_pool.h | 5 +++++
> net/xdp/xsk.c | 9 ++++++---
> net/xdp/xsk_buff_pool.c | 1 +
> 4 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 4f4e93bf814c..cc17bc957548 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -58,10 +58,6 @@ struct xdp_sock {
>
> struct xsk_queue *tx ____cacheline_aligned_in_smp;
> struct list_head tx_list;
> - /* Mutual exclusion of NAPI TX thread and sendmsg error paths
> - * in the SKB destructor callback.
> - */
> - spinlock_t tx_completion_lock;
> /* Protects generic receive. */
> spinlock_t rx_lock;
>
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index 01755b838c74..eaa8386dbc63 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -73,6 +73,11 @@ struct xsk_buff_pool {
> bool dma_need_sync;
> bool unaligned;
> void *addrs;
> + /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
> + * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
> + * sockets share a single cq when the same netdev and queue id is shared.
> + */
> + spinlock_t cq_lock;
> struct xdp_buff_xsk *free_heads[];
> };
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 62504471fd20..42cb5f94d49e 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -364,9 +364,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> struct xdp_sock *xs = xdp_sk(skb->sk);
> unsigned long flags;
>
> - spin_lock_irqsave(&xs->tx_completion_lock, flags);
> + spin_lock_irqsave(&xs->pool->cq_lock, flags);
> xskq_prod_submit_addr(xs->pool->cq, addr);
> - spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
> + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>
> sock_wfree(skb);
> }
> @@ -378,6 +378,7 @@ static int xsk_generic_xmit(struct sock *sk)
> bool sent_frame = false;
> struct xdp_desc desc;
> struct sk_buff *skb;
> + unsigned long flags;
> int err = 0;
>
> mutex_lock(&xs->mutex);
> @@ -409,10 +410,13 @@ static int xsk_generic_xmit(struct sock *sk)
> * if there is space in it. This avoids having to implement
> * any buffering in the Tx path.
> */
> + spin_lock_irqsave(&xs->pool->cq_lock, flags);
> if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> kfree_skb(skb);
> goto out;
> }
> + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);

Lock/unlock for every packet?
Do you have any performance concerns?
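
For a rough sense of what an uncontended lock/unlock per packet costs, a
userspace sketch along the lines below can be timed. A pthread spinlock
stands in for the kernel spinlock_t (spin_lock_irqsave() additionally masks
interrupts), so treat the result as an order-of-magnitude hint only; the
file name and loop count are made up for illustration.

/* Rough userspace estimate of an uncontended lock/unlock per "packet".
 * Not the kernel path, so the number is only an order-of-magnitude hint.
 * Compile:  gcc -O2 -pthread cq_lock_cost.c -o cq_lock_cost
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define PACKETS 10000000ULL

static pthread_spinlock_t cq_lock;		/* stands in for pool->cq_lock */
static volatile unsigned long long cached_prod;	/* stands in for the cq producer index */

static unsigned long long ts_ns(const struct timespec *ts)
{
	return (unsigned long long)ts->tv_sec * 1000000000ULL + ts->tv_nsec;
}

int main(void)
{
	struct timespec start, end;
	unsigned long long i, elapsed;

	pthread_spin_init(&cq_lock, PTHREAD_PROCESS_PRIVATE);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < PACKETS; i++) {
		/* what the generic xmit path now does once per descriptor */
		pthread_spin_lock(&cq_lock);
		cached_prod++;			/* models the reserve step */
		pthread_spin_unlock(&cq_lock);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	elapsed = ts_ns(&end) - ts_ns(&start);
	printf("%.1f ns per reserve (uncontended, single thread)\n",
	       (double)elapsed / PACKETS);
	return 0;
}

In the generic/SKB path each descriptor already pays for an skb allocation
and a copy, so an uncontended acquire is likely a small fraction of that;
the more interesting number is probably the contended case, when several
sockets sharing the cq transmit at once.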