Message-ID: <CAJ8uoz15TKR7Tpx_brUut7XwHcEMCq90xOh7xFHhp-4+7BOMTQ@mail.gmail.com>
Date: Fri, 11 Dec 2020 08:47:31 +0100
From: Magnus Karlsson <magnus.karlsson@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Björn Töpel <bjorn.topel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
Maciej Fijalkowski <maciejromanfijalkowski@...il.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: Re: [PATCH bpf] xsk: fix race in SKB mode transmit with shared cq
On Thu, Dec 10, 2020 at 10:53 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Thu, Dec 10, 2020 at 7:36 AM Magnus Karlsson
> <magnus.karlsson@...il.com> wrote:
> >
> > From: Magnus Karlsson <magnus.karlsson@...el.com>
> >
> > Fix a race when multiple sockets are simultaneously calling sendto()
> > when the completion ring is shared in the SKB case. This is the case
> > when you share the same netdev and queue id through the
> > XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
> > be in xsk_generic_xmit() and call the backpressure mechanism in
> > xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
> > specific scenario, a race might occur since the rings are
> > single-producer single-consumer.
> >
> > Fix this by moving the tx_completion_lock from the socket to the pool
> > as the pool is shared between the sockets that share the completion
> > ring. (The pool is not shared when this is not the case.) And then
> > protect the accesses to xskq_prod_reserve() with this lock. The
> > tx_completion_lock is renamed cq_lock to better reflect that it
> > protects accesses to the potentially shared completion ring.
> >
> > Fixes: 35fcde7f8deb ("xsk: support for Tx")
> > Fixes: a9744f7ca200 ("xsk: fix potential race in SKB TX completion code")
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
> > Reported-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> > ---
> > include/net/xdp_sock.h | 4 ----
> > include/net/xsk_buff_pool.h | 5 +++++
> > net/xdp/xsk.c | 9 ++++++---
> > net/xdp/xsk_buff_pool.c | 1 +
> > 4 files changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> > index 4f4e93bf814c..cc17bc957548 100644
> > --- a/include/net/xdp_sock.h
> > +++ b/include/net/xdp_sock.h
> > @@ -58,10 +58,6 @@ struct xdp_sock {
> >
> > struct xsk_queue *tx ____cacheline_aligned_in_smp;
> > struct list_head tx_list;
> > - /* Mutual exclusion of NAPI TX thread and sendmsg error paths
> > - * in the SKB destructor callback.
> > - */
> > - spinlock_t tx_completion_lock;
> > /* Protects generic receive. */
> > spinlock_t rx_lock;
> >
> > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > index 01755b838c74..eaa8386dbc63 100644
> > --- a/include/net/xsk_buff_pool.h
> > +++ b/include/net/xsk_buff_pool.h
> > @@ -73,6 +73,11 @@ struct xsk_buff_pool {
> > bool dma_need_sync;
> > bool unaligned;
> > void *addrs;
> > > + /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
> > > + * the NAPI TX thread and the sendmsg error paths in the SKB destructor callback,
> > > + * and sockets sharing a single cq because they share the same netdev and queue id.
> > > + */
> > + spinlock_t cq_lock;
> > struct xdp_buff_xsk *free_heads[];
> > };
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 62504471fd20..42cb5f94d49e 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -364,9 +364,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > struct xdp_sock *xs = xdp_sk(skb->sk);
> > unsigned long flags;
> >
> > - spin_lock_irqsave(&xs->tx_completion_lock, flags);
> > + spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > xskq_prod_submit_addr(xs->pool->cq, addr);
> > - spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
> > + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >
> > sock_wfree(skb);
> > }
> > @@ -378,6 +378,7 @@ static int xsk_generic_xmit(struct sock *sk)
> > bool sent_frame = false;
> > struct xdp_desc desc;
> > struct sk_buff *skb;
> > + unsigned long flags;
> > int err = 0;
> >
> > mutex_lock(&xs->mutex);
> > @@ -409,10 +410,13 @@ static int xsk_generic_xmit(struct sock *sk)
> > * if there is space in it. This avoids having to implement
> > * any buffering in the Tx path.
> > */
> > + spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> > kfree_skb(skb);
> > goto out;
> > }
> > + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>
> Lock/unlock for every packet?
> Do you have any performance concerns?

I have measured this, and the performance impact is in the noise.
There is unfortunately already a lot of code being executed per packet
in this path. On the positive side, once this bug fix trickles down to
bpf-next, I will submit a fairly simple patch to this function that
improves throughput by 15% in the txpush microbenchmark. That will
more than compensate for any locking introduced here.