[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJ8uoz3RORXKp-Dvv0OtvSNoLVeu5ZqH0UeNU=PhJ=UwhgrXfA@mail.gmail.com>
Date: Mon, 14 Dec 2020 10:00:42 +0100
From: Magnus Karlsson <magnus.karlsson@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Björn Töpel <bjorn.topel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
Maciej Fijalkowski <maciejromanfijalkowski@...il.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: Re: [PATCH bpf] xsk: fix race in SKB mode transmit with shared cq
On Fri, Dec 11, 2020 at 8:47 AM Magnus Karlsson
<magnus.karlsson@...il.com> wrote:
>
> On Thu, Dec 10, 2020 at 10:53 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Thu, Dec 10, 2020 at 7:36 AM Magnus Karlsson
> > <magnus.karlsson@...il.com> wrote:
> > >
> > > From: Magnus Karlsson <magnus.karlsson@...el.com>
> > >
> > > Fix a race when multiple sockets are simultaneously calling sendto()
> > > when the completion ring is shared in the SKB case. This is the case
> > > when you share the same netdev and queue id through the
> > > XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
> > > be in xsk_generic_xmit() and call the backpressure mechanism in
> > > xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
> > > specific scenario, a race might occur since the rings are
> > > single-producer single-consumer.
> > >
> > > Fix this by moving the tx_completion_lock from the socket to the pool
> > > as the pool is shared between the sockets that share the completion
> > > ring. (The pool is not shared when this is not the case.) And then
> > > protect the accesses to xskq_prod_reserve() with this lock. The
> > > tx_completion_lock is renamed cq_lock to better reflect that it
> > > protects accesses to the potentially shared completion ring.
> > >
> > > Fixes: 35fcde7f8deb ("xsk: support for Tx")
> > > Fixes: a9744f7ca200 ("xsk: fix potential race in SKB TX completion code")
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
> > > Reported-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> > > ---
> > > include/net/xdp_sock.h | 4 ----
> > > include/net/xsk_buff_pool.h | 5 +++++
> > > net/xdp/xsk.c | 9 ++++++---
> > > net/xdp/xsk_buff_pool.c | 1 +
> > > 4 files changed, 12 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> > > index 4f4e93bf814c..cc17bc957548 100644
> > > --- a/include/net/xdp_sock.h
> > > +++ b/include/net/xdp_sock.h
> > > @@ -58,10 +58,6 @@ struct xdp_sock {
> > >
> > > struct xsk_queue *tx ____cacheline_aligned_in_smp;
> > > struct list_head tx_list;
> > > - /* Mutual exclusion of NAPI TX thread and sendmsg error paths
> > > - * in the SKB destructor callback.
> > > - */
> > > - spinlock_t tx_completion_lock;
> > > /* Protects generic receive. */
> > > spinlock_t rx_lock;
> > >
> > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > > index 01755b838c74..eaa8386dbc63 100644
> > > --- a/include/net/xsk_buff_pool.h
> > > +++ b/include/net/xsk_buff_pool.h
> > > @@ -73,6 +73,11 @@ struct xsk_buff_pool {
> > > bool dma_need_sync;
> > > bool unaligned;
> > > void *addrs;
> > > + /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
> > > + * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
> > > + * sockets share a single cq when the same netdev and queue id is shared.
> > > + */
> > > + spinlock_t cq_lock;
> > > struct xdp_buff_xsk *free_heads[];
> > > };
> > >
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index 62504471fd20..42cb5f94d49e 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -364,9 +364,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > > struct xdp_sock *xs = xdp_sk(skb->sk);
> > > unsigned long flags;
> > >
> > > - spin_lock_irqsave(&xs->tx_completion_lock, flags);
> > > + spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > > xskq_prod_submit_addr(xs->pool->cq, addr);
> > > - spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
> > > + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> > >
> > > sock_wfree(skb);
> > > }
> > > @@ -378,6 +378,7 @@ static int xsk_generic_xmit(struct sock *sk)
> > > bool sent_frame = false;
> > > struct xdp_desc desc;
> > > struct sk_buff *skb;
> > > + unsigned long flags;
> > > int err = 0;
> > >
> > > mutex_lock(&xs->mutex);
> > > @@ -409,10 +410,13 @@ static int xsk_generic_xmit(struct sock *sk)
> > > * if there is space in it. This avoids having to implement
> > > * any buffering in the Tx path.
> > > */
> > > + spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > > if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > > + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> > > kfree_skb(skb);
> > > goto out;
> > > }
> > > + spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >
> > Lock/unlock for every packet?
> > Do you have any performance concerns?
>
> I have measured and the performance impact of this is in the noise.
> There is unfortunately already a lot of code being executed per packet
> in this path. On the positive side, once this bug fix trickles down to
> bpf-next, I will submit a rather simple patch to this function that
> improves throughput by 15% for the txpush microbenchmark. This will
> more than compensate for any locking introduced.
Please ignore/drop this patch. I will include this one as patch 1 in a
patch set of 2 that I will submit. The second patch will be a fix to
the reservation problem spotted by Xuan and it needs the locking of
the first patch to be able to work.
Powered by blists - more mailing lists