Message-ID: <Y50CbIe2efR/JMwz@sbohrer-cf-dell>
Date: Fri, 16 Dec 2022 17:42:36 -0600
From: Shawn Bohrer <sbohrer@...udflare.com>
To: Magnus Karlsson <magnus.karlsson@...il.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, bjorn@...nel.org,
magnus.karlsson@...el.com, kernel-team@...udflare.com
Subject: Re: Possible race with xsk_flush
On Fri, Dec 16, 2022 at 11:05:19AM +0100, Magnus Karlsson wrote:
> To summarize, we are expecting this ordering:
>
> CPU 0 __xsk_rcv_zc()
> CPU 0 __xsk_map_flush()
> CPU 2 __xsk_rcv_zc()
> CPU 2 __xsk_map_flush()
>
> But we are seeing this order:
>
> CPU 0 __xsk_rcv_zc()
> CPU 2 __xsk_rcv_zc()
> CPU 0 __xsk_map_flush()
> CPU 2 __xsk_map_flush()
>
> Here is the veth NAPI poll loop:
>
> static int veth_poll(struct napi_struct *napi, int budget)
> {
>         struct veth_rq *rq =
>                 container_of(napi, struct veth_rq, xdp_napi);
>         struct veth_stats stats = {};
>         struct veth_xdp_tx_bq bq;
>         int done;
>
>         bq.count = 0;
>
>         xdp_set_return_frame_no_direct();
>         done = veth_xdp_rcv(rq, budget, &bq, &stats);
>
>         if (done < budget && napi_complete_done(napi, done)) {
>                 /* Write rx_notify_masked before reading ptr_ring */
>                 smp_store_mb(rq->rx_notify_masked, false);
>                 if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
>                         if (napi_schedule_prep(&rq->xdp_napi)) {
>                                 WRITE_ONCE(rq->rx_notify_masked, true);
>                                 __napi_schedule(&rq->xdp_napi);
>                         }
>                 }
>         }
>
>         if (stats.xdp_tx > 0)
>                 veth_xdp_flush(rq, &bq);
>         if (stats.xdp_redirect > 0)
>                 xdp_do_flush();
>         xdp_clear_return_frame_no_direct();
>
>         return done;
> }
>
> Something I have never seen before is that there is a
> napi_complete_done() and a __napi_schedule() before xdp_do_flush().
> Let us check if this has something to do with it. So two suggestions
> to be executed separately:
>
> * Put a probe at the __napi_schedule() above and check if it gets
> triggered before this problem
> * Move the "if (stats.xdp_redirect > 0) xdp_do_flush();" to just
> before "if (done < budget && napi_complete_done(napi, done)) {"
I've built a kernel moving xdp_do_flush() before napi_complete_done()
and will leave this running over the weekend. I also spent a while
trying to understand the napi code to see if that reordering seemed
like it might cause the bug. I'll admit I still don't fully understand
the napi code or how the bug can happen. We'll see if this appears to
fix the issue, but since it is somewhat hard to reproduce I'd love for
someone to be able to explain why it fixes it.
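
For anyone else trying to picture why the observed ordering could matter,
this is the kind of interleaving I am imagining, written against a toy
single-producer ring. All of the names here (toy_ring, toy_rcv, toy_flush)
are made up for the illustration; this is not the real xsk queue code:

#define TOY_RING_SIZE 512   /* arbitrary power of two for the sketch */

struct toy_desc {
        unsigned long addr;
        unsigned int len;
};

/* Toy single-producer ring: one side fills in descriptors ("rcv"), then a
 * separate "flush" publishes them to the consumer in one go.
 */
struct toy_ring {
        unsigned int cached_prod;       /* private producer position    */
        unsigned int producer;          /* position visible to consumer */
        struct toy_desc descs[TOY_RING_SIZE];
};

/* "rcv" step: claim the next slot and write the descriptor into it. */
static void toy_rcv(struct toy_ring *r, unsigned long addr, unsigned int len)
{
        unsigned int idx = r->cached_prod++ & (TOY_RING_SIZE - 1);

        r->descs[idx].addr = addr;
        r->descs[idx].len = len;
}

/* "flush" step: publish everything reserved so far. */
static void toy_flush(struct toy_ring *r)
{
        /* release ordering so the descriptor writes are visible first */
        smp_store_release(&r->producer, r->cached_prod);
}

If CPU 0 and CPU 2 can both be between the rcv and flush steps on the same
ring at once, CPU 0's flush can publish a cached_prod that already covers
CPU 2's slot before CPU 2 has finished writing that descriptor, so the
consumer can see an old or half-written entry. That is why the expected
ordering at the top of this mail keeps each CPU's rcv/flush pair together.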
Thanks,
Shawn Bohrer