Message-ID: <CAL+tcoAHRct5w4xpfzHcafpf0ZHiUAofDjLuhxKibwrB3aaZ=g@mail.gmail.com>
Date: Tue, 6 Jan 2026 11:25:02 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: bot+bpf-ci@...nel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
kernelxing@...cent.com, andrii@...nel.org, martin.lau@...nel.org,
eddyz87@...il.com, yonghong.song@...ux.dev, clm@...a.com,
ihor.solodrai@...ux.dev
Subject: Re: [PATCH bpf-next v4 2/2] xsk: introduce a dedicated local
completion queue for each xsk
On Tue, Jan 6, 2026 at 9:47 AM <bot+bpf-ci@...nel.org> wrote:
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f41e0b480aa4..f80721165fdc 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
>
> [ ... ]
>
> > -static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
> > +static int xsk_cq_reserve_addr_locked(struct xdp_sock *xs, u64 addr)
> > {
> > + struct xsk_buff_pool *pool = xs->pool;
> > + struct local_cq *lcq = xs->lcq;
> > int ret;
> >
> > spin_lock(&pool->cq_cached_prod_lock);
> > ret = xskq_prod_reserve(pool->cq);
> > spin_unlock(&pool->cq_cached_prod_lock);
> > + if (!ret)
> > + lcq->desc[lcq->prod++ & lcq->ring_mask] = addr;
> >
> > return ret;
> > }
>
> [ ... ]
>
> > +/* Record the position of first desc in local cq */
> > +static void xsk_skb_destructor_set_addr(struct sk_buff *skb,
> > + struct xdp_sock *xs)
> > {
> > - skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> > + long val;
> > +
> > + val = ((xs->lcq->prod - 1) & xs->lcq->ring_mask) << XSK_DESTRUCTOR_DESCS_SHIFT;
> > + skb_shinfo(skb)->destructor_arg = (void *)val;
> > }
>
> [ ... ]
>
> > +static void xsk_cq_write_addr(struct sk_buff *skb, u32 desc_processed)
> > +{
> > + struct xsk_buff_pool *pool = xdp_sk(skb->sk)->pool;
> > + u32 idx, pos = xsk_get_start_addr(skb);
> > + struct xdp_sock *xs = xdp_sk(skb->sk);
> > + u64 addr;
> > +
> > + idx = xskq_get_prod(pool->cq) + desc_processed;
> > + addr = xs->lcq->desc[(pos + desc_processed) & xs->lcq->ring_mask];
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > + xskq_prod_write_addr(pool->cq, idx, addr);
> > +}
>
> Can this read stale or corrupted data from lcq->desc[] if skb destructors
> run out of order?
>
> Consider this scenario with a completion queue of 4 entries:
> 1. SKB1 reserves and writes to lcq[0], stores start_pos=0
> 2. SKB2 reserves and writes to lcq[1], stores start_pos=1
> 3. SKB3 reserves and writes to lcq[2], stores start_pos=2
> 4. SKB4 reserves and writes to lcq[3], stores start_pos=3
> 5. Backpressure - cq is full
> 6. Network completes SKB4 first (out of order), destructor reads lcq[3],
> submits to cq, space becomes available
> 7. SKB5 reserves and writes to lcq[0] (wraparound), overwriting SKB1's data
> 8. SKB1's destructor finally runs, reads lcq[0] which now contains SKB5's
> address instead of SKB1's original address
>
> The old code stored addresses in a per-skb xsk_addrs structure, so each skb
> had its own copy that could not be overwritten. The new shared ring buffer
> design appears to be vulnerable to out-of-order completion corrupting the
> address data.
Right, this case makes me rethink whether I should continue with this
series, as it is apparently becoming more complicated than expected.
I will try to fix it with minimal changes.
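
Just to double-check my understanding of the wraparound, below is a
minimal userspace sketch of the scenario you describe. The helpers and
the 4-entry ring are hypothetical stand-ins, there is no locking and no
real cq backpressure gating step 7, so it only mirrors the indexing, not
the kernel code:

/* Minimal userspace sketch of the shared-ring overwrite (illustration only). */
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 4
#define RING_MASK (RING_SIZE - 1)

struct local_cq {
	uint64_t desc[RING_SIZE];
	uint32_t prod;
};

/* Mimics xsk_cq_reserve_addr_locked(): append addr, return its slot. */
static uint32_t reserve_addr(struct local_cq *lcq, uint64_t addr)
{
	uint32_t pos = lcq->prod & RING_MASK;

	lcq->desc[pos] = addr;
	lcq->prod++;
	return pos;	/* same value destructor_set_addr() would record */
}

/* Mimics the destructor path: read back the address from the saved slot. */
static uint64_t destructor_read(struct local_cq *lcq, uint32_t pos)
{
	return lcq->desc[pos & RING_MASK];
}

int main(void)
{
	struct local_cq lcq = { 0 };
	uint32_t pos[5];

	/* Steps 1-4: SKB1..SKB4 fill the ring. */
	for (int i = 0; i < 4; i++)
		pos[i] = reserve_addr(&lcq, 0x1000 + i);

	/* Step 6: SKB4 completes first; its slot still reads correctly. */
	printf("SKB4 reads 0x%llx (expected 0x1003)\n",
	       (unsigned long long)destructor_read(&lcq, pos[3]));

	/*
	 * Step 7: SKB5 reserves (possible in the real code once SKB4's
	 * completion freed a cq slot) and wraps onto SKB1's slot.
	 */
	pos[4] = reserve_addr(&lcq, 0x2000);

	/* Step 8: SKB1's destructor now sees SKB5's address. */
	printf("SKB1 reads 0x%llx (expected 0x1000, got SKB5's addr)\n",
	       (unsigned long long)destructor_read(&lcq, pos[0]));
	return 0;
}

With the old per-skb xsk_addrs copy, step 8 could not observe SKB5's
address because each skb carried its own snapshot, as you noted.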
Thanks,
Jason
>
> > +static void xsk_cq_submit_addr_locked(struct sk_buff *skb)
> > {
> > - u32 num_descs = xsk_get_num_desc(skb);
> > - struct xsk_addrs *xsk_addr;
> > - u32 descs_processed = 0;
> > + struct xsk_buff_pool *pool = xdp_sk(skb->sk)->pool;
> > + u8 i, num = xsk_get_num_desc(skb);
> > unsigned long flags;
> > - u32 idx, i;
> >
> > spin_lock_irqsave(&pool->cq_prod_lock, flags);
> > - idx = xskq_get_prod(pool->cq);
> > -
> > - if (unlikely(num_descs > 1)) {
> > - xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
> > -
> > - for (i = 0; i < num_descs; i++) {
> > - xskq_prod_write_addr(pool->cq, idx + descs_processed,
> > - xsk_addr->addrs[i]);
> > - descs_processed++;
> > - }
> > - kmem_cache_free(xsk_tx_generic_cache, xsk_addr);
> > - } else {
> > - xskq_prod_write_addr(pool->cq, idx,
> > - xsk_skb_destructor_get_addr(skb));
> > - descs_processed++;
> > - }
> > - xskq_prod_submit_n(pool->cq, descs_processed);
> > + for (i = 0; i < num; i++)
> > + xsk_cq_write_addr(skb, i);
> > + xskq_prod_submit_n(pool->cq, num);
> > spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
> > }
>
> [ ... ]
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20735004935