Message-ID: <CAL+tcoAHRct5w4xpfzHcafpf0ZHiUAofDjLuhxKibwrB3aaZ=g@mail.gmail.com>
Date: Tue, 6 Jan 2026 11:25:02 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: bot+bpf-ci@...nel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
kernelxing@...cent.com, andrii@...nel.org, martin.lau@...nel.org,
eddyz87@...il.com, yonghong.song@...ux.dev, clm@...a.com,
ihor.solodrai@...ux.dev
Subject: Re: [PATCH bpf-next v4 2/2] xsk: introduce a dedicated local
completion queue for each xsk
On Tue, Jan 6, 2026 at 9:47 AM <bot+bpf-ci@...nel.org> wrote:
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f41e0b480aa4..f80721165fdc 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
>
> [ ... ]
>
> > -static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
> > +static int xsk_cq_reserve_addr_locked(struct xdp_sock *xs, u64 addr)
> > {
> > + struct xsk_buff_pool *pool = xs->pool;
> > + struct local_cq *lcq = xs->lcq;
> > int ret;
> >
> > spin_lock(&pool->cq_cached_prod_lock);
> > ret = xskq_prod_reserve(pool->cq);
> > spin_unlock(&pool->cq_cached_prod_lock);
> > + if (!ret)
> > + lcq->desc[lcq->prod++ & lcq->ring_mask] = addr;
> >
> > return ret;
> > }
>
> [ ... ]
>
> > +/* Record the position of first desc in local cq */
> > +static void xsk_skb_destructor_set_addr(struct sk_buff *skb,
> > + struct xdp_sock *xs)
> > {
> > - skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> > + long val;
> > +
> > + val = ((xs->lcq->prod - 1) & xs->lcq->ring_mask) << XSK_DESTRUCTOR_DESCS_SHIFT;
> > + skb_shinfo(skb)->destructor_arg = (void *)val;
> > }
>
> [ ... ]
>
> > +static void xsk_cq_write_addr(struct sk_buff *skb, u32 desc_processed)
> > +{
> > + struct xsk_buff_pool *pool = xdp_sk(skb->sk)->pool;
> > + u32 idx, pos = xsk_get_start_addr(skb);
> > + struct xdp_sock *xs = xdp_sk(skb->sk);
> > + u64 addr;
> > +
> > + idx = xskq_get_prod(pool->cq) + desc_processed;
> > + addr = xs->lcq->desc[(pos + desc_processed) & xs->lcq->ring_mask];
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > + xskq_prod_write_addr(pool->cq, idx, addr);
> > +}
>
> Can this read stale or corrupted data from lcq->desc[] if skb destructors
> run out of order?
>
> Consider this scenario with a completion queue of 4 entries:
> 1. SKB1 reserves and writes to lcq[0], stores start_pos=0
> 2. SKB2 reserves and writes to lcq[1], stores start_pos=1
> 3. SKB3 reserves and writes to lcq[2], stores start_pos=2
> 4. SKB4 reserves and writes to lcq[3], stores start_pos=3
> 5. Backpressure - cq is full
> 6. Network completes SKB4 first (out of order), destructor reads lcq[3],
> submits to cq, space becomes available
> 7. SKB5 reserves and writes to lcq[0] (wraparound), overwriting SKB1's data
> 8. SKB1's destructor finally runs, reads lcq[0] which now contains SKB5's
> address instead of SKB1's original address
>
> The old code stored addresses in a per-skb xsk_addrs structure, so each skb
> had its own copy that could not be overwritten. The new shared ring buffer
> design appears to be vulnerable to out-of-order completion corrupting the
> address data.
Right, this case makes me rethink whether I should continue with this
series, as it is apparently becoming more complicated than expected.
I will try to fix it with minimal changes.
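
Just to double-check my understanding of the wraparound, below is a
minimal userspace sketch of the scenario you describe. The helpers and
the 4-entry ring are hypothetical stand-ins, there is no locking and no
real cq backpressure gating step 7, so it only mirrors the indexing, not
the kernel code:

/* Minimal userspace sketch of the shared-ring overwrite (illustration only). */
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 4
#define RING_MASK (RING_SIZE - 1)

struct local_cq {
	uint64_t desc[RING_SIZE];
	uint32_t prod;
};

/* Mimics xsk_cq_reserve_addr_locked(): append addr, return its slot. */
static uint32_t reserve_addr(struct local_cq *lcq, uint64_t addr)
{
	uint32_t pos = lcq->prod & RING_MASK;

	lcq->desc[pos] = addr;
	lcq->prod++;
	return pos;	/* same value destructor_set_addr() would record */
}

/* Mimics the destructor path: read back the address from the saved slot. */
static uint64_t destructor_read(struct local_cq *lcq, uint32_t pos)
{
	return lcq->desc[pos & RING_MASK];
}

int main(void)
{
	struct local_cq lcq = { 0 };
	uint32_t pos[5];

	/* Steps 1-4: SKB1..SKB4 fill the ring. */
	for (int i = 0; i < 4; i++)
		pos[i] = reserve_addr(&lcq, 0x1000 + i);

	/* Step 6: SKB4 completes first; its slot still reads correctly. */
	printf("SKB4 reads 0x%llx (expected 0x1003)\n",
	       (unsigned long long)destructor_read(&lcq, pos[3]));

	/*
	 * Step 7: SKB5 reserves (possible in the real code once SKB4's
	 * completion freed a cq slot) and wraps onto SKB1's slot.
	 */
	pos[4] = reserve_addr(&lcq, 0x2000);

	/* Step 8: SKB1's destructor now sees SKB5's address. */
	printf("SKB1 reads 0x%llx (expected 0x1000, got SKB5's addr)\n",
	       (unsigned long long)destructor_read(&lcq, pos[0]));
	return 0;
}

With the old per-skb xsk_addrs copy, step 8 could not observe SKB5's
address because each skb carried its own snapshot, as you noted.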
Thanks,
Jason
>
> > +static void xsk_cq_submit_addr_locked(struct sk_buff *skb)
> > {
> > - u32 num_descs = xsk_get_num_desc(skb);
> > - struct xsk_addrs *xsk_addr;
> > - u32 descs_processed = 0;
> > + struct xsk_buff_pool *pool = xdp_sk(skb->sk)->pool;
> > + u8 i, num = xsk_get_num_desc(skb);
> > unsigned long flags;
> > - u32 idx, i;
> >
> > spin_lock_irqsave(&pool->cq_prod_lock, flags);
> > - idx = xskq_get_prod(pool->cq);
> > -
> > - if (unlikely(num_descs > 1)) {
> > - xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
> > -
> > - for (i = 0; i < num_descs; i++) {
> > - xskq_prod_write_addr(pool->cq, idx + descs_processed,
> > - xsk_addr->addrs[i]);
> > - descs_processed++;
> > - }
> > - kmem_cache_free(xsk_tx_generic_cache, xsk_addr);
> > - } else {
> > - xskq_prod_write_addr(pool->cq, idx,
> > - xsk_skb_destructor_get_addr(skb));
> > - descs_processed++;
> > - }
> > - xskq_prod_submit_n(pool->cq, descs_processed);
> > + for (i = 0; i < num; i++)
> > + xsk_cq_write_addr(skb, i);
> > + xskq_prod_submit_n(pool->cq, num);
> > spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
> > }
>
> [ ... ]
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20735004935