[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoCN2Lux970eMyXk_SjWsH9M38zsbaJ9o25tJn94DGLMwQ@mail.gmail.com>
Date: Tue, 23 Sep 2025 16:39:33 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, bpf@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, netdev@...r.kernel.org,
magnus.karlsson@...el.com
Subject: Re: [PATCH bpf-next 1/3] xsk: avoid overwriting skb fields for
multi-buffer traffic
On Tue, Sep 23, 2025 at 1:25 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> On 09/22, Maciej Fijalkowski wrote:
> > We are unnecessarily setting a bunch of skb fields per each processed
> > descriptor, which is redundant for fragmented frames.
> >
> > Let us set these respective members for first fragment only.
> >
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> > ---
> > net/xdp/xsk.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 72e34bd2d925..72194f0a3fc0 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -758,6 +758,10 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > goto free_err;
> >
> > xsk_set_destructor_arg(skb, desc->addr);
> > + skb->dev = dev;
> > + skb->priority = READ_ONCE(xs->sk.sk_priority);
> > + skb->mark = READ_ONCE(xs->sk.sk_mark);
> > + skb->destructor = xsk_destruct_skb;
> > } else {
> > int nr_frags = skb_shinfo(skb)->nr_frags;
> > struct xsk_addr_node *xsk_addr;
> > @@ -826,14 +830,10 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> >
> > if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
> > skb->skb_mstamp_ns = meta->request.launch_time;
> > + xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta);
> > }
> > }
> >
> > - skb->dev = dev;
> > - skb->priority = READ_ONCE(xs->sk.sk_priority);
> > - skb->mark = READ_ONCE(xs->sk.sk_mark);
> > - skb->destructor = xsk_destruct_skb;
> > - xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta);
> > xsk_inc_num_desc(skb);
>
> What about IFF_TX_SKB_NO_LINEAR case? I'm not super familiar with
> it, but I don't see priority/mark being set over there after this change.
Agreed. NO_LINEAR is used for VM. Aside from what you mentioned, with
this adjustment the initialization of skb is not finished here, which
leads to 1) failure in __dev_direct_xmit() due to unknown skb->dev, 2)
losing the chance to set its own destructor, etc. Those fields work
for either linear drivers (like virtio_net) or physical drivers.
Testing on my VM, I saw the following splat appearing on the screen as
I pointed out earlier:
[ 91.389269] RIP: 0010:__dev_direct_xmit+0x32/0x1e0
[ 91.389659] Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 83 ec
18 89 75 c4 48 8b 5f 10 65 48 8b 05 d0 f3 b7 01 48 89 45 d0 31 c0 c6
45 cf 00 <48> 8b 83 a8 00 00 00 a8 01 0f 84 90 01 00 00 48 8b 83 a8 00
00 00
[ 91.391095] RSP: 0018:ffffc9000482bce8 EFLAGS: 00010246
[ 91.391538] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 91.392107] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888120440700
[ 91.392663] RBP: ffffc9000482bd28 R08: 000000000000003c R09: 0000000000000001
[ 91.393230] R10: 0000000000001000 R11: 0000000000000000 R12: ffff888120440700
[ 91.393800] R13: ffff888101ed4e00 R14: ffff888120440700 R15: ffff888123bf2c00
[ 91.394360] FS: 00007f2094609540(0000) GS:ffff88907bec2000(0000)
knlGS:0000000000000000
[ 91.394992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 91.395447] CR2: 00000000000000a8 CR3: 0000000113d0d000 CR4: 00000000003506f0
[ 91.396015] Call Trace:
[ 91.396226] <TASK>
[ 91.396415] __xsk_generic_xmit+0x315/0x3c0
[ 91.396767] __xsk_sendmsg.constprop.0.isra.0+0x16f/0x1a0
[ 91.397208] xsk_sendmsg+0x25/0x40
[ 91.397496] __sys_sendto+0x210/0x220
[ 91.397811] ? srso_return_thunk+0x5/0x5f
[ 91.398343] ? _sched_setscheduler.isra.0+0x7b/0xb0
[ 91.398935] __x64_sys_sendto+0x24/0x30
[ 91.399433] x64_sys_call+0x8d4/0x1fc0
[ 91.399937] do_syscall_64+0x5d/0x2e0
[ 91.400437] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Thanks,
Jason
Powered by blists - more mailing lists