lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoCN2Lux970eMyXk_SjWsH9M38zsbaJ9o25tJn94DGLMwQ@mail.gmail.com>
Date: Tue, 23 Sep 2025 16:39:33 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, bpf@...r.kernel.org, ast@...nel.org, 
	daniel@...earbox.net, andrii@...nel.org, netdev@...r.kernel.org, 
	magnus.karlsson@...el.com
Subject: Re: [PATCH bpf-next 1/3] xsk: avoid overwriting skb fields for
 multi-buffer traffic

On Tue, Sep 23, 2025 at 1:25 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> On 09/22, Maciej Fijalkowski wrote:
> > We are unnecessarily setting a bunch of skb fields per each processed
> > descriptor, which is redundant for fragmented frames.
> >
> > Let us set these respective members for first fragment only.
> >
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> > ---
> >  net/xdp/xsk.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 72e34bd2d925..72194f0a3fc0 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -758,6 +758,10 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> >                               goto free_err;
> >
> >                       xsk_set_destructor_arg(skb, desc->addr);
> > +                     skb->dev = dev;
> > +                     skb->priority = READ_ONCE(xs->sk.sk_priority);
> > +                     skb->mark = READ_ONCE(xs->sk.sk_mark);
> > +                     skb->destructor = xsk_destruct_skb;
> >               } else {
> >                       int nr_frags = skb_shinfo(skb)->nr_frags;
> >                       struct xsk_addr_node *xsk_addr;
> > @@ -826,14 +830,10 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> >
> >                       if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
> >                               skb->skb_mstamp_ns = meta->request.launch_time;
> > +                     xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta);
> >               }
> >       }
> >
> > -     skb->dev = dev;
> > -     skb->priority = READ_ONCE(xs->sk.sk_priority);
> > -     skb->mark = READ_ONCE(xs->sk.sk_mark);
> > -     skb->destructor = xsk_destruct_skb;
> > -     xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta);
> >       xsk_inc_num_desc(skb);
>
> What about IFF_TX_SKB_NO_LINEAR case? I'm not super familiar with
> it, but I don't see priority/mark being set over there after this change.

Agreed. NO_LINEAR is used for VM. Aside from what you mentioned, with
this adjustment the initialization of skb is not finished here, which
leads to 1) failure in __dev_direct_xmit() due to unknown skb->dev, 2)
losing the chance to set its own destructor, etc. Those fields work
for either linear drivers (like virtio_net) or physical drivers.

Testing on my VM, I saw the following splat appearing on the screen as
I pointed out earlier:
[   91.389269] RIP: 0010:__dev_direct_xmit+0x32/0x1e0
[   91.389659] Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 83 ec
18 89 75 c4 48 8b 5f 10 65 48 8b 05 d0 f3 b7 01 48 89 45 d0 31 c0 c6
45 cf 00 <48> 8b 83 a8 00 00 00 a8 01 0f 84 90 01 00 00 48 8b 83 a8 00
00 00
[   91.391095] RSP: 0018:ffffc9000482bce8 EFLAGS: 00010246
[   91.391538] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[   91.392107] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888120440700
[   91.392663] RBP: ffffc9000482bd28 R08: 000000000000003c R09: 0000000000000001
[   91.393230] R10: 0000000000001000 R11: 0000000000000000 R12: ffff888120440700
[   91.393800] R13: ffff888101ed4e00 R14: ffff888120440700 R15: ffff888123bf2c00
[   91.394360] FS:  00007f2094609540(0000) GS:ffff88907bec2000(0000)
knlGS:0000000000000000
[   91.394992] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   91.395447] CR2: 00000000000000a8 CR3: 0000000113d0d000 CR4: 00000000003506f0
[   91.396015] Call Trace:
[   91.396226]  <TASK>
[   91.396415]  __xsk_generic_xmit+0x315/0x3c0
[   91.396767]  __xsk_sendmsg.constprop.0.isra.0+0x16f/0x1a0
[   91.397208]  xsk_sendmsg+0x25/0x40
[   91.397496]  __sys_sendto+0x210/0x220
[   91.397811]  ? srso_return_thunk+0x5/0x5f
[   91.398343]  ? _sched_setscheduler.isra.0+0x7b/0xb0
[   91.398935]  __x64_sys_sendto+0x24/0x30
[   91.399433]  x64_sys_call+0x8d4/0x1fc0
[   91.399937]  do_syscall_64+0x5d/0x2e0
[   91.400437]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ