lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoBWOUCd8f1Q6BYh+xuKs5=Qgr2oOBb9CLU_6BrasD0vfg@mail.gmail.com>
Date: Fri, 15 Aug 2025 14:44:56 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com, 
	maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me, 
	ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com, 
	horms@...nel.org, andrew+netdev@...n.ch, bpf@...r.kernel.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next 2/2] xsk: support generic batch xmit in copy mode

On Tue, Aug 12, 2025 at 10:30 PM Jesper Dangaard Brouer <hawk@...nel.org> wrote:
>
...
>
> But this also requires changing the SKB alloc function used by
> xsk_build_skb(). As a seperate patch, I recommend that you change the
> sock_alloc_send_skb() to instead use build_skb (or build_skb_around).
> I expect this will be a large performance improvement on it's own.
> Can I ask you to benchmark this change before the batch xmit change?
>
> Opinions needed from other maintainers please (I might be wrong!):
> I don't think the socket level accounting done in sock_alloc_send_skb()
> is correct/relevant for AF_XDP/XSK, because the "backpressure mechanism"
> code comment above.

Here I'm bringing back the last test you expected to know :)

I use alloc_skb() to replace sock_alloc_send_skb() and introduce other
minor changes, say, removing sock_wfree() from xsk_destruct_skb(). It
turns out to be a stable 5% performance improvement on i40e driver.
slight improvement on virtio_net. That's good news.

Bad news is that the above logic has bugs like freeing skb in the napi
poll causes accessing skb->sk in xsk_destruct_skb() which triggers a
NULL pointer issue. How did I spot this one? I removed the BQL flow
control and started two xdpsock on different queues, then I saw a
panic[1]... To solve the problem like that, I'm afraid that we still
need to charge a certain length value into sk_wmem_alloc so that
sock_wfree(skb) can be the last one to free the socket finally.

So this socket level accounting mechanism keeps its safety in the above case.

IMHO, we can get rid of the limitation of sk_sndbuf but still use
skb_set_owner_w() that charges the len of skb. If we stick to removing
the whole accounting function, probably we have to adjust the position
of xsk_cq_submit_locked(), but I reckon for now it's not practical...

Any thoughts on this?

[1]
 997 [  133.528449] RIP: 0010:xsk_destruct_skb+0x6a/0x90
 998 [  133.528920] Code: 8b 6c 02 28 48 8b 43 18 4c 8b a0 68 03 00 00
49 8d 9c 24 e8 00 00 00 48 89 df e8 f1 eb 06 00 48 89 c6 49 8b 84 24
88 00 00 00 <48> 8b 50 10 03 2a 48      8b 40 10 48 89 df 89 28 5b 5d
41 5c e9 6e ec
 999 [  133.530526] RSP: 0018:ffffae71c06a0d08 EFLAGS: 00010046
1000 [  133.531005] RAX: 0000000000000000 RBX: ffff9f42c81c49e8 RCX:
00000000000002e7
1001 [  133.531631] RDX: 0000000000000001 RSI: 0000000000000286 RDI:
ffff9f42c81c49e8
1002 [  133.532249] RBP: 0000000000000001 R08: 0000000000000008 R09:
00000000000000001003 [  133.532867] R10: ffffffff978080c0 R11:
ffffae71c06a0ff8 R12: ffff9f42c81c4900
1004 [  133.533491] R13: ffffae71c06a0d88 R14: ffff9f42e0f1f900 R15:
ffff9f42ce850d801005 [  133.534123] FS:  0000000000000000(0000)
GS:ffff9f5227655000(0000) knlGS:00000000000000001006 [  133.534831]
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
1007 [  133.535366] CR2: 0000000000000010 CR3: 000000011c820000 CR4:
00000000003506f0
1008 [  133.536014] Call Trace:
1009 [  133.536313]  <IRQ>
1010 [  133.536583]  skb_release_head_state+0x20/0x90
1011 [  133.537021]  napi_consume_skb+0x42/0x120
1012 [  133.537429]  __free_old_xmit+0x76/0x170 [virtio_net]
1013 [  133.537923]  free_old_xmit+0x53/0xc0 [virtio_net]
1014 [  133.538395]  virtnet_poll+0xed/0x5d0 [virtio_net]
1015 [  133.538867]  ? blake2s_compress+0x52/0xa0
1016 [  133.539286]  __napi_poll+0x28/0x200
1017 [  133.539668]  net_rx_action+0x319/0x400
1018 [  133.540068]  ? sched_clock_cpu+0xb/0x190
1019 [  133.540482]  ? __run_timers+0x1d1/0x260
1020 [  133.540906]  ? __pfx_dl_task_timer+0x10/0x10
1021 [  133.541349]  ? lock_timer_base+0x72/0x90
1022 [  133.541767]  handle_softirqs+0xce/0x2e0
1023 [  133.542178]  __irq_exit_rcu+0xc6/0xf0
1024 [  133.542575]  common_interrupt+0x81/0xa0

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ