Message-Id: <20251021131209.41491-1-kerneljasonxing@gmail.com>
Date: Tue, 21 Oct 2025 21:12:00 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
bjorn@...nel.org,
magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com,
jonathan.lemon@...il.com,
sdf@...ichev.me,
ast@...nel.org,
daniel@...earbox.net,
hawk@...nel.org,
john.fastabend@...il.com,
joe@...a.to,
willemdebruijn.kernel@...il.com
Cc: bpf@...r.kernel.org,
netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: [PATCH net-next v3 0/9] xsk: batch xmit in copy mode
From: Jason Xing <kernelxing@...cent.com>
This series focuses on performance improvements in copy mode. As
observed on physical servers, there is much room left to ramp up
transmission in copy mode compared to zerocopy mode.
Even though zerocopy achieves much better performance, some limitations
remain, especially for the virtio and veth cases. In the real world,
many hosts still do not implement or support zerocopy mode for VMs, so
copy mode is the only option we can resort to.
Zerocopy mode has a useful function, xskq_cons_read_desc_batch(), which
reads descriptors in batches and then sends them out at once, rather
than reading and sending descriptors one by one in a loop. Similar
batching ideas can be seen in classic mechanisms like GSO/GRO, which
also try to handle as many packets as possible at one time. The
motivation and idea of this series actually originated from them.
Looking back at the initial design and implementation of AF_XDP, it is
not hard to see that the big difference it made is speeding up
transmission when zerocopy mode is enabled; zerocopy mode of AF_XDP
outperforms AF_PACKET, which still uses copy mode. As for the copy-mode
logic of the two, they look quite similar, especially when an
application using AF_PACKET sets the PACKET_QDISC_BYPASS option.
Digging into the details of AF_PACKET, we can see its implementation is
comparatively heavy, which is also borne out by the real tests shown
below: the AF_PACKET numbers are a little lower.
At the moment, I consider copy mode of AF_XDP a half-bypass mechanism
to some extent, in comparison with well-known bypass mechanisms like
DPDK. To reduce consumption in the kernel as much as possible, batch
xmit is proposed: aggregate descriptors into a small group and then
read/allocate/build/send them in separate loops.
Applications can use setsockopt() to turn on this feature. Since memory
allocation can be time consuming and heavy under memory pressure, it
may not be good to hold one descriptor for too long, which brings high
latency for a single skb. That is why the feature is not enabled by
default.
Experiment numbers:
1) Tested on ixgbe at 10Gb/sec.
copy mode: 1,861,347 pps (baseline)
batch mode: 2,344,052 pps (+25.9%)
xmit.more: 2,711,077 pps (+45.6%)
zc mode: 13,333,593 pps (+616.3%)
AF_PACKET: 1,375,808 pps (-26.0%)
2) Tested on i40e at 10Gb/sec.
copy mode: 1,813,071 pps (baseline)
xmit.more: 3,044,596 pps (+67.9%)
zc mode: 14,880,841 pps (+720.7%)
AF_PACKET: 1,553,856 pps (-14.0%)
[2]: taskset -c 1 ./xdpsock -i eth1 -t -S -s 64
---
v3
Link: https://lore.kernel.org/all/20250825135342.53110-1-kerneljasonxing@gmail.com/
1. Retested and got different numbers. The previous results were not
quite right because my environment has two NUMA nodes and only the
first one runs at the faster speed.
2. To achieve stable performance results, development and evaluation
were also done on physical servers, matching the numbers shared above.
3. Didn't use pool->tx_descs because sockets can share the same umem
pool.
4. Use an skb list to chain the allocated and built skbs to send.
5. Add AF_PACKET test numbers.
v2
Link: https://lore.kernel.org/all/20250811131236.56206-1-kerneljasonxing@gmail.com/
1. add xmit.more sub-feature (Jesper)
2. add kmem_cache_alloc_bulk (Jesper and Maciej)
Jason Xing (9):
xsk: introduce XDP_GENERIC_XMIT_BATCH setsockopt
xsk: extend xsk_build_skb() to support passing an already allocated
skb
xsk: add xsk_alloc_batch_skb() to build skbs in batch
xsk: add direct xmit in batch function
xsk: rename nb_pkts to nb_descs in xsk_tx_peek_release_desc_batch
xsk: extend xskq_cons_read_desc_batch to count nb_pkts
xsk: support batch xmit main logic
xsk: support generic batch xmit in copy mode
xsk: support dynamic xmit.more control for batch xmit
Documentation/networking/af_xdp.rst | 17 +++
include/net/xdp_sock.h | 14 ++
include/uapi/linux/if_xdp.h | 1 +
net/core/dev.c | 26 ++++
net/core/skbuff.c | 101 +++++++++++++
net/xdp/xsk.c | 223 ++++++++++++++++++++++++----
net/xdp/xsk_queue.h | 5 +-
tools/include/uapi/linux/if_xdp.h | 1 +
8 files changed, 359 insertions(+), 29 deletions(-)
--
2.41.3