Message-ID: <20250612154648.1161201-1-mbloch@nvidia.com>
Date: Thu, 12 Jun 2025 18:46:36 +0300
From: Mark Bloch <mbloch@...dia.com>
To: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>, "Andrew
Lunn" <andrew+netdev@...n.ch>, Simon Horman <horms@...nel.org>
CC: <saeedm@...dia.com>, <gal@...dia.com>, <leonro@...dia.com>,
<tariqt@...dia.com>, Leon Romanovsky <leon@...nel.org>, "Jesper Dangaard
Brouer" <hawk@...nel.org>, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Richard Cochran <richardcochran@...il.com>, Alexei Starovoitov
<ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, John Fastabend
<john.fastabend@...il.com>, <netdev@...r.kernel.org>,
<linux-rdma@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<bpf@...r.kernel.org>, Mark Bloch <mbloch@...dia.com>
Subject: [PATCH net-next v5 00/12] net/mlx5e: Add support for devmem and io_uring TCP zero-copy
This series adds support for zero-copy TCP RX with devmem and io_uring
for ConnectX7 NICs and above. For performance and simplicity, HW-GRO is
also turned on whenever header-data split mode is on.
Performance
===========
Test setup:
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA)
* NIC: ConnectX7
* Benchmarking tool: kperf [0]
* Single TCP flow
* Test duration: 60s
With application thread and interrupts pinned to the *same* core:
|------+-----------+----------|
| MTU | epoll | io_uring |
|------+-----------+----------|
| 1500 | 61.6 Gbps | 114 Gbps |
| 4096 | 69.3 Gbps | 151 Gbps |
| 9000 | 67.8 Gbps | 187 Gbps |
|------+-----------+----------|
The CPU usage for io_uring is 95%.
Reproduction steps for io_uring:
  server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \
      --iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2
  server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc
  client --src 2001:db8::2 --dst 2001:db8::1 \
      --msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2
Patch overview
==============
First, a netmem API for skb_can_coalesce is added to the core to be able
to do skb fragment coalescing on netmems.
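To illustrate what fragment coalescing on netmems means, here is a minimal
userspace sketch. It is a model only, not the kernel helper: every model_*
name and type is hypothetical, the netmem handle is reduced to an opaque
integer, and any further conditions the real helper applies are left out.
New data is merged into the last fragment only when it sits on the same
netmem and starts exactly where that fragment ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_FRAGS 17	/* arbitrary limit chosen for this sketch */

/* Hypothetical stand-in for a netmem reference: an opaque handle. */
typedef uintptr_t model_netmem_t;

struct model_frag {
	model_netmem_t netmem;
	unsigned int off;
	unsigned int size;
};

struct model_skb {
	unsigned int nr_frags;
	struct model_frag frags[MODEL_MAX_FRAGS];
};

/* True when data at (netmem, off) directly continues fragment i - 1. */
static bool model_can_coalesce(const struct model_skb *skb, unsigned int i,
			       model_netmem_t netmem, unsigned int off)
{
	const struct model_frag *last;

	assert(skb != NULL && i <= skb->nr_frags);
	if (i == 0)
		return false;	/* no previous fragment to extend */

	last = &skb->frags[i - 1];
	return netmem == last->netmem && off == last->off + last->size;
}

/*
 * Add received data: grow the last fragment when it is contiguous,
 * otherwise use a new fragment slot. Returns false when no slot is left.
 */
static bool model_add_rx_data(struct model_skb *skb, model_netmem_t netmem,
			      unsigned int off, unsigned int len)
{
	if (model_can_coalesce(skb, skb->nr_frags, netmem, off)) {
		skb->frags[skb->nr_frags - 1].size += len;
		return true;
	}
	if (skb->nr_frags == MODEL_MAX_FRAGS)
		return false;

	skb->frags[skb->nr_frags++] = (struct model_frag){ netmem, off, len };
	return true;
}
```

The comparison works on the handle itself and never dereferences it, which
is why the same check can be applied to memory the CPU cannot read.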
The next patches introduce some cleanups in the internal SHAMPO code and
improve the HW-GRO capability checks in FW.
A separate page_pool is introduced for headers, to be used only when
the rxq has a memory provider.
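The pool selection described above can be sketched roughly as follows.
This is a userspace model under stated assumptions, not the driver code:
the model_* names, the struct layout and the cpu_readable flag are all
hypothetical. The assumption is that headers must land in host-readable
memory, so a dedicated header pool is only needed when a memory provider
backs the payload pool; otherwise one pool serves both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page pool; only what the sketch needs. */
struct model_pool {
	bool cpu_readable;	/* false for e.g. device memory */
};

/* Hypothetical stand-in for a receive queue using header-data split. */
struct model_rq {
	struct model_pool *page_pool;		/* payload buffers */
	struct model_pool *hd_page_pool;	/* header buffers */
};

/*
 * Assign pools to the queue. With a memory provider, headers get their
 * own host pool; without one, headers share the payload pool.
 */
static void model_rq_assign_pools(struct model_rq *rq,
				  struct model_pool *data_pool,
				  struct model_pool *host_pool,
				  bool has_mem_provider)
{
	assert(rq != NULL && data_pool != NULL);
	assert(!has_mem_provider || host_pool != NULL);

	rq->page_pool = data_pool;
	rq->hd_page_pool = has_mem_provider ? host_pool : data_pool;
}
```

In this model the shared-pool case needs no extra allocation at all, which
matches the intent of keeping the common path unchanged.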
Then the driver is converted to use the netmem API and to support
unreadable netmem page pools.
The queue management ops are implemented.
Finally, the tcp-data-split ring parameter is exposed.
Changelog
=========
Changes from v4 [4]:
- Addressed silly return before goto.
- Removed extraneous '\n' and used NL_SET_ERR_MSG_MOD.
- Removed unnecessary netmem_is_net_iov() check.
- Added comment for non-HDS packets being dropped when unreadable memory
  is used.
- Added page_pool_dev_alloc_netmems() helper.
- Added Reviewed-by tags.
Changes from v3 [3]:
- Dropped ethtool stats for hd_page_pool.
Changes from v2 [2]:
- Added support for netmem TX.
- Changed skb_can_coalesce_netmem() based on Mina's suggestion.
- Reworked tcp_data_split setting to no longer change HW-GRO in
  wanted_features.
- Switched to a single page pool when rxq has no memory providers.
Changes from v1 [1]:
- Added support for skb_can_coalesce_netmem().
- Avoided netmem_to_page() casts in the driver.
- Fixed code to abide by the 80-char limit, with some exceptions to avoid
  code churn.
References
==========
[0] kperf: git://git.kernel.dk/kperf.git
[1] v1: https://lore.kernel.org/all/20250116215530.158886-1-saeed@kernel.org/
[2] v2: https://lore.kernel.org/all/1747950086-1246773-1-git-send-email-tariqt@nvidia.com/
[3] v3: https://lore.kernel.org/netdev/20250609145833.990793-1-mbloch@nvidia.com/
[4] v4: https://lore.kernel.org/all/20250610150950.1094376-1-mbloch@nvidia.com
Dragos Tatulea (4):
  net: Allow const args for of page_to_netmem()
  net: Add skb_can_coalesce for netmem
  page_pool: Add page_pool_dev_alloc_netmems helper
  net/mlx5e: Add TX support for netmems

Saeed Mahameed (8):
  net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc
  net/mlx5e: SHAMPO: Remove redundant params
  net/mlx5e: SHAMPO: Improve hw gro capability checking
  net/mlx5e: SHAMPO: Separate pool for headers
  net/mlx5e: Convert over to netmem
  net/mlx5e: Add support for UNREADABLE netmem page pools
  net/mlx5e: Implement queue mgmt ops and single channel swap
  net/mlx5e: Support ethtool tcp-data-split settings
drivers/net/ethernet/mellanox/mlx5/core/en.h | 11 +-
.../ethernet/mellanox/mlx5/core/en/params.c | 36 ++-
.../net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 +-
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 33 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 303 +++++++++++++-----
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 138 ++++----
include/linux/skbuff.h | 12 +-
include/net/netmem.h | 2 +-
include/net/page_pool/helpers.h | 7 +
9 files changed, 378 insertions(+), 167 deletions(-)
base-commit: 5d6d67c4cb10a4b4d3ae35758d5eeed6239afdc8
--
2.34.1