Message-Id: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
Date: Thu, 23 Oct 2025 13:58:19 -0700
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Kuniyuki Iwashima <kuniyu@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
David Ahern <dsahern@...nel.org>, Mina Almasry <almasrymina@...gle.com>
Cc: Stanislav Fomichev <sdf@...ichev.me>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Bobby Eshleman <bobbyeshleman@...a.com>
Subject: [PATCH net-next v5 0/4] net: devmem: improve cpu cost of RX token
management
This series reduces the CPU cost of RX token management by replacing
the xarray allocator with an niov array and a uref field in each niov.
This saves ~13% CPU utilization per RX user thread.
Using kperf, the following results were observed:
Before:
Average RX worker idle %: 13.13, flows 4, test runs 11
After:
Average RX worker idle %: 26.32, flows 4, test runs 11
Two other approaches were tested without improvement: 1) using a
hashmap for tokens, and 2) keeping an xarray of atomic counters but
using RCU so that the hot path could be mostly lockless. Neither
approach used less CPU than the simple array.
The sysctl /proc/sys/net/core/devmem_autorelease is added so that users
can opt out of the optimization, while the performance gain remains the
default.
Note that prior revisions reported only a 5% gain. That lower gain was
measured with CPU frequency boosting unknowingly disabled. With CPU
frequency boosting enabled, a consistent ~13% gain is measured for both
kperf and NCCL workloads.
Signed-off-by: Bobby Eshleman <bobbyeshleman@...a.com>
---
Changes in v5:
- add sysctl to opt out of the performance benefit and fall back to the old token release behavior
- Link to v4: https://lore.kernel.org/all/20250926-scratch-bobbyeshleman-devmem-tcp-token-upstream-v4-0-39156563c3ea@meta.com
Changes in v4:
- rebase to net-next
- Link to v3: https://lore.kernel.org/r/20250926-scratch-bobbyeshleman-devmem-tcp-token-upstream-v3-0-084b46bda88f@meta.com
Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
footprint
- fall back to cleaning up references at dmabuf unbind if a socket
leaked tokens
- drop ethtool patch
- Link to v2: https://lore.kernel.org/r/20250911-scratch-bobbyeshleman-devmem-tcp-token-upstream-v2-0-c80d735bd453@meta.com
Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule
(Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-upstream-v1-0-d946169b5550@meta.com
---
Bobby Eshleman (4):
net: devmem: rename tx_vec to vec in dmabuf binding
net: devmem: refactor sock_devmem_dontneed for autorelease split
net: devmem: use niov array for token management
net: add per-netns sysctl for devmem autorelease
include/net/netmem.h | 1 +
include/net/netns/core.h | 1 +
include/net/sock.h | 8 +++-
net/core/devmem.c | 57 +++++++++++++++-------
net/core/devmem.h | 13 ++++-
net/core/net_namespace.c | 1 +
net/core/sock.c | 115 +++++++++++++++++++++++++++++++++++++--------
net/core/sysctl_net_core.c | 9 ++++
net/ipv4/tcp.c | 69 ++++++++++++++++++++-------
net/ipv4/tcp_ipv4.c | 12 +++--
net/ipv4/tcp_minisocks.c | 3 +-
11 files changed, 229 insertions(+), 60 deletions(-)
---
base-commit: 61b7ade9ba8c3b16867e25411b5f7cf1abe35879
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503
Best regards,
--
Bobby Eshleman <bobbyeshleman@...a.com>