Message-ID: <20251125200041.1565663-1-jon@nutanix.com>
Date: Tue, 25 Nov 2025 13:00:27 -0700
From: Jon Kohler <jon@...anix.com>
To: netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>,
bpf@...r.kernel.org (open list:XDP (eXpress Data Path):Keyword:(?:\b|_)xdp(?:\b|_))
Cc: Jon Kohler <jon@...anix.com>
Subject: [PATCH net-next v2 0/9] tun: optimize SKB allocation with NAPI cache
Use the per-CPU NAPI cache for SKB allocation in most places, and
leverage bulk allocation for tun_xdp_one since the batch size is known
at submission time. Additionally, utilize napi_build_skb and
napi_consume_skb to further benefit from the NAPI cache. This all
improves efficiency by reducing allocation overhead.
Note: This series does not address the large payload path in
tun_alloc_skb, which spans sock.c and skbuff.c. A separate series will
handle privatizing the allocation code in tun and integrating the NAPI
cache for that path.
Results using basic iperf3 UDP test:
TX guest: taskset -c 2 iperf3 -c rx-ip-here -t 30 -p 5200 -b 0 -u -i 30
RX guest: taskset -c 2 iperf3 -s -p 5200 -D
Bitrate
Before: 6.08 Gbits/sec
After : 6.36 Gbits/sec
However, the basic test doesn't tell the whole story. Comparing
flamegraphs from before and after, fewer cycles are spent on the RX
vhost thread in the guest-to-guest single-host case, and also in the
guest-to-guest case across separate hosts, as the host NIC handlers
benefit from these NAPI-allocated SKBs (and deferred free) as well.
Speaking of deferred free, v2 adds exporting the deferred free flush
from net core and calling it in tun_put_user immediately prior to bulk
allocation. This not only keeps the cache as warm as you can get, but
also prevents a TX-heavy vhost thread from getting IPI'd like it's
going out of style. This approach is similar in concept to what happens
in the NAPI loop in net_rx_action.
I've also merged this series with a small series about cleaning up
packet drop statistics along the various error paths in tun, as I want
to make sure those all go through kfree_skb_reason(), and we'd have
merge conflicts separating the two. If the maintainers want to take
them separately, happy to break them apart if needed. It is fairly
clean keeping them together otherwise.
Thanks all,
Jon
v2:
- Added drop statistics cleanup series, else merge conflicts abound
- Removed xdp_prog change (Willem)
- Clarified drop scenario in tun_put_user, where it is an extension of
  current behavior (Willem comment from v1)
- Export skb_defer_free_flush
- Use deferred skb free to immediately refill cache prior to bulk alloc,
which also prevents IPIs from pounding TX heavy / TX only cores
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20250506145530.2877229-1-jon@nutanix.com/
Jon Kohler (9):
tun: cleanup out label in tun_xdp_one
tun: correct drop statistics in tun_xdp_one
tun: correct drop statistics in tun_put_user
tun: correct drop statistics in tun_get_user
tun: use bulk NAPI cache allocation in tun_xdp_one
tun: use napi_build_skb in __tun_build_skb
tun: use napi_consume_skb() in tun_put_user
net: core: export skb_defer_free_flush
tun: flush deferred skb free list before bulk NAPI cache get
drivers/net/tun.c | 170 +++++++++++++++++++++++++++++------------
include/linux/skbuff.h | 1 +
net/core/dev.c | 3 +-
3 files changed, 126 insertions(+), 48 deletions(-)
--
2.43.0