Message-ID: <25B14229-DF73-455D-9FF6-2B9F43238C1A@nutanix.com>
Date: Tue, 2 Dec 2025 16:38:00 +0000
From: Jon Kohler <jon@...anix.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Alexei Starovoitov
<ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller"
<davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard
Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>,
"(open list:XDP \\(eXpress Data
Path\\):Keyword:\\(?:\\b|_\\)xdp\\(?:\\b|_\\))" <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v2 0/9] tun: optimize SKB allocation with NAPI
cache
> On Nov 28, 2025, at 10:08 PM, Willem de Bruijn <willemdebruijn.kernel@...il.com> wrote:
>
> Jon Kohler wrote:
>> Use the per-CPU NAPI cache for SKB allocation in most places, and
>> leverage bulk allocation for tun_xdp_one since the batch size is known
>> at submission time. Additionally, utilize napi_build_skb and
>> napi_consume_skb to further benefit from the NAPI cache. This all
>> improves efficiency by reducing allocation overhead.
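
Not the patch itself, but a minimal sketch of the allocation/free pattern
described above, using the in-tree napi_build_skb()/napi_consume_skb()
helpers; the tun_napi_* wrapper names are made up for illustration:

#include <linux/skbuff.h>

/* Build an skb around an already-filled buffer, pulling the skb head
 * from the per-CPU NAPI cache instead of going to the slab allocator
 * directly.
 */
static struct sk_buff *tun_napi_build_skb(void *buf, unsigned int buflen,
					  unsigned int headroom,
					  unsigned int len)
{
	struct sk_buff *skb = napi_build_skb(buf, buflen);

	if (!skb)
		return NULL;

	skb_reserve(skb, headroom);
	skb_put(skb, len);
	return skb;
}

/* Free an skb on a NAPI-like path; a non-zero budget lets
 * napi_consume_skb() recycle the skb head back into the NAPI cache
 * rather than handing it straight back to the slab.
 */
static void tun_napi_consume_skb(struct sk_buff *skb)
{
	napi_consume_skb(skb, 1);
}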
>>
>> Note: This series does not address the large payload path in
>> tun_alloc_skb, which spans sock.c and skbuff.c. A separate series will
>> handle privatizing the allocation code in tun and integrating the NAPI
>> cache for that path.
>>
>> Results using basic iperf3 UDP test:
>> TX guest: taskset -c 2 iperf3 -c rx-ip-here -t 30 -p 5200 -b 0 -u -i 30
>> RX guest: taskset -c 2 iperf3 -s -p 5200 -D
>>
>> Bitrate
>> Before: 6.08 Gbits/sec
>> After : 6.36 Gbits/sec
>>
>> However, the basic test doesn't tell the whole story. Comparing
>> flamegraphs from before and after, fewer cycles are spent on the RX
>> vhost thread in the guest-to-guest single-host case, and also in the
>> guest-to-guest case across separate hosts, as the host NIC handlers
>> benefit from these NAPI-allocated SKBs (and deferred free) as well.
>>
>> Speaking of deferred free, v2 adds exporting deferred free from net
>> core and using it immediately prior in tun_put_user. This not only
>> keeps the cache as warm as you can get, but also prevents a TX-heavy
>> vhost thread from getting IPI'd like it's going out of style. This
>> approach is similar in concept to what happens in the NAPI loop in
>> net_rx_action.
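
As a rough sketch of the deferred-free mechanism referenced here (the
exact helper exported by this series may differ), net core already has
skb_attempt_defer_free(), which hands an skb back to the CPU that
allocated it via a per-CPU list; tun_put_user_finish() below is a
hypothetical wrapper, not the actual patch:

#include <linux/skbuff.h>

/* Hypothetical wrapper: instead of freeing the skb locally once the
 * payload has been copied to userspace, queue it on the allocating
 * CPU's deferred-free list. That CPU reclaims the skb (and refills
 * its NAPI cache) the next time it runs its own softirq/NAPI
 * processing, much like net_rx_action does today.
 */
static void tun_put_user_finish(struct sk_buff *skb)
{
	skb_attempt_defer_free(skb);
}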
>>
>> I've also merged this series with a small series that cleans up
>> packet drop statistics along the various error paths in tun, since I
>> want to make sure those all go through kfree_skb_reason(), and
>> separating the two would create merge conflicts. If the maintainers
>> want to take them separately, I'm happy to break them apart;
>> otherwise it is fairly clean to keep them together.
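
For illustration only, the drop-path convention being referred to looks
something like the sketch below; tun_rx_drop() and the particular drop
reason are assumptions, not the actual patch:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: every error path bumps the device drop counter
 * and frees the skb with an explicit reason, so drop monitoring and
 * tracepoints can attribute the loss, instead of using a bare
 * kfree_skb().
 */
static void tun_rx_drop(struct net_device *dev, struct sk_buff *skb,
			enum skb_drop_reason reason)
{
	dev_core_stats_rx_dropped_inc(dev);
	kfree_skb_reason(skb, reason);
}

/* e.g. tun_rx_drop(tun->dev, skb, SKB_DROP_REASON_FULL_RING); */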
>
> I think it would be preferable to send the cleanup separately, first.
Sure, will do.
> Why would that cause merge conflicts?
Just from a CI perspective: if I sent them separately, I'm guessing CI
would bark about merge conflicts.
Not a problem, let's nail down the cleanup parts and then we can worry
about the performance parts.
Thx!