Message-ID: <25B14229-DF73-455D-9FF6-2B9F43238C1A@nutanix.com>
Date: Tue, 2 Dec 2025 16:38:00 +0000
From: Jon Kohler <jon@...anix.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Alexei Starovoitov
<ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller"
<davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard
Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>,
"(open list:XDP \\(eXpress Data
Path\\):Keyword:\\(?:\\b|_\\)xdp\\(?:\\b|_\\))" <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v2 0/9] tun: optimize SKB allocation with NAPI
cache
> On Nov 28, 2025, at 10:08 PM, Willem de Bruijn <willemdebruijn.kernel@...il.com> wrote:
>
> Jon Kohler wrote:
>> Use the per-CPU NAPI cache for SKB allocation in most places, and
>> leverage bulk allocation for tun_xdp_one since the batch size is known
>> at submission time. Additionally, utilize napi_build_skb and
>> napi_consume_skb to further benefit from the NAPI cache. This all
>> improves efficiency by reducing allocation overhead.
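
Not the patch itself, but a minimal sketch of the allocation/free pattern
described above, using the in-tree napi_build_skb()/napi_consume_skb()
helpers; the tun_napi_* wrapper names are made up for illustration:

#include <linux/skbuff.h>

/* Build an skb around an already-filled buffer, pulling the skb head
 * from the per-CPU NAPI cache instead of going to the slab allocator
 * directly.
 */
static struct sk_buff *tun_napi_build_skb(void *buf, unsigned int buflen,
					  unsigned int headroom,
					  unsigned int len)
{
	struct sk_buff *skb = napi_build_skb(buf, buflen);

	if (!skb)
		return NULL;

	skb_reserve(skb, headroom);
	skb_put(skb, len);
	return skb;
}

/* Free an skb on a NAPI-like path; a non-zero budget lets
 * napi_consume_skb() recycle the skb head back into the NAPI cache
 * rather than handing it straight back to the slab.
 */
static void tun_napi_consume_skb(struct sk_buff *skb)
{
	napi_consume_skb(skb, 1);
}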
>>
>> Note: This series does not address the large payload path in
>> tun_alloc_skb, which spans sock.c and skbuff.c. A separate series will
>> handle privatizing the allocation code in tun and integrating the NAPI
>> cache for that path.
>>
>> Results using basic iperf3 UDP test:
>> TX guest: taskset -c 2 iperf3 -c rx-ip-here -t 30 -p 5200 -b 0 -u -i 30
>> RX guest: taskset -c 2 iperf3 -s -p 5200 -D
>>
>> Bitrate
>> Before: 6.08 Gbits/sec
>> After : 6.36 Gbits/sec
>>
>> However, the basic test doesn't tell the whole story. Comparing
>> flamegraphs from before and after, fewer cycles are spent on the RX
>> vhost thread in the guest-to-guest single-host case, and also in the
>> guest-to-guest case across separate hosts, as the host NIC handlers
>> benefit from these NAPI-allocated SKBs (and deferred free) as well.
>>
>> Speaking of deferred free, v2 adds exporting deferred free from net
>> core and using it immediately prior in tun_put_user. This not only
>> keeps the cache as warm as you can get, but also prevents a TX-heavy
>> vhost thread from getting IPI'd like it's going out of style. This
>> approach is similar in concept to what happens in the NAPI loop in
>> net_rx_action.
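
As a rough sketch of the deferred-free mechanism referenced here (the
exact helper exported by this series may differ), net core already has
skb_attempt_defer_free(), which hands an skb back to the CPU that
allocated it via a per-CPU list; tun_put_user_finish() below is a
hypothetical wrapper, not the actual patch:

#include <linux/skbuff.h>

/* Hypothetical wrapper: instead of freeing the skb locally once the
 * payload has been copied to userspace, queue it on the allocating
 * CPU's deferred-free list. That CPU reclaims the skb (and refills
 * its NAPI cache) the next time it runs its own softirq/NAPI
 * processing, much like net_rx_action does today.
 */
static void tun_put_user_finish(struct sk_buff *skb)
{
	skb_attempt_defer_free(skb);
}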
>>
>> I've also merged this series with a small series that cleans up
>> packet drop statistics along the various error paths in tun, since I
>> want to make sure those all go through kfree_skb_reason(), and
>> separating the two would create merge conflicts. If the maintainers
>> want to take them separately, I'm happy to break them apart;
>> otherwise it is fairly clean to keep them together.
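
For illustration only, the drop-path convention being referred to looks
something like the sketch below; tun_rx_drop() and the particular drop
reason are assumptions, not the actual patch:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: every error path bumps the device drop counter
 * and frees the skb with an explicit reason, so drop monitoring and
 * tracepoints can attribute the loss, instead of using a bare
 * kfree_skb().
 */
static void tun_rx_drop(struct net_device *dev, struct sk_buff *skb,
			enum skb_drop_reason reason)
{
	dev_core_stats_rx_dropped_inc(dev);
	kfree_skb_reason(skb, reason);
}

/* e.g. tun_rx_drop(tun->dev, skb, SKB_DROP_REASON_FULL_RING); */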
>
> I think it would be preferable to send the cleanup separately, first.
Sure, will do.
> Why would that cause merge conflicts?
Just from a CI perspective: if I sent them separately, I'm guessing CI
would bark about merge conflicts.
Not a problem, let's nail down the cleanup parts and then we can worry
about the performance parts.
Thx!