lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAVpQUA+L_PishcamVi9qOLb595mpB-xKgibRcnv9JSfXeoE_g@mail.gmail.com>
Date: Thu, 6 Nov 2025 22:46:58 -0800
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/3] net: fix napi_consume_skb() with alien skbs

On Thu, Nov 6, 2025 at 12:29 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> There is a lack of NUMA awareness and more generally lack
> of slab caches affinity on TX completion path.
>
> Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
> in per-cpu caches so that they can be recycled in RX path.
>
> Only use this if the skb was allocated on the same cpu,
> otherwise use skb_attempt_defer_free() so that the skb
> is freed on the original cpu.
>
> This removes contention on SLUB spinlocks and data structures.
>
> After this patch, I get ~50% improvement for an UDP tx workload
> on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).
>
> 80 Mpps -> 120 Mpps.
>
> Profiling one of the 32 cpus servicing NIC interrupts :
>
> Before:
>
> mpstat -P 511 1 1
>
> Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     511    0.00    0.00    0.00    0.00    0.00   98.00    0.00    0.00    0.00    2.00
>
>     31.01%  ksoftirqd/511    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>     12.45%  swapper          [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>      5.60%  ksoftirqd/511    [kernel.kallsyms]  [k] __slab_free
>      3.31%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
>      3.27%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
>      2.95%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_start
>      2.52%  ksoftirqd/511    [kernel.kallsyms]  [k] fq_dequeue
>      2.32%  ksoftirqd/511    [kernel.kallsyms]  [k] read_tsc
>      2.25%  ksoftirqd/511    [kernel.kallsyms]  [k] build_detached_freelist
>      2.15%  ksoftirqd/511    [kernel.kallsyms]  [k] kmem_cache_free
>      2.11%  swapper          [kernel.kallsyms]  [k] __slab_free
>      2.06%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_features_check
>      2.01%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
>      1.97%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_release_data
>      1.52%  ksoftirqd/511    [kernel.kallsyms]  [k] sock_wfree
>      1.34%  swapper          [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
>      1.23%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
>      1.15%  ksoftirqd/511    [kernel.kallsyms]  [k] dma_unmap_page_attrs
>      1.11%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_start
>      1.03%  swapper          [kernel.kallsyms]  [k] fq_dequeue
>      0.94%  swapper          [kernel.kallsyms]  [k] kmem_cache_free
>      0.93%  swapper          [kernel.kallsyms]  [k] read_tsc
>      0.81%  ksoftirqd/511    [kernel.kallsyms]  [k] napi_consume_skb
>      0.79%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
>      0.77%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_free_head
>      0.76%  swapper          [kernel.kallsyms]  [k] idpf_features_check
>      0.72%  swapper          [kernel.kallsyms]  [k] skb_release_data
>      0.69%  swapper          [kernel.kallsyms]  [k] build_detached_freelist
>      0.58%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_release_head_state
>      0.56%  ksoftirqd/511    [kernel.kallsyms]  [k] __put_partials
>      0.55%  ksoftirqd/511    [kernel.kallsyms]  [k] kmem_cache_free_bulk
>      0.48%  swapper          [kernel.kallsyms]  [k] sock_wfree
>
> After:
>
> mpstat -P 511 1 1
>
> Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     511    0.00    0.00    0.00    0.00    0.00   51.49    0.00    0.00    0.00   48.51
>
>     19.10%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
>     13.86%  swapper          [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
>     10.80%  swapper          [kernel.kallsyms]  [k] skb_attempt_defer_free
>     10.57%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
>      7.18%  swapper          [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>      6.69%  swapper          [kernel.kallsyms]  [k] sock_wfree
>      5.55%  swapper          [kernel.kallsyms]  [k] dma_unmap_page_attrs
>      3.10%  swapper          [kernel.kallsyms]  [k] fq_dequeue
>      3.00%  swapper          [kernel.kallsyms]  [k] skb_release_head_state
>      2.73%  swapper          [kernel.kallsyms]  [k] read_tsc
>      2.48%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_start
>      1.20%  swapper          [kernel.kallsyms]  [k] idpf_features_check
>      1.13%  swapper          [kernel.kallsyms]  [k] napi_consume_skb
>      0.93%  swapper          [kernel.kallsyms]  [k] idpf_vport_splitq_napi_poll
>      0.64%  swapper          [kernel.kallsyms]  [k] native_send_call_func_single_ipi
>      0.60%  swapper          [kernel.kallsyms]  [k] acpi_processor_ffh_cstate_enter
>      0.53%  swapper          [kernel.kallsyms]  [k] io_idle
>      0.43%  swapper          [kernel.kallsyms]  [k] netif_skb_features
>      0.41%  swapper          [kernel.kallsyms]  [k] __direct_call_cpuidle_state_enter2
>      0.40%  swapper          [kernel.kallsyms]  [k] native_irq_return_iret
>      0.40%  swapper          [kernel.kallsyms]  [k] idpf_tx_buf_hw_update
>      0.36%  swapper          [kernel.kallsyms]  [k] sched_clock_noinstr
>      0.34%  swapper          [kernel.kallsyms]  [k] handle_softirqs
>      0.32%  swapper          [kernel.kallsyms]  [k] net_rx_action
>      0.32%  swapper          [kernel.kallsyms]  [k] dql_completed
>      0.32%  swapper          [kernel.kallsyms]  [k] validate_xmit_skb
>      0.31%  swapper          [kernel.kallsyms]  [k] skb_network_protocol
>      0.29%  swapper          [kernel.kallsyms]  [k] skb_csum_hwoffload_help
>      0.29%  swapper          [kernel.kallsyms]  [k] x2apic_send_IPI
>      0.28%  swapper          [kernel.kallsyms]  [k] ktime_get
>      0.24%  swapper          [kernel.kallsyms]  [k] __qdisc_run
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>

Big improvement again!

Reviewed-by: Kuniyuki Iwashima <kuniyu@...gle.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ