Message-ID: <87ikfmnl15.fsf@toke.dk>
Date: Fri, 07 Nov 2025 12:23:50 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>, "David S . Miller"
<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>, Kuniyuki Iwashima <kuniyu@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com, Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next 2/3] net: fix napi_consume_skb() with alien skbs

Eric Dumazet <edumazet@...gle.com> writes:

> There is a lack of NUMA awareness and more generally lack
> of slab caches affinity on TX completion path.
>
> Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
> in per-cpu caches so that they can be recycled in RX path.
>
> Only use this if the skb was allocated on the same cpu,
> otherwise use skb_attempt_defer_free() so that the skb
> is freed on the original cpu.
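
For readers following along, the decision described in the quoted paragraph can be modelled in user space roughly as below. This is only a sketch and not the code from the patch: `toy_skb`, `toy_cpu`, `toy_cpus`, `toy_consume()` and `TOY_NR_CPUS` are hypothetical names made up for illustration. The local cache and the deferred-free list are reduced to plain counters, and the real kernel path has additional conditions that are not shown here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct sk_buff: only the field the decision needs. */
struct toy_skb {
	int alloc_cpu;	/* CPU that allocated the buffer */
};

/* Hypothetical per-CPU state, reduced to two counters. */
struct toy_cpu {
	int cached;	/* buffers kept in this CPU's local recycle cache */
	int deferred;	/* buffers handed back by other CPUs, to be freed here */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/*
 * Model of the TX completion decision: a buffer allocated on the CPU that
 * runs the completion goes to that CPU's local cache, any other ("alien")
 * buffer is queued for the CPU that allocated it.
 *
 * Returns true if the buffer was cached locally, false if it was deferred.
 */
static bool toy_consume(const struct toy_skb *skb, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < TOY_NR_CPUS);
	assert(skb->alloc_cpu >= 0 && skb->alloc_cpu < TOY_NR_CPUS);

	if (skb->alloc_cpu == this_cpu) {
		toy_cpus[this_cpu].cached++;
		return true;
	}

	/* Alien buffer: hand it back instead of freeing it on this CPU. */
	toy_cpus[skb->alloc_cpu].deferred++;
	return false;
}
```

In this model, completing a buffer with `alloc_cpu == 3` on CPU 1 only touches CPU 3's counter, which is the point of the change: the freeing work lands on the CPU that owns the memory rather than on the one servicing the TX interrupt.
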
>
> This removes contention on SLUB spinlocks and data structures.
>
> After this patch, I get ~50% improvement for a UDP TX workload
> on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).
>
> 80 Mpps -> 120 Mpps.
>
> Profiling one of the 32 cpus servicing NIC interrupts :
>
> Before:
>
> mpstat -P 511 1 1
>
> Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> Average: 511 0.00 0.00 0.00 0.00 0.00 98.00 0.00 0.00 0.00 2.00
>
> 31.01% ksoftirqd/511 [kernel.kallsyms] [k] queued_spin_lock_slowpath
> 12.45% swapper [kernel.kallsyms] [k] queued_spin_lock_slowpath
> 5.60% ksoftirqd/511 [kernel.kallsyms] [k] __slab_free
> 3.31% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
> 3.27% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
> 2.95% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_start
> 2.52% ksoftirqd/511 [kernel.kallsyms] [k] fq_dequeue
> 2.32% ksoftirqd/511 [kernel.kallsyms] [k] read_tsc
> 2.25% ksoftirqd/511 [kernel.kallsyms] [k] build_detached_freelist
> 2.15% ksoftirqd/511 [kernel.kallsyms] [k] kmem_cache_free
> 2.11% swapper [kernel.kallsyms] [k] __slab_free
> 2.06% ksoftirqd/511 [kernel.kallsyms] [k] idpf_features_check
> 2.01% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
> 1.97% ksoftirqd/511 [kernel.kallsyms] [k] skb_release_data
> 1.52% ksoftirqd/511 [kernel.kallsyms] [k] sock_wfree
> 1.34% swapper [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
> 1.23% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
> 1.15% ksoftirqd/511 [kernel.kallsyms] [k] dma_unmap_page_attrs
> 1.11% swapper [kernel.kallsyms] [k] idpf_tx_splitq_start
> 1.03% swapper [kernel.kallsyms] [k] fq_dequeue
> 0.94% swapper [kernel.kallsyms] [k] kmem_cache_free
> 0.93% swapper [kernel.kallsyms] [k] read_tsc
> 0.81% ksoftirqd/511 [kernel.kallsyms] [k] napi_consume_skb
> 0.79% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
> 0.77% ksoftirqd/511 [kernel.kallsyms] [k] skb_free_head
> 0.76% swapper [kernel.kallsyms] [k] idpf_features_check
> 0.72% swapper [kernel.kallsyms] [k] skb_release_data
> 0.69% swapper [kernel.kallsyms] [k] build_detached_freelist
> 0.58% ksoftirqd/511 [kernel.kallsyms] [k] skb_release_head_state
> 0.56% ksoftirqd/511 [kernel.kallsyms] [k] __put_partials
> 0.55% ksoftirqd/511 [kernel.kallsyms] [k] kmem_cache_free_bulk
> 0.48% swapper [kernel.kallsyms] [k] sock_wfree
>
> After:
>
> mpstat -P 511 1 1
>
> Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> Average: 511 0.00 0.00 0.00 0.00 0.00 51.49 0.00 0.00 0.00 48.51
>
> 19.10% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
> 13.86% swapper [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
> 10.80% swapper [kernel.kallsyms] [k] skb_attempt_defer_free
> 10.57% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
> 7.18% swapper [kernel.kallsyms] [k] queued_spin_lock_slowpath
> 6.69% swapper [kernel.kallsyms] [k] sock_wfree
> 5.55% swapper [kernel.kallsyms] [k] dma_unmap_page_attrs
> 3.10% swapper [kernel.kallsyms] [k] fq_dequeue
> 3.00% swapper [kernel.kallsyms] [k] skb_release_head_state
> 2.73% swapper [kernel.kallsyms] [k] read_tsc
> 2.48% swapper [kernel.kallsyms] [k] idpf_tx_splitq_start
> 1.20% swapper [kernel.kallsyms] [k] idpf_features_check
> 1.13% swapper [kernel.kallsyms] [k] napi_consume_skb
> 0.93% swapper [kernel.kallsyms] [k] idpf_vport_splitq_napi_poll
> 0.64% swapper [kernel.kallsyms] [k] native_send_call_func_single_ipi
> 0.60% swapper [kernel.kallsyms] [k] acpi_processor_ffh_cstate_enter
> 0.53% swapper [kernel.kallsyms] [k] io_idle
> 0.43% swapper [kernel.kallsyms] [k] netif_skb_features
> 0.41% swapper [kernel.kallsyms] [k] __direct_call_cpuidle_state_enter2
> 0.40% swapper [kernel.kallsyms] [k] native_irq_return_iret
> 0.40% swapper [kernel.kallsyms] [k] idpf_tx_buf_hw_update
> 0.36% swapper [kernel.kallsyms] [k] sched_clock_noinstr
> 0.34% swapper [kernel.kallsyms] [k] handle_softirqs
> 0.32% swapper [kernel.kallsyms] [k] net_rx_action
> 0.32% swapper [kernel.kallsyms] [k] dql_completed
> 0.32% swapper [kernel.kallsyms] [k] validate_xmit_skb
> 0.31% swapper [kernel.kallsyms] [k] skb_network_protocol
> 0.29% swapper [kernel.kallsyms] [k] skb_csum_hwoffload_help
> 0.29% swapper [kernel.kallsyms] [k] x2apic_send_IPI
> 0.28% swapper [kernel.kallsyms] [k] ktime_get
> 0.24% swapper [kernel.kallsyms] [k] __qdisc_run
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>

Impressive!

Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>