Message-ID: <beef3b28-6818-df7b-eaad-8569cac5d79b@gmail.com>
Date: Thu, 18 Nov 2021 09:19:39 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Íñigo Huguet <ihuguet@...hat.com>,
Edward Cree <ecree.xilinx@...il.com>, habetsm.xilinx@...il.com
Cc: netdev@...r.kernel.org, Dinan Gunawardena <dinang@...inx.com>,
Pablo Cascon <pabloc@...inx.com>
Subject: Re: Bad performance in RX with sfc 40G
On 11/18/21 7:14 AM, Íñigo Huguet wrote:
> Hello,
>
> Doing some tests a few weeks ago, I noticed very low RX performance
> with 40G Solarflare NICs. Testing with iperf3 I got more than 30Gbps
> in TX, but only around 15Gbps in RX. NICs from other vendors could
> both send and receive over 30Gbps.
>
> I ran the tests with multiple parallel streams in iperf3 (-P 8).
>
> The models used are SFC9140 and SFC9220.
>
> Perf showed that most of the time was being spent in
> `native_queued_spin_lock_slowpath`. Tracing the calls to it with
> bpftrace, I found that most of them came from __napi_poll > efx_poll >
> efx_fast_push_rx_descriptors > __alloc_pages >
> get_page_from_freelist > ...
>
> Can you please help me investigate the issue? At first sight it looks
> like a suboptimal memory allocation strategy, or maybe a failure in
> the page recycling strategy...
>
> This is the output of bpftrace, showing the 2 call chains that repeat
> the most, both from sfc:
>
> @[
> native_queued_spin_lock_slowpath+1
> _raw_spin_lock+26
> rmqueue_bulk+76
> get_page_from_freelist+2295
> __alloc_pages+214
> efx_fast_push_rx_descriptors+640
> efx_poll+660
> __napi_poll+42
> net_rx_action+547
> __softirqentry_text_start+208
> __irq_exit_rcu+179
> common_interrupt+131
> asm_common_interrupt+30
> cpuidle_enter_state+199
> cpuidle_enter+41
> do_idle+462
> cpu_startup_entry+25
> start_kernel+2465
> secondary_startup_64_no_verify+194
> ]: 2650
> @[
> native_queued_spin_lock_slowpath+1
> _raw_spin_lock+26
> rmqueue_bulk+76
> get_page_from_freelist+2295
> __alloc_pages+214
> efx_fast_push_rx_descriptors+640
> efx_poll+660
> __napi_poll+42
> net_rx_action+547
> __softirqentry_text_start+208
> __irq_exit_rcu+179
> common_interrupt+131
> asm_common_interrupt+30
> cpuidle_enter_state+199
> cpuidle_enter+41
> do_idle+462
> cpu_startup_entry+25
> secondary_startup_64_no_verify+194
> ]: 17119
>
> --
> Íñigo Huguet
>
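For reference, per-stack counts like the ones quoted above can be
gathered with a bpftrace one-liner along these lines (the exact script
used in the report is not shown, so treat this as a sketch):

  bpftrace -e 'kprobe:native_queued_spin_lock_slowpath { @[kstack] = count(); }'
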
You could try to:
Make the RX ring buffers bigger (ethtool -G eth0 rx 8192)
and/or
Make sure your TCP socket receive buffer is smaller than the number of frames in the RX ring:
echo "4096 131072 2097152" >/proc/sys/net/ipv4/tcp_rmem
You can also try the latest net-next, as TCP recently got a change that helps with this case:
f35f821935d8df76f9c92e2431a225bdff938169 tcp: defer skb freeing after socket lock is released
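
To check whether a given kernel tree already contains that commit,
something along these lines should work from a git checkout:

  git merge-base --is-ancestor f35f821935d8df76f9c92e2431a225bdff938169 HEAD \
      && echo "commit present" || echo "commit missing"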