Message-ID: <daab8b76-edec-d6af-0ef4-91dd4ae2f8e8@gmail.com>
Date: Fri, 5 Apr 2019 09:11:23 +0200
From: Rafał Miłecki <zajec5@...il.com>
To: Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
Cc: Toshiaki Makita <toshiaki.makita1@...il.com>,
netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Stefano Brivio <sbrivio@...hat.com>,
Sabrina Dubroca <sd@...asysnail.net>,
David Ahern <dsahern@...il.com>, Felix Fietkau <nbd@....name>,
Jo-Philipp Wich <jo@...n.io>,
Koen Vandeputte <koen.vandeputte@...ntric.com>
Subject: Re: NAT performance regression caused by vlan GRO support
On 05.04.2019 07:48, Rafał Miłecki wrote:
> On 05.04.2019 06:26, Toshiaki Makita wrote:
>> My test results:
>>
>> Receiving packets from eth0.10, forwarding them to eth0.20 and applying
>> MASQUERADE on eth0.20, using i40e 25G NIC on kernel 4.20.13.
>> Disabled rxvlan by ethtool -K to exercise vlan_gro_receive().
>> Measured TCP throughput by netperf.
>>
>> GRO on : 17 Gbps
>> GRO off: 5 Gbps
>>
>> So I failed to reproduce your problem.
>
> :( Thanks for trying & checking that!
>
>
>> Would you check the CPU usage by "mpstat -P ALL" or similar (like "sar
>> -u ALL -P ALL") to check if the traffic is able to consume 100% CPU on
>> your machine?
>
> 1) ethtool -K eth0 gro on + iperf running (577 Mb/s)
> root@...nWrt:/# mpstat -P ALL 10 3
> Linux 5.1.0-rc3+ (OpenWrt) 03/27/19 _armv7l_ (2 CPU)
>
> 16:33:40 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:33:50 all 0.00 0.00 0.00 0.00 0.00 58.79 0.00 0.00 41.21
> 16:33:50 0 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00
> 16:33:50 1 0.00 0.00 0.00 0.00 0.00 17.58 0.00 0.00 82.42
>
> 16:33:50 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:34:00 all 0.00 0.00 0.05 0.00 0.00 59.44 0.00 0.00 40.51
> 16:34:00 0 0.00 0.00 0.10 0.00 0.00 99.90 0.00 0.00 0.00
> 16:34:00 1 0.00 0.00 0.00 0.00 0.00 18.98 0.00 0.00 81.02
>
> 16:34:00 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:34:10 all 0.00 0.00 0.00 0.00 0.00 59.59 0.00 0.00 40.41
> 16:34:10 0 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00
> 16:34:10 1 0.00 0.00 0.00 0.00 0.00 19.18 0.00 0.00 80.82
>
> Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> Average: all 0.00 0.00 0.02 0.00 0.00 59.27 0.00 0.00 40.71
> Average: 0 0.00 0.00 0.03 0.00 0.00 99.97 0.00 0.00 0.00
> Average: 1 0.00 0.00 0.00 0.00 0.00 18.58 0.00 0.00 81.42
>
>
> 2) ethtool -K eth0 gro off + iperf running (941 Mb/s)
> root@...nWrt:/# mpstat -P ALL 10 3
> Linux 5.1.0-rc3+ (OpenWrt) 03/27/19 _armv7l_ (2 CPU)
>
> 16:34:39 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:34:49 all 0.00 0.00 0.05 0.00 0.00 86.91 0.00 0.00 13.04
> 16:34:49 0 0.00 0.00 0.10 0.00 0.00 78.22 0.00 0.00 21.68
> 16:34:49 1 0.00 0.00 0.00 0.00 0.00 95.60 0.00 0.00 4.40
>
> 16:34:49 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:34:59 all 0.00 0.00 0.10 0.00 0.00 87.06 0.00 0.00 12.84
> 16:34:59 0 0.00 0.00 0.20 0.00 0.00 79.72 0.00 0.00 20.08
> 16:34:59 1 0.00 0.00 0.00 0.00 0.00 94.41 0.00 0.00 5.59
>
> 16:34:59 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:35:09 all 0.00 0.00 0.05 0.00 0.00 85.71 0.00 0.00 14.24
> 16:35:09 0 0.00 0.00 0.10 0.00 0.00 79.42 0.00 0.00 20.48
> 16:35:09 1 0.00 0.00 0.00 0.00 0.00 92.01 0.00 0.00 7.99
>
> Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> Average: all 0.00 0.00 0.07 0.00 0.00 86.56 0.00 0.00 13.37
> Average: 0 0.00 0.00 0.13 0.00 0.00 79.12 0.00 0.00 20.75
> Average: 1 0.00 0.00 0.00 0.00 0.00 94.01 0.00 0.00 5.99
>
>
> 3) System idle (no iperf)
> root@...nWrt:/# mpstat -P ALL 10 1
> Linux 5.1.0-rc3+ (OpenWrt) 03/27/19 _armv7l_ (2 CPU)
>
> 16:35:31 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 16:35:41 all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> 16:35:41 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> 16:35:41 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
>
> Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> Average: all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> Average: 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> Average: 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
>
>
>> If CPU is 100%, perf may help us analyze your problem. If it's
>> available, try running below while testing:
>> # perf record -a -g -- sleep 5
>>
>> And then run this after testing:
>> # perf report --no-child
>
> I can see my CPU 0 is fully loaded when using "gro on". I'll try perf now.
I guess it's GRO + csum_partial() that is to blame for this performance drop.
Maybe csum_partial() is very fast on your powerful machine and a few extra calls
don't make a difference? I can imagine it affecting a much slower home router
with ARM cores.
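To illustrate why this could hurt a slow CPU, here is a rough sketch of the kind of arithmetic csum_partial() performs: a 16-bit ones' complement sum over every byte of the buffer, in the style of RFC 1071. This is only an illustration, not the kernel code (the real implementation is architecture-specific and heavily optimized), and the names csum_partial_sketch and internet_checksum are made up for this example. The point is that the work grows linearly with the number of bytes touched, so checksumming large merged packets in software is not free.

```python
def csum_partial_sketch(data: bytes, initial: int = 0) -> int:
    """Fold `data` into a 16-bit ones' complement running sum.

    Hypothetical helper for illustration only; it mirrors the RFC 1071
    arithmetic, not the kernel's optimized csum_partial().
    """
    total = initial
    # Odd-length buffers are padded with a trailing zero byte.
    if len(data) % 2:
        data += b"\x00"
    # Every byte is read once: cost is linear in the buffer length.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Final checksum: ones' complement of the folded sum."""
    return ~csum_partial_sketch(data) & 0xFFFF


if __name__ == "__main__":
    # Example words taken from RFC 1071: 0001 f203 f4f5 f6f7
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(csum_partial_sketch(sample)))
    print(hex(internet_checksum(sample)))
```

With hardware checksum offload the CPU would never run a loop like this per packet, which may be one reason the cost does not show up on a server-class NIC.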
1) ethtool -K eth0 gro on
Samples: 34K of event 'cycles', Event count (approx.): 10041345370
Overhead Command Shared Object Symbol
+ 25,46% ksoftirqd/0 [kernel.kallsyms] [k] csum_partial
+ 8,82% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
+ 6,03% swapper [kernel.kallsyms] [k] arch_cpu_idle
+ 4,08% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
+ 3,82% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
+ 3,14% swapper [kernel.kallsyms] [k] rcu_idle_exit
+ 3,00% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
+ 2,43% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
+ 1,24% swapper [kernel.kallsyms] [k] csum_partial
+ 1,20% swapper [kernel.kallsyms] [k] do_idle
+ 1,19% swapper [kernel.kallsyms] [k] skb_segment
+ 1,19% ksoftirqd/0 [kernel.kallsyms] [k] arm_dma_unmap_page
+ 1,00% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
+ 0,95% ksoftirqd/0 [kernel.kallsyms] [k] __slab_free.constprop.3
+ 0,80% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_data
+ 0,77% swapper [kernel.kallsyms] [k] __dev_queue_xmit
+ 0,73% ksoftirqd/0 [kernel.kallsyms] [k] build_skb
+ 0,68% ksoftirqd/0 [kernel.kallsyms] [k] skb_segment
+ 0,66% ksoftirqd/0 [kernel.kallsyms] [k] mmiocpy
+ 0,66% ksoftirqd/0 [kernel.kallsyms] [k] skb_checksum_help
+ 0,65% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
+ 0,64% ksoftirqd/0 [kernel.kallsyms] [k] page_address
+ 0,62% ksoftirqd/0 [kernel.kallsyms] [k] __qdisc_run
+ 0,62% ksoftirqd/0 [kernel.kallsyms] [k] dma_cache_maint_page
+ 0,59% swapper [kernel.kallsyms] [k] __kmalloc_track_caller
+ 0,59% swapper [kernel.kallsyms] [k] mmiocpy
+ 0,58% ksoftirqd/0 [kernel.kallsyms] [k] sch_direct_xmit
+ 0,55% ksoftirqd/0 [kernel.kallsyms] [k] mmioset
+ 0,52% ksoftirqd/0 [kernel.kallsyms] [k] inet_gro_receive
0,49% ksoftirqd/0 [kernel.kallsyms] [k] netdev_alloc_frag
0,47% swapper [kernel.kallsyms] [k] __netif_receive_skb_core
0,45% swapper [kernel.kallsyms] [k] kmem_cache_alloc
0,45% ksoftirqd/0 [kernel.kallsyms] [k] __skb_checksum
0,43% swapper [kernel.kallsyms] [k] v7_dma_clean_range
0,39% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
0,36% ksoftirqd/0 [kernel.kallsyms] [k] qdisc_dequeue_head
0,36% ksoftirqd/0 [kernel.kallsyms] [k] arm_dma_map_page
0,35% swapper [kernel.kallsyms] [k] mmioset
0,34% ksoftirqd/0 [kernel.kallsyms] [k] tcp_gro_receive
0,33% swapper [kernel.kallsyms] [k] __copy_skb_header
0,33% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
0,32% ksoftirqd/0 [kernel.kallsyms] [k] netif_skb_features
0,30% swapper [kernel.kallsyms] [k] netif_skb_features
0,30% ksoftirqd/0 [kernel.kallsyms] [k] __skb_flow_dissect
2) ethtool -K eth0 gro off
Samples: 39K of event 'cycles', Event count (approx.): 13065826851
Overhead Command Shared Object Symbol
+ 11,09% swapper [kernel.kallsyms] [k] v7_dma_inv_range
+ 5,86% ksoftirqd/1 [kernel.kallsyms] [k] v7_dma_clean_range
+ 5,77% swapper [kernel.kallsyms] [k] l2c210_inv_range
+ 5,38% swapper [kernel.kallsyms] [k] __irqentry_text_end
+ 4,44% swapper [kernel.kallsyms] [k] bcma_host_soc_read32
+ 3,28% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core
+ 3,25% ksoftirqd/1 [kernel.kallsyms] [k] l2c210_clean_range
+ 2,70% swapper [kernel.kallsyms] [k] arch_cpu_idle
+ 2,25% swapper [kernel.kallsyms] [k] bgmac_poll
+ 2,14% ksoftirqd/1 [kernel.kallsyms] [k] bgmac_start_xmit
+ 1,79% ksoftirqd/1 [kernel.kallsyms] [k] __dev_queue_xmit
+ 1,36% ksoftirqd/1 [kernel.kallsyms] [k] skb_vlan_untag
+ 1,11% swapper [kernel.kallsyms] [k] __skb_flow_dissect
+ 1,07% ksoftirqd/1 [kernel.kallsyms] [k] netif_skb_features
+ 0,98% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv_core.constprop.3
+ 0,92% ksoftirqd/1 [kernel.kallsyms] [k] sch_direct_xmit
+ 0,90% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip
+ 0,86% ksoftirqd/1 [kernel.kallsyms] [k] nf_hook_slow
+ 0,82% swapper [kernel.kallsyms] [k] net_rx_action
+ 0,80% ksoftirqd/1 [kernel.kallsyms] [k] validate_xmit_skb.constprop.30
+ 0,75% swapper [kernel.kallsyms] [k] build_skb
+ 0,72% ksoftirqd/1 [kernel.kallsyms] [k] ip_forward
+ 0,71% ksoftirqd/1 [kernel.kallsyms] [k] br_handle_frame_finish
+ 0,71% ksoftirqd/1 [kernel.kallsyms] [k] skb_pull_rcsum
+ 0,65% swapper [kernel.kallsyms] [k] arm_dma_unmap_page
+ 0,59% ksoftirqd/1 [kernel.kallsyms] [k] ip_finish_output2
+ 0,59% swapper [kernel.kallsyms] [k] __skb_get_hash
+ 0,58% swapper [kernel.kallsyms] [k] dma_cache_maint_page
+ 0,55% ksoftirqd/1 [kernel.kallsyms] [k] fdb_find_rcu
+ 0,54% swapper [kernel.kallsyms] [k] bcma_host_soc_write32
+ 0,53% ksoftirqd/1 [kernel.kallsyms] [k] vlan_do_receive
+ 0,52% ksoftirqd/1 [kernel.kallsyms] [k] memmove
+ 0,52% swapper [kernel.kallsyms] [k] rcu_idle_exit
+ 0,51% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv
+ 0,51% ksoftirqd/1 [kernel.kallsyms] [k] dev_hard_start_xmit
0,49% ksoftirqd/1 [kernel.kallsyms] [k] ip_output
0,46% ksoftirqd/1 [kernel.kallsyms] [k] vlan_dev_hard_start_xmit
0,45% swapper [kernel.kallsyms] [k] enqueue_to_backlog
0,42% swapper [kernel.kallsyms] [k] netdev_alloc_frag
0,42% swapper [kernel.kallsyms] [k] skb_release_data
0,41% ksoftirqd/1 [kernel.kallsyms] [k] ip_forward_finish
0,40% ksoftirqd/1 [kernel.kallsyms] [k] br_handle_frame
0,37% ksoftirqd/1 [kernel.kallsyms] [k] mmiocpy
0,37% ksoftirqd/1 [kernel.kallsyms] [k] page_address
0,36% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
0,36% ksoftirqd/1 [kernel.kallsyms] [k] memcmp
0,36% ksoftirqd/1 [kernel.kallsyms] [k] netif_receive_skb_internal
0,34% swapper [kernel.kallsyms] [k] page_address
0,34% swapper [kernel.kallsyms] [k] mmioset
0,33% ksoftirqd/1 [kernel.kallsyms] [k] br_pass_frame_up
0,33% ksoftirqd/1 [kernel.kallsyms] [k] neigh_connected_output
0,33% swapper [kernel.kallsyms] [k] kmem_cache_alloc
0,31% ksoftirqd/1 [kernel.kallsyms] [k] mmioset
0,30% ksoftirqd/1 [kernel.kallsyms] [k] ip_finish_output
0,30% ksoftirqd/1 [kernel.kallsyms] [k] bcma_bgmac_write
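Since the same symbol shows up under more than one task (ksoftirqd/0 and swapper), it can help to add up the overhead per symbol before comparing the two runs. Below is a minimal sketch that does this for the text output pasted above. It assumes the column layout shown here (optional "+", overhead with a decimal comma, command, shared object, "[k]", symbol); the function name overhead_by_symbol and the SAMPLE constant are made up for this example.

```python
from collections import defaultdict

# A few lines copied from the "gro on" report above.
SAMPLE = """\
Overhead Command Shared Object Symbol
+ 25,46% ksoftirqd/0 [kernel.kallsyms] [k] csum_partial
+ 8,82% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
+ 1,24% swapper [kernel.kallsyms] [k] csum_partial
+ 1,19% swapper [kernel.kallsyms] [k] skb_segment
+ 0,68% ksoftirqd/0 [kernel.kallsyms] [k] skb_segment
"""


def overhead_by_symbol(report: str) -> dict:
    """Sum the overhead percentage per symbol across all commands."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Drop the leading "+" that marks an expandable call chain.
        if fields and fields[0] == "+":
            fields = fields[1:]
        # Skip headers and anything that does not start with "NN,NN%".
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        percent = float(fields[0].rstrip("%").replace(",", "."))
        # The symbol is the last column.
        totals[fields[-1]] += percent
    return dict(totals)


if __name__ == "__main__":
    totals = overhead_by_symbol(SAMPLE)
    for symbol, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{percent:6.2f}%  {symbol}")
```

For the "gro on" run this puts csum_partial at about 26.7% in total (25,46% + 1,24%), while it does not appear at all in the "gro off" list.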