Message-ID: <d6b8a94b-abb6-d976-2ed4-887e3049b979@itcare.pl>
Date: Thu, 8 Nov 2018 15:43:35 +0100
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: David Ahern <dsahern@...il.com>, netdev <netdev@...r.kernel.org>,
Yoel Caspersen <yoel@...knet.dk>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users
traffic
On 08.11.2018 at 01:59, Paweł Staszewski wrote:
>
>
> On 05.11.2018 at 21:17, Jesper Dangaard Brouer wrote:
>> On Sun, 4 Nov 2018 01:24:03 +0100 Paweł Staszewski
>> <pstaszewski@...are.pl> wrote:
>>
>>> And today again, after applying the patch for the page allocator - reached
>>> 64/64 Gbit/s again
>>>
>>> with only 50-60% cpu load
>> Great.
>>
>>> today no slowpath hit for networking :)
>>>
>>> But again dropped packets at 64 Gbit RX and 64 Gbit TX ....
>>> And as it should not be a PCI Express limit - I think something more is
>> Well, this does sound like a PCIe bandwidth limit to me.
>>
>> See the PCIe BW here: https://en.wikipedia.org/wiki/PCI_Express
>>
>> You likely have PCIe v3, where one lane has 984.6 MBytes/s or 7.87 Gbit/s.
>> Thus, x16 lanes have 15.75 GBytes/s or 126 Gbit/s. It does say "in each
>> direction", but you are also forwarding this RX->TX on both ports of a
>> (dual-port) NIC that is sharing the same PCIe slot.
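
As a quick sanity check of the numbers above - a rough sketch, assuming
PCIe v3 at 984.6 MB/s per lane (i.e. after 128b/130b encoding overhead):

   # per-direction bandwidth of a PCIe v3 x16 slot, in Gbit/s
   echo "scale=2; 984.6 * 16 * 8 / 1000" | bc
   126.02

So a dual-port NIC forwarding RX->TX through a single x16 slot has to fit
the combined traffic of both ports into that ~126 Gbit/s per direction.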
> Network controller changed from one 2-port 100G ConnectX-4 to 2 separate
> 100G ConnectX-5 cards
>
>
>    PerfTop:   92239 irqs/sec  kernel:99.4%  exact: 0.0%  [4000Hz cycles],  (all, 56 CPUs)
> ------------------------------------------------------------------------------
>
>
> 6.65% [kernel] [k] irq_entries_start
> 5.57% [kernel] [k] tasklet_action_common.isra.21
> 4.60% [kernel] [k] mlx5_eq_int
> 4.04% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
> 3.66% [kernel] [k] _raw_spin_lock_irqsave
> 3.58% [kernel] [k] mlx5e_sq_xmit
> 2.66% [kernel] [k] fib_table_lookup
> 2.52% [kernel] [k] _raw_spin_lock
> 2.51% [kernel] [k] build_skb
> 2.50% [kernel] [k] _raw_spin_lock_irq
> 2.04% [kernel] [k] try_to_wake_up
> 1.83% [kernel] [k] queued_spin_lock_slowpath
> 1.81% [kernel] [k] mlx5e_poll_tx_cq
> 1.65% [kernel] [k] do_idle
> 1.50% [kernel] [k] mlx5e_poll_rx_cq
> 1.34% [kernel] [k] __sched_text_start
> 1.32% [kernel] [k] cmd_exec
> 1.30% [kernel] [k] cmd_work_handler
> 1.16% [kernel] [k] vlan_do_receive
> 1.15% [kernel] [k] memcpy_erms
> 1.15% [kernel] [k] __dev_queue_xmit
> 1.07% [kernel] [k] mlx5_cmd_comp_handler
> 1.06% [kernel] [k] sched_ttwu_pending
> 1.00% [kernel] [k] ipt_do_table
> 0.98% [kernel] [k] ip_finish_output2
> 0.92% [kernel] [k] pfifo_fast_dequeue
> 0.88% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
> 0.78% [kernel] [k] dev_gro_receive
> 0.78% [kernel] [k] mlx5e_napi_poll
> 0.76% [kernel] [k] mlx5e_post_rx_mpwqes
> 0.70% [kernel] [k] process_one_work
> 0.67% [kernel] [k] __netif_receive_skb_core
> 0.65% [kernel] [k] __build_skb
> 0.63% [kernel] [k] llist_add_batch
> 0.62% [kernel] [k] tcp_gro_receive
> 0.60% [kernel] [k] inet_gro_receive
> 0.59% [kernel] [k] ip_route_input_rcu
> 0.59% [kernel] [k] rcu_irq_exit
> 0.56% [kernel] [k] napi_complete_done
> 0.52% [kernel] [k] kmem_cache_alloc
> 0.48% [kernel] [k] __softirqentry_text_start
> 0.48% [kernel] [k] mlx5e_xmit
> 0.47% [kernel] [k] __queue_work
> 0.46% [kernel] [k] memset_erms
> 0.46% [kernel] [k] dev_hard_start_xmit
> 0.45% [kernel] [k] insert_work
> 0.45% [kernel] [k] enqueue_task_fair
> 0.44% [kernel] [k] __wake_up_common
> 0.43% [kernel] [k] finish_task_switch
> 0.43% [kernel] [k] kmem_cache_free_bulk
> 0.42% [kernel] [k] ip_forward
> 0.42% [kernel] [k] worker_thread
> 0.41% [kernel] [k] schedule
> 0.41% [kernel] [k] _raw_spin_unlock_irqrestore
> 0.40% [kernel] [k] netif_skb_features
> 0.40% [kernel] [k] queue_work_on
> 0.40% [kernel] [k] pfifo_fast_enqueue
> 0.39% [kernel] [k] vlan_dev_hard_start_xmit
> 0.39% [kernel] [k] page_frag_free
> 0.36% [kernel] [k] swiotlb_map_page
> 0.36% [kernel] [k] update_cfs_rq_h_load
> 0.35% [kernel] [k] validate_xmit_skb.isra.142
> 0.35% [kernel] [k] dev_ifconf
> 0.35% [kernel] [k] check_preempt_curr
> 0.34% [kernel] [k] _raw_spin_trylock
> 0.34% [kernel] [k] rcu_idle_exit
> 0.33% [kernel] [k] ip_rcv_core.isra.20.constprop.25
> 0.33% [kernel] [k] __qdisc_run
> 0.33% [kernel] [k] skb_release_data
> 0.32% [kernel] [k] native_sched_clock
> 0.30% [kernel] [k] add_interrupt_randomness
> 0.29% [kernel] [k] interrupt_entry
> 0.28% [kernel] [k] skb_gro_receive
> 0.26% [kernel] [k] read_tsc
> 0.26% [kernel] [k] __get_xps_queue_idx
> 0.26% [kernel] [k] inet_gifconf
> 0.26% [kernel] [k] skb_segment
> 0.25% [kernel] [k] __tasklet_schedule_common
> 0.25% [kernel] [k] smpboot_thread_fn
> 0.23% [kernel] [k] __update_load_avg_se
> 0.22% [kernel] [k] tcp4_gro_receive
>
>
> Not much traffic now:
> bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
> input: /proc/net/dev type: rate
> |       iface                Rx                Tx             Total
> ==============================================================================
>      enp175s0:         6.95 Gb/s         4.20 Gb/s        11.15 Gb/s
>      enp216s0:         4.23 Gb/s         6.98 Gb/s        11.21 Gb/s
> ------------------------------------------------------------------------------
>         total:        11.18 Gb/s        11.18 Gb/s        22.37 Gb/s
>
> bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
> input: /proc/net/dev type: rate
> |       iface                Rx                Tx             Total
> ==============================================================================
>      enp175s0:     700264.50 P/s     923890.25 P/s    1624154.75 P/s
>      enp216s0:     932598.81 P/s     708771.50 P/s    1641370.25 P/s
> ------------------------------------------------------------------------------
>         total:    1632863.38 P/s    1632661.75 P/s    3265525.00 P/s
>
Updated perf top - more traffic now, 37 Gbit RX / 37 Gbit TX in total:

bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
input: /proc/net/dev type: rate
/       iface                Rx                Tx             Total
==============================================================================
     enp175s0:        28.91 Gb/s         8.89 Gb/s        37.80 Gb/s
     enp216s0:         8.91 Gb/s        28.95 Gb/s        37.86 Gb/s
------------------------------------------------------------------------------
        total:        37.82 Gb/s        37.84 Gb/s        75.67 Gb/s
bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
input: /proc/net/dev type: rate
- iface Rx Tx Total
==============================================================================
enp175s0: 2721518.75 P/s 2460930.50 P/s 5182449.50 P/s
enp216s0: 2471451.25 P/s 2731946.25 P/s 5203397.50 P/s
------------------------------------------------------------------------------
total: 5192970.00 P/s 5192876.50 P/s 10385847.00 P/s
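
(For reference, the two views above are bwm-ng with bits vs. packets as the
unit - roughly like this, hedging on the exact option spelling in v0.6.1:)

   bwm-ng -u bits -t 500      # bandwidth view (Gb/s)
   bwm-ng -u packets -t 500   # packet-rate view (P/s)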
   PerfTop:   56488 irqs/sec  kernel:99.4%  exact: 0.0%  [4000Hz cycles],  (all, 56 CPUs)
------------------------------------------------------------------------------
10.41% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
7.73% [kernel] [k] mlx5e_sq_xmit
6.05% [kernel] [k] build_skb
5.63% [kernel] [k] fib_table_lookup
2.75% [kernel] [k] mlx5e_poll_rx_cq
2.74% [kernel] [k] memcpy_erms
2.33% [kernel] [k] vlan_do_receive
2.00% [kernel] [k] __dev_queue_xmit
1.81% [kernel] [k] ip_finish_output2
1.79% [kernel] [k] dev_gro_receive
1.78% [kernel] [k] ipt_do_table
1.78% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
1.76% [kernel] [k] pfifo_fast_dequeue
1.70% [kernel] [k] mlx5e_post_rx_mpwqes
1.52% [kernel] [k] mlx5e_poll_tx_cq
1.49% [kernel] [k] irq_entries_start
1.47% [kernel] [k] _raw_spin_lock
1.45% [kernel] [k] inet_gro_receive
1.42% [kernel] [k] __netif_receive_skb_core
1.39% [kernel] [k] mlx5_eq_int
1.39% [kernel] [k] tcp_gro_receive
1.23% [kernel] [k] __build_skb
1.14% [kernel] [k] ip_route_input_rcu
1.00% [kernel] [k] vlan_dev_hard_start_xmit
0.92% [kernel] [k] _raw_spin_lock_irqsave
0.89% [kernel] [k] kmem_cache_alloc
0.88% [kernel] [k] dev_hard_start_xmit
0.88% [kernel] [k] swiotlb_map_page
0.86% [kernel] [k] mlx5e_xmit
0.81% [kernel] [k] ip_forward
0.80% [kernel] [k] tasklet_action_common.isra.21
0.79% [kernel] [k] netif_skb_features
0.77% [kernel] [k] pfifo_fast_enqueue
0.66% [kernel] [k] validate_xmit_skb.isra.142
0.64% [kernel] [k] ip_rcv_core.isra.20.constprop.25
0.63% [kernel] [k] find_busiest_group
0.60% [kernel] [k] __qdisc_run
0.59% [kernel] [k] skb_release_data
0.59% [kernel] [k] skb_gro_receive
0.58% [kernel] [k] page_frag_free
0.53% [kernel] [k] skb_segment
0.52% [kernel] [k] try_to_wake_up
0.52% [kernel] [k] _raw_spin_lock_irq
0.50% [kernel] [k] tcp4_gro_receive
0.47% [kernel] [k] kmem_cache_free_bulk
0.45% [kernel] [k] mlx5e_page_release
0.43% [kernel] [k] _raw_spin_trylock
0.39% [kernel] [k] kmem_cache_free
0.38% [kernel] [k] __sched_text_start
0.38% [kernel] [k] sch_direct_xmit
0.38% [kernel] [k] do_idle
0.34% [kernel] [k] vlan_passthru_hard_header
0.34% [kernel] [k] cmd_exec
0.34% [kernel] [k] __local_bh_enable_ip
0.33% [kernel] [k] inet_lookup_ifaddr_rcu
0.33% [kernel] [k] skb_network_protocol
0.33% [kernel] [k] netdev_pick_tx
0.33% [kernel] [k] eth_type_trans
0.32% [kernel] [k] __get_xps_queue_idx
0.31% [kernel] [k] __slab_free.isra.79
0.29% [kernel] [k] mlx5e_xdp_handle
0.27% [kernel] [k] sched_ttwu_pending
0.26% [kernel] [k] cmd_work_handler
0.24% [kernel] [k] ip_finish_output
0.23% [kernel] [k] neigh_connected_output
0.23% [kernel] [k] napi_gro_receive
0.23% [kernel] [k] mlx5e_napi_poll
0.23% [kernel] [k] mlx5e_features_check
0.22% [kernel] [k] ip_output
0.21% [kernel] [k] ip_rcv_finish_core.isra.17
0.21% [kernel] [k] fib_validate_source
0.20% [kernel] [k] dev_ifconf
0.20% [kernel] [k] eth_header
0.20% [kernel] [k] __netdev_pick_tx
0.20% [kernel] [k] mlx5_cmd_comp_handler
0.19% [kernel] [k] memset_erms
0.18% [kernel] [k] __netif_receive_skb_one_core
0.18% [kernel] [k] __memcpy
0.18% [kernel] [k] queued_spin_lock_slowpath
0.18% [kernel] [k] nf_hook_slow
0.17% [kernel] [k] enqueue_task_fair
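
(The PerfTop listings here are plain "perf top" sampling cycles at 4000 Hz
across all 56 CPUs; something like the following should reproduce them -
exact options are my assumption:)

   perf top -F 4000 -e cycles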
I also modified the coalescing settings a little for the ConnectX-5 compared
to the ConnectX-4:
ethtool -c enp175s0
Coalesce parameters for enp175s0:
Adaptive RX: off TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
dmac: 32588
rx-usecs: 128
rx-frames: 128
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 8
tx-frames: 128
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
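
For the record, the non-default values above can be set with something like
this (a sketch - adaptive TX is left enabled):

   ethtool -C enp175s0 adaptive-rx off rx-usecs 128 rx-frames 128 \
           tx-usecs 8 tx-frames 128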
So far the CPU load looks better than in the previous configuration with the
2-port 100G ConnectX-4:
Average:  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
Average:  all   0.05   0.00   0.64    0.01   0.00   8.79   0.00   0.00   0.00  90.51
Average:    0   0.00   0.00   0.10    0.00   0.00   0.00   0.00   0.00   0.00  99.90
Average:    1   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    4   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    5   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    6   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    7   0.10   0.00   1.30    0.00   0.00   0.00   0.00   0.00   0.00  98.60
Average:    8   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:    9   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   10   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   11   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   12   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   13   0.00   0.00   1.00    0.00   0.00   0.00   0.00   0.00   0.00  99.00
Average:   14   0.10   0.00   0.80    0.00   0.00  22.80   0.00   0.00   0.00  76.30
Average:   15   0.10   0.00   0.70    0.00   0.00  21.20   0.00   0.00   0.00  78.00
Average:   16   0.00   0.00   0.80    0.00   0.00  17.70   0.00   0.00   0.00  81.50
Average:   17   0.00   0.00   0.50    0.00   0.00  15.10   0.00   0.00   0.00  84.40
Average:   18   0.00   0.00   0.70    0.00   0.00  20.90   0.00   0.00   0.00  78.40
Average:   19   0.10   0.00   0.70    0.00   0.00  20.50   0.00   0.00   0.00  78.70
Average:   20   0.50   0.00   1.70    0.00   0.00  18.80   0.00   0.00   0.00  79.00
Average:   21   0.10   0.00   1.30    0.00   0.00  20.90   0.00   0.00   0.00  77.70
Average:   22   0.00   0.00   0.70    0.00   0.00  19.40   0.00   0.00   0.00  79.90
Average:   23   0.00   0.00   0.90    0.00   0.00  18.50   0.00   0.00   0.00  80.60
Average:   24   0.10   0.00   1.00    0.00   0.00  15.80   0.00   0.00   0.00  83.10
Average:   25   0.00   0.00   0.70    0.00   0.00  19.50   0.00   0.00   0.00  79.80
Average:   26   0.00   0.00   0.50    0.00   0.00  18.30   0.00   0.00   0.00  81.20
Average:   27   0.00   0.00   0.70    0.00   0.00  17.60   0.00   0.00   0.00  81.70
Average:   28   0.00   0.00   0.70    0.00   0.00   0.00   0.00   0.00   0.00  99.30
Average:   29   0.00   0.00   2.00    0.00   0.00   0.00   0.00   0.00   0.00  98.00
Average:   30   0.00   0.00   0.10    0.00   0.00   0.00   0.00   0.00   0.00  99.90
Average:   31   0.00   0.00   2.50    0.00   0.00   0.00   0.00   0.00   0.00  97.50
Average:   32   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   33   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   34   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   35   0.00   0.00   0.70    0.00   0.00   0.00   0.00   0.00   0.00  99.30
Average:   36   0.00   0.00   2.00    0.00   0.00   0.00   0.00   0.00   0.00  98.00
Average:   37   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   38   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   39   0.00   0.00   1.40    0.00   0.00   0.00   0.00   0.00   0.00  98.60
Average:   40   0.60   0.00   0.40    0.00   0.00   0.00   0.00   0.00   0.00  99.00
Average:   41   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
Average:   42   0.00   0.00   1.20    0.00   0.00  17.70   0.00   0.00   0.00  81.10
Average:   43   0.00   0.00   0.70    0.00   0.00  20.00   0.00   0.00   0.00  79.30
Average:   44   0.00   0.00   0.50    0.00   0.00  16.10   0.00   0.00   0.00  83.40
Average:   45   0.30   0.00   1.10    0.00   0.00  16.10   0.00   0.00   0.00  82.50
Average:   46   0.00   0.00   0.80    0.00   0.00  14.80   0.00   0.00   0.00  84.40
Average:   47   0.10   0.00   1.60    0.00   0.00  17.20   0.00   0.00   0.00  81.10
Average:   48   0.00   0.00   0.60    0.00   0.00  15.00   0.00   0.00   0.00  84.40
Average:   49   0.10   0.00   0.80    0.00   0.00  14.90   0.00   0.00   0.00  84.20
Average:   50   0.20   0.00   0.50    0.70   0.00  13.60   0.00   0.00   0.00  85.00
Average:   51   0.00   0.00   0.70    0.00   0.00  14.10   0.00   0.00   0.00  85.20
Average:   52   0.20   0.00   1.60    0.00   0.00  16.80   0.00   0.00   0.00  81.40
Average:   53   0.00   0.00   0.80    0.00   0.00  13.20   0.00   0.00   0.00  86.00
Average:   54   0.20   0.00   0.50    0.00   0.00  17.20   0.00   0.00   0.00  82.10
Average:   55   0.00   0.00   0.40    0.00   0.00  18.30   0.00   0.00   0.00  81.30
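
(The CPU load above is the "Average:" section printed by sysstat's mpstat,
e.g. something like:)

   mpstat -P ALL 1 10   # per-CPU stats, 10 one-second samples plus averages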
>
>
>
>>
>>
>>> going on there - and hard to catch - because perf top doesn't change,
>>> besides there is no queued slowpath hit now
>>>
>>> I have now also ordered Intel cards to compare - but 3 weeks ETA
>>> Faster - in 3 days - I will have Mellanox ConnectX-5 - so I can
>>> separate traffic onto two different x16 PCIe buses
>> I do think you need to separate the traffic onto two different x16 PCIe
>> slots. I have found that the ConnectX-5 has significantly faster
>> packet-per-sec performance than the ConnectX-4, but that is not your
>> use-case (max BW). I've not tested these NICs for maximum
>> _bidirectional_ bandwidth limits, I've only made sure I can do 100G
>> unidirectional, which can hit some funny motherboard memory limits
>> (remember to equip the motherboard with 4 RAM modules for full memory BW).
>>
> Yes, the memory channels are separated and there are 4 modules per CPU :)
>
>
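On the memory-bandwidth point: a rough sketch, assuming DDR4-2666 and all 4
channels per socket populated (the actual DIMM speed here is my assumption):

   # per-channel ~21.3 GB/s (2666 MT/s * 8 bytes); 4 channels, in GB/s:
   echo "scale=1; 2666 * 8 * 4 / 1000" | bc
   85.3

That ~85 GB/s per socket leaves plenty of headroom over the ~32 GB/s that
128 Gbit/s of forwarded traffic needs when each packet is DMA-written on RX
and DMA-read on TX.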