Message-ID: <d6b8a94b-abb6-d976-2ed4-887e3049b979@itcare.pl>
Date:   Thu, 8 Nov 2018 15:43:35 +0100
From:   Paweł Staszewski <pstaszewski@...are.pl>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     David Ahern <dsahern@...il.com>, netdev <netdev@...r.kernel.org>,
        Yoel Caspersen <yoel@...knet.dk>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users
 traffic



On 08.11.2018 at 01:59, Paweł Staszewski wrote:
>
>
> On 05.11.2018 at 21:17, Jesper Dangaard Brouer wrote:
>> On Sun, 4 Nov 2018 01:24:03 +0100 Paweł Staszewski 
>> <pstaszewski@...are.pl> wrote:
>>
>>> And today, after applying the patch for the page allocator, I again
>>> reached 64/64 Gbit/s
>>>
>>> with only 50-60% cpu load
>> Great.
>>
>>> Today no slowpath hit for networking :)
>>>
>>> But again dropped packets at 64 Gbit RX and 64 Gbit TX ....
>>> And as it should not be a PCI Express limit - I think something more is
>> Well, this does sound like a PCIe bandwidth limit to me.
>>
>> See the PCIe BW here: https://en.wikipedia.org/wiki/PCI_Express
>>
>> You likely have PCIe v3, where one lane has 984.6 MBytes/s or 7.87 Gbit/s.
>> Thus, x16 lanes have 15.75 GBytes/s or 126 Gbit/s.  It does say "in each
>> direction", but you are also forwarding this RX->TX on both ports of a
>> dual-port NIC that shares the same PCIe slot.
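
The PCIe v3 figures quoted above can be reproduced with a little arithmetic. This is a minimal sketch, assuming the standard PCIe 3.0 parameters of 8 GT/s per lane and 128b/130b encoding. The function names are only illustrative, and the result is the raw link rate in one direction, before any TLP or descriptor overhead:

```python
# PCIe 3.0 signalling rate per lane and its 128b/130b line encoding.
GT_PER_S = 8.0            # giga-transfers per second, per lane
ENCODING = 128 / 130      # payload bits carried per transferred bit


def pcie3_gbit_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in Gbit/s, before protocol overhead."""
    return lanes * GT_PER_S * ENCODING


def pcie3_mbyte_per_s(lanes: int) -> float:
    """The same figure expressed in MBytes/s (decimal units)."""
    return pcie3_gbit_per_s(lanes) * 1000 / 8


if __name__ == "__main__":
    for lanes in (1, 16):
        print(f"x{lanes}: {pcie3_gbit_per_s(lanes):.2f} Gbit/s, "
              f"{pcie3_mbyte_per_s(lanes):.1f} MBytes/s per direction")
```

Usable throughput is lower than this raw number, since packet headers and DMA descriptors travel over the same link.
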
> Network controller changed from a 2-port 100G ConnectX-4 to two separate
> 100G ConnectX-5 cards
>
>
>    PerfTop:   92239 irqs/sec  kernel:99.4%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
> ------------------------------------------------------------------------------
>
>
>      6.65%  [kernel]       [k] irq_entries_start
>      5.57%  [kernel]       [k] tasklet_action_common.isra.21
>      4.60%  [kernel]       [k] mlx5_eq_int
>      4.04%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
>      3.66%  [kernel]       [k] _raw_spin_lock_irqsave
>      3.58%  [kernel]       [k] mlx5e_sq_xmit
>      2.66%  [kernel]       [k] fib_table_lookup
>      2.52%  [kernel]       [k] _raw_spin_lock
>      2.51%  [kernel]       [k] build_skb
>      2.50%  [kernel]       [k] _raw_spin_lock_irq
>      2.04%  [kernel]       [k] try_to_wake_up
>      1.83%  [kernel]       [k] queued_spin_lock_slowpath
>      1.81%  [kernel]       [k] mlx5e_poll_tx_cq
>      1.65%  [kernel]       [k] do_idle
>      1.50%  [kernel]       [k] mlx5e_poll_rx_cq
>      1.34%  [kernel]       [k] __sched_text_start
>      1.32%  [kernel]       [k] cmd_exec
>      1.30%  [kernel]       [k] cmd_work_handler
>      1.16%  [kernel]       [k] vlan_do_receive
>      1.15%  [kernel]       [k] memcpy_erms
>      1.15%  [kernel]       [k] __dev_queue_xmit
>      1.07%  [kernel]       [k] mlx5_cmd_comp_handler
>      1.06%  [kernel]       [k] sched_ttwu_pending
>      1.00%  [kernel]       [k] ipt_do_table
>      0.98%  [kernel]       [k] ip_finish_output2
>      0.92%  [kernel]       [k] pfifo_fast_dequeue
>      0.88%  [kernel]       [k] mlx5e_handle_rx_cqe_mpwrq
>      0.78%  [kernel]       [k] dev_gro_receive
>      0.78%  [kernel]       [k] mlx5e_napi_poll
>      0.76%  [kernel]       [k] mlx5e_post_rx_mpwqes
>      0.70%  [kernel]       [k] process_one_work
>      0.67%  [kernel]       [k] __netif_receive_skb_core
>      0.65%  [kernel]       [k] __build_skb
>      0.63%  [kernel]       [k] llist_add_batch
>      0.62%  [kernel]       [k] tcp_gro_receive
>      0.60%  [kernel]       [k] inet_gro_receive
>      0.59%  [kernel]       [k] ip_route_input_rcu
>      0.59%  [kernel]       [k] rcu_irq_exit
>      0.56%  [kernel]       [k] napi_complete_done
>      0.52%  [kernel]       [k] kmem_cache_alloc
>      0.48%  [kernel]       [k] __softirqentry_text_start
>      0.48%  [kernel]       [k] mlx5e_xmit
>      0.47%  [kernel]       [k] __queue_work
>      0.46%  [kernel]       [k] memset_erms
>      0.46%  [kernel]       [k] dev_hard_start_xmit
>      0.45%  [kernel]       [k] insert_work
>      0.45%  [kernel]       [k] enqueue_task_fair
>      0.44%  [kernel]       [k] __wake_up_common
>      0.43%  [kernel]       [k] finish_task_switch
>      0.43%  [kernel]       [k] kmem_cache_free_bulk
>      0.42%  [kernel]       [k] ip_forward
>      0.42%  [kernel]       [k] worker_thread
>      0.41%  [kernel]       [k] schedule
>      0.41%  [kernel]       [k] _raw_spin_unlock_irqrestore
>      0.40%  [kernel]       [k] netif_skb_features
>      0.40%  [kernel]       [k] queue_work_on
>      0.40%  [kernel]       [k] pfifo_fast_enqueue
>      0.39%  [kernel]       [k] vlan_dev_hard_start_xmit
>      0.39%  [kernel]       [k] page_frag_free
>      0.36%  [kernel]       [k] swiotlb_map_page
>      0.36%  [kernel]       [k] update_cfs_rq_h_load
>      0.35%  [kernel]       [k] validate_xmit_skb.isra.142
>      0.35%  [kernel]       [k] dev_ifconf
>      0.35%  [kernel]       [k] check_preempt_curr
>      0.34%  [kernel]       [k] _raw_spin_trylock
>      0.34%  [kernel]       [k] rcu_idle_exit
>      0.33%  [kernel]       [k] ip_rcv_core.isra.20.constprop.25
>      0.33%  [kernel]       [k] __qdisc_run
>      0.33%  [kernel]       [k] skb_release_data
>      0.32%  [kernel]       [k] native_sched_clock
>      0.30%  [kernel]       [k] add_interrupt_randomness
>      0.29%  [kernel]       [k] interrupt_entry
>      0.28%  [kernel]       [k] skb_gro_receive
>      0.26%  [kernel]       [k] read_tsc
>      0.26%  [kernel]       [k] __get_xps_queue_idx
>      0.26%  [kernel]       [k] inet_gifconf
>      0.26%  [kernel]       [k] skb_segment
>      0.25%  [kernel]       [k] __tasklet_schedule_common
>      0.25%  [kernel]       [k] smpboot_thread_fn
>      0.23%  [kernel]       [k] __update_load_avg_se
>      0.22%  [kernel]       [k] tcp4_gro_receive
>
>
> Not much traffic now:
>   bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
>   input: /proc/net/dev type: rate
>   |         iface                   Rx                   Tx                Total
> ==============================================================================
>          enp175s0:           6.95 Gb/s            4.20 Gb/s           11.15 Gb/s
>          enp216s0:           4.23 Gb/s            6.98 Gb/s           11.21 Gb/s
> ------------------------------------------------------------------------------
>             total:          11.18 Gb/s           11.18 Gb/s           22.37 Gb/s
>
>   bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
>   input: /proc/net/dev type: rate
>   |         iface                   Rx                   Tx                Total
> ==============================================================================
>          enp175s0:       700264.50 P/s        923890.25 P/s       1624154.75 P/s
>          enp216s0:       932598.81 P/s        708771.50 P/s       1641370.25 P/s
> ------------------------------------------------------------------------------
>             total:      1632863.38 P/s       1632661.75 P/s       3265525.00 P/s
>
>
>
Updated perf top with more traffic: 37 Gbit/s RX / 37 Gbit/s TX in total
  bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
   input: /proc/net/dev type: rate
   /         iface                   Rx                   Tx                Total
==============================================================================
          enp175s0:          28.91 Gb/s            8.89 Gb/s           37.80 Gb/s
          enp216s0:           8.91 Gb/s           28.95 Gb/s           37.86 Gb/s
------------------------------------------------------------------------------
             total:          37.82 Gb/s           37.84 Gb/s           75.67 Gb/s

   bwm-ng v0.6.1 (probing every 0.500s), press 'h' for help
   input: /proc/net/dev type: rate
   -         iface                   Rx                   Tx                Total
==============================================================================
          enp175s0:      2721518.75 P/s       2460930.50 P/s       5182449.50 P/s
          enp216s0:      2471451.25 P/s       2731946.25 P/s       5203397.50 P/s
------------------------------------------------------------------------------
             total:      5192970.00 P/s       5192876.50 P/s      10385847.00 P/s
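
A rough average packet size can be derived from the bit and packet rates above. This is a minimal sketch, assuming bwm-ng reports decimal units (1 Gb = 10^9 bits). The bit rate and packet rate come from separate samples, so the result is only approximate. The function name avg_packet_bytes is illustrative:

```python
def avg_packet_bytes(gbit_per_s: float, packets_per_s: float) -> float:
    """Mean packet size in bytes, assuming 1 Gb = 10**9 bits."""
    return gbit_per_s * 1e9 / 8 / packets_per_s


# Total Rx figures copied from the two pairs of bwm-ng samples in this message.
samples = {
    "earlier, lower load": (11.18, 1632863.38),
    "updated, higher load": (37.82, 5192970.00),
}

for label, (gbps, pps) in samples.items():
    print(f"{label}: ~{avg_packet_bytes(gbps, pps):.0f} bytes/packet")
```

Both samples come out at several hundred bytes per packet, which fits a mix of normal user traffic rather than a small-packet stress test.
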





    PerfTop:   56488 irqs/sec  kernel:99.4%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
------------------------------------------------------------------------------

     10.41%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
      7.73%  [kernel]       [k] mlx5e_sq_xmit
      6.05%  [kernel]       [k] build_skb
      5.63%  [kernel]       [k] fib_table_lookup
      2.75%  [kernel]       [k] mlx5e_poll_rx_cq
      2.74%  [kernel]       [k] memcpy_erms
      2.33%  [kernel]       [k] vlan_do_receive
      2.00%  [kernel]       [k] __dev_queue_xmit
      1.81%  [kernel]       [k] ip_finish_output2
      1.79%  [kernel]       [k] dev_gro_receive
      1.78%  [kernel]       [k] ipt_do_table
      1.78%  [kernel]       [k] mlx5e_handle_rx_cqe_mpwrq
      1.76%  [kernel]       [k] pfifo_fast_dequeue
      1.70%  [kernel]       [k] mlx5e_post_rx_mpwqes
      1.52%  [kernel]       [k] mlx5e_poll_tx_cq
      1.49%  [kernel]       [k] irq_entries_start
      1.47%  [kernel]       [k] _raw_spin_lock
      1.45%  [kernel]       [k] inet_gro_receive
      1.42%  [kernel]       [k] __netif_receive_skb_core
      1.39%  [kernel]       [k] mlx5_eq_int
      1.39%  [kernel]       [k] tcp_gro_receive
      1.23%  [kernel]       [k] __build_skb
      1.14%  [kernel]       [k] ip_route_input_rcu
      1.00%  [kernel]       [k] vlan_dev_hard_start_xmit
      0.92%  [kernel]       [k] _raw_spin_lock_irqsave
      0.89%  [kernel]       [k] kmem_cache_alloc
      0.88%  [kernel]       [k] dev_hard_start_xmit
      0.88%  [kernel]       [k] swiotlb_map_page
      0.86%  [kernel]       [k] mlx5e_xmit
      0.81%  [kernel]       [k] ip_forward
      0.80%  [kernel]       [k] tasklet_action_common.isra.21
      0.79%  [kernel]       [k] netif_skb_features
      0.77%  [kernel]       [k] pfifo_fast_enqueue
      0.66%  [kernel]       [k] validate_xmit_skb.isra.142
      0.64%  [kernel]       [k] ip_rcv_core.isra.20.constprop.25
      0.63%  [kernel]       [k] find_busiest_group
      0.60%  [kernel]       [k] __qdisc_run
      0.59%  [kernel]       [k] skb_release_data
      0.59%  [kernel]       [k] skb_gro_receive
      0.58%  [kernel]       [k] page_frag_free
      0.53%  [kernel]       [k] skb_segment
      0.52%  [kernel]       [k] try_to_wake_up
      0.52%  [kernel]       [k] _raw_spin_lock_irq
      0.50%  [kernel]       [k] tcp4_gro_receive
      0.47%  [kernel]       [k] kmem_cache_free_bulk
      0.45%  [kernel]       [k] mlx5e_page_release
      0.43%  [kernel]       [k] _raw_spin_trylock
      0.39%  [kernel]       [k] kmem_cache_free
      0.38%  [kernel]       [k] __sched_text_start
      0.38%  [kernel]       [k] sch_direct_xmit
      0.38%  [kernel]       [k] do_idle
      0.34%  [kernel]       [k] vlan_passthru_hard_header
      0.34%  [kernel]       [k] cmd_exec
      0.34%  [kernel]       [k] __local_bh_enable_ip
      0.33%  [kernel]       [k] inet_lookup_ifaddr_rcu
      0.33%  [kernel]       [k] skb_network_protocol
      0.33%  [kernel]       [k] netdev_pick_tx
      0.33%  [kernel]       [k] eth_type_trans
      0.32%  [kernel]       [k] __get_xps_queue_idx
      0.31%  [kernel]       [k] __slab_free.isra.79
      0.29%  [kernel]       [k] mlx5e_xdp_handle
      0.27%  [kernel]       [k] sched_ttwu_pending
      0.26%  [kernel]       [k] cmd_work_handler
      0.24%  [kernel]       [k] ip_finish_output
      0.23%  [kernel]       [k] neigh_connected_output
      0.23%  [kernel]       [k] napi_gro_receive
      0.23%  [kernel]       [k] mlx5e_napi_poll
      0.23%  [kernel]       [k] mlx5e_features_check
      0.22%  [kernel]       [k] ip_output
      0.21%  [kernel]       [k] ip_rcv_finish_core.isra.17
      0.21%  [kernel]       [k] fib_validate_source
      0.20%  [kernel]       [k] dev_ifconf
      0.20%  [kernel]       [k] eth_header
      0.20%  [kernel]       [k] __netdev_pick_tx
      0.20%  [kernel]       [k] mlx5_cmd_comp_handler
      0.19%  [kernel]       [k] memset_erms
      0.18%  [kernel]       [k] __netif_receive_skb_one_core
      0.18%  [kernel]       [k] __memcpy
      0.18%  [kernel]       [k] queued_spin_lock_slowpath
      0.18%  [kernel]       [k] nf_hook_slow
      0.17%  [kernel]       [k] enqueue_task_fair


I also slightly modified the coalescing settings for the ConnectX-5 compared to the ConnectX-4:

  ethtool -c enp175s0
Coalesce parameters for enp175s0:
Adaptive RX: off  TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
dmac: 32588

rx-usecs: 128
rx-frames: 128
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 8
tx-frames: 128
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
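
Values like the non-zero ones listed above can be set with "ethtool -C". The sketch below only builds such a command line and does not run it. It is an illustration, not necessarily the exact command used on this box. The helper coalesce_command and the SETTINGS dict are hypothetical names, while the option keywords are the standard ethtool coalescing parameters:

```python
def coalesce_command(dev: str, settings: dict) -> list:
    """Build an 'ethtool -C' argument list from a dict of coalescing options."""
    cmd = ["ethtool", "-C", dev]
    for key, value in settings.items():
        cmd += [key, str(value)]
    return cmd


# Non-zero values taken from the 'ethtool -c enp175s0' output above.
SETTINGS = {
    "adaptive-rx": "off",
    "adaptive-tx": "on",
    "rx-usecs": 128,
    "rx-frames": 128,
    "tx-usecs": 8,
    "tx-frames": 128,
}

if __name__ == "__main__":
    # Print the command instead of executing it; running it needs root.
    print(" ".join(coalesce_command("enp175s0", SETTINGS)))
```

Whether a driver accepts every combination (for example fixed tx-usecs together with adaptive TX) depends on the driver, so the result should be checked with "ethtool -c" afterwards.
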


So far the CPU load looks better than in the previous configuration with the
2-port 100G ConnectX-4:

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    0.05    0.00    0.64    0.01    0.00    8.79    0.00    0.00    0.00   90.51
Average:       0    0.00    0.00    0.10    0.00    0.00    0.00    0.00    0.00    0.00   99.90
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       7    0.10    0.00    1.30    0.00    0.00    0.00    0.00    0.00    0.00   98.60
Average:       8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      13    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
Average:      14    0.10    0.00    0.80    0.00    0.00   22.80    0.00    0.00    0.00   76.30
Average:      15    0.10    0.00    0.70    0.00    0.00   21.20    0.00    0.00    0.00   78.00
Average:      16    0.00    0.00    0.80    0.00    0.00   17.70    0.00    0.00    0.00   81.50
Average:      17    0.00    0.00    0.50    0.00    0.00   15.10    0.00    0.00    0.00   84.40
Average:      18    0.00    0.00    0.70    0.00    0.00   20.90    0.00    0.00    0.00   78.40
Average:      19    0.10    0.00    0.70    0.00    0.00   20.50    0.00    0.00    0.00   78.70
Average:      20    0.50    0.00    1.70    0.00    0.00   18.80    0.00    0.00    0.00   79.00
Average:      21    0.10    0.00    1.30    0.00    0.00   20.90    0.00    0.00    0.00   77.70
Average:      22    0.00    0.00    0.70    0.00    0.00   19.40    0.00    0.00    0.00   79.90
Average:      23    0.00    0.00    0.90    0.00    0.00   18.50    0.00    0.00    0.00   80.60
Average:      24    0.10    0.00    1.00    0.00    0.00   15.80    0.00    0.00    0.00   83.10
Average:      25    0.00    0.00    0.70    0.00    0.00   19.50    0.00    0.00    0.00   79.80
Average:      26    0.00    0.00    0.50    0.00    0.00   18.30    0.00    0.00    0.00   81.20
Average:      27    0.00    0.00    0.70    0.00    0.00   17.60    0.00    0.00    0.00   81.70
Average:      28    0.00    0.00    0.70    0.00    0.00    0.00    0.00    0.00    0.00   99.30
Average:      29    0.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
Average:      30    0.00    0.00    0.10    0.00    0.00    0.00    0.00    0.00    0.00   99.90
Average:      31    0.00    0.00    2.50    0.00    0.00    0.00    0.00    0.00    0.00   97.50
Average:      32    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      33    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      34    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      35    0.00    0.00    0.70    0.00    0.00    0.00    0.00    0.00    0.00   99.30
Average:      36    0.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
Average:      37    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      38    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      39    0.00    0.00    1.40    0.00    0.00    0.00    0.00    0.00    0.00   98.60
Average:      40    0.60    0.00    0.40    0.00    0.00    0.00    0.00    0.00    0.00   99.00
Average:      41    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      42    0.00    0.00    1.20    0.00    0.00   17.70    0.00    0.00    0.00   81.10
Average:      43    0.00    0.00    0.70    0.00    0.00   20.00    0.00    0.00    0.00   79.30
Average:      44    0.00    0.00    0.50    0.00    0.00   16.10    0.00    0.00    0.00   83.40
Average:      45    0.30    0.00    1.10    0.00    0.00   16.10    0.00    0.00    0.00   82.50
Average:      46    0.00    0.00    0.80    0.00    0.00   14.80    0.00    0.00    0.00   84.40
Average:      47    0.10    0.00    1.60    0.00    0.00   17.20    0.00    0.00    0.00   81.10
Average:      48    0.00    0.00    0.60    0.00    0.00   15.00    0.00    0.00    0.00   84.40
Average:      49    0.10    0.00    0.80    0.00    0.00   14.90    0.00    0.00    0.00   84.20
Average:      50    0.20    0.00    0.50    0.70    0.00   13.60    0.00    0.00    0.00   85.00
Average:      51    0.00    0.00    0.70    0.00    0.00   14.10    0.00    0.00    0.00   85.20
Average:      52    0.20    0.00    1.60    0.00    0.00   16.80    0.00    0.00    0.00   81.40
Average:      53    0.00    0.00    0.80    0.00    0.00   13.20    0.00    0.00    0.00   86.00
Average:      54    0.20    0.00    0.50    0.00    0.00   17.20    0.00    0.00    0.00   82.10
Average:      55    0.00    0.00    0.40    0.00    0.00   18.30    0.00    0.00    0.00   81.30
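
As a cross-check of the per-CPU numbers above, the softirq load can be summarised in a few lines. This is a minimal sketch with the %soft values copied by hand from the table. The variable names are illustrative:

```python
# %soft per CPU, copied from the mpstat averages above.
# CPUs that are not listed here reported 0.00 %soft.
SOFT = {
    14: 22.80, 15: 21.20, 16: 17.70, 17: 15.10, 18: 20.90, 19: 20.50,
    20: 18.80, 21: 20.90, 22: 19.40, 23: 18.50, 24: 15.80, 25: 19.50,
    26: 18.30, 27: 17.60,
    42: 17.70, 43: 20.00, 44: 16.10, 45: 16.10, 46: 14.80, 47: 17.20,
    48: 15.00, 49: 14.90, 50: 13.60, 51: 14.10, 52: 16.80, 53: 13.20,
    54: 17.20, 55: 18.30,
}
TOTAL_CPUS = 56

busy_cpus = sorted(SOFT)
mean_busy = sum(SOFT.values()) / len(SOFT)   # average over CPUs doing softirq
mean_all = sum(SOFT.values()) / TOTAL_CPUS   # should match the "all" row

print(f"CPUs with softirq load: {len(busy_cpus)} of {TOTAL_CPUS}")
print(f"mean %soft on those CPUs: {mean_busy:.2f}")
print(f"mean %soft over all CPUs: {mean_all:.2f}")
```

Only CPUs 14-27 and 42-55 carry softirq load, and averaging their values over all 56 CPUs agrees with the 8.79 reported in the "all" row.
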

>
>
>
>>
>>
>>> going on there - and it is hard to catch, because perf top hasn't changed,
>>> besides there being no queued slowpath hit now
>>>
>>> I have now also ordered Intel cards to compare - but the ETA is 3 weeks.
>>> Sooner, in 3 days, I will have Mellanox ConnectX-5 cards, so I can
>>> separate traffic onto two different x16 PCIe buses
>> I do think you need to separate traffic to two different x16 PCIe
>> slots.  I have found that the ConnectX-5 has significantly better
>> packet-per-sec performance than the ConnectX-4, but that is not your
>> use-case (max BW). I've not tested these NICs for maximum
>> _bidirectional_ bandwidth limits, I've only made sure I can do 100G
>> unidirectional, which can hit some funny motherboard memory limits
>> (remember to equip motherboard with 4 RAM blocks for full memory BW).
>>
> Yes, memory channels are separated and there are 4 modules per CPU :)
>
>
