Message-ID: <fa165c95-6a1f-c50e-cfa5-30fda02ca9d6@itcare.pl>
Date: Mon, 12 Nov 2018 20:19:01 +0100
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Saeed Mahameed <saeedm@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users
traffic
On 11.11.2018 at 09:56, Jesper Dangaard Brouer wrote:
> On Sat, 10 Nov 2018 22:53:53 +0100 Paweł Staszewski <pstaszewski@...are.pl> wrote:
>
>> Now I'm messing with ring configuration for ConnectX-5 NICs.
>> And after reading that paper:
>> https://netdevconf.org/2.1/slides/apr6/network-performance/04-amir-RX_and_TX_bulking_v2.pdf
>>
> Do notice that some of the ideas in that slide deck were never
> implemented. But they are still on my todo list ;-).
>
> Notice how it shows that TX bulking is very important, but based on
> your ethtool_stats.pl output, I can see that not much TX bulking is
> happening in your case. This is indicated via the xmit_more counters.
>
> Ethtool(enp175s0) stat: 2630 ( 2,630) <= tx_xmit_more /sec
> Ethtool(enp175s0) stat: 4956995 ( 4,956,995) <= tx_packets /sec
>
> And the per-queue levels are also available:
>
> Ethtool(enp175s0) stat: 184845 ( 184,845) <= tx7_packets /sec
> Ethtool(enp175s0) stat: 78 ( 78) <= tx7_xmit_more /sec
>
> This means that you are issuing too many doorbells to the NIC hardware
> at TX time, which I worry could be what causes the NIC and PCIe hardware
> not to operate at optimal speeds.
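
(For context, the xmit_more mechanism referenced above is the driver-side
doorbell-batching idiom sketched below. This is only a minimal sketch of
how a 4.19-era ndo_start_xmit consults skb->xmit_more; the my_* ring
helpers are hypothetical placeholders, not the actual mlx5 code.)

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_tx_t my_ndo_start_xmit(struct sk_buff *skb,
                                     struct net_device *dev)
{
        struct my_tx_ring *ring = my_select_ring(dev, skb);   /* hypothetical */
        struct netdev_queue *txq =
                netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        my_post_tx_descriptor(ring, skb);                     /* hypothetical */

        /* Ring the doorbell (an MMIO write across PCIe) only when the
         * stack says nothing is queued behind this skb, or the queue is
         * stopped.  Every skb that skips the doorbell here is what shows
         * up in the tx_xmit_more counter. */
        if (!skb->xmit_more || netif_xmit_stopped(txq))
                my_ring_doorbell(ring);                       /* hypothetical */

        return NETDEV_TX_OK;
}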
After tuning coalescing/ring parameters a little with ethtool (example
commands below), I reached today:
bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
input: /proc/net/dev type: rate
  iface                      Rx            Tx         Total
==============================================================================
  enp175s0:           50.68 Gb/s    21.53 Gb/s    72.20 Gb/s
  enp216s0:           21.62 Gb/s    50.81 Gb/s    72.42 Gb/s
------------------------------------------------------------------------------
  total:              72.30 Gb/s    72.33 Gb/s   144.63 Gb/s
And still no packet loss (ICMP side-to-side test every 100 ms).
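
For reference, the ring half of that tuning is done with ethtool -G. The
values below are placeholders for illustration, not the ones actually used:

  ethtool -G enp175s0 rx 4096 tx 4096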
Below is the perf top output:
   PerfTop: 104692 irqs/sec  kernel:99.5%  exact: 0.0%  [4000Hz cycles],  (all, 56 CPUs)
-------------------------------------------------------------------------------
9.06% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
6.43% [kernel] [k] tasklet_action_common.isra.21
5.68% [kernel] [k] fib_table_lookup
4.89% [kernel] [k] irq_entries_start
4.53% [kernel] [k] mlx5_eq_int
4.10% [kernel] [k] build_skb
3.39% [kernel] [k] mlx5e_poll_tx_cq
3.38% [kernel] [k] mlx5e_sq_xmit
2.73% [kernel] [k] mlx5e_poll_rx_cq
2.18% [kernel] [k] __dev_queue_xmit
2.13% [kernel] [k] vlan_do_receive
2.12% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
2.00% [kernel] [k] ip_finish_output2
1.87% [kernel] [k] mlx5e_post_rx_mpwqes
1.86% [kernel] [k] memcpy_erms
1.85% [kernel] [k] ipt_do_table
1.70% [kernel] [k] dev_gro_receive
1.39% [kernel] [k] __netif_receive_skb_core
1.31% [kernel] [k] inet_gro_receive
1.21% [kernel] [k] ip_route_input_rcu
1.21% [kernel] [k] tcp_gro_receive
1.13% [kernel] [k] _raw_spin_lock
1.08% [kernel] [k] __build_skb
1.06% [kernel] [k] kmem_cache_free_bulk
1.05% [kernel] [k] __softirqentry_text_start
1.03% [kernel] [k] vlan_dev_hard_start_xmit
0.98% [kernel] [k] pfifo_fast_dequeue
0.95% [kernel] [k] mlx5e_xmit
0.95% [kernel] [k] page_frag_free
0.88% [kernel] [k] ip_forward
0.81% [kernel] [k] dev_hard_start_xmit
0.78% [kernel] [k] rcu_irq_exit
0.77% [kernel] [k] netif_skb_features
0.72% [kernel] [k] napi_complete_done
0.72% [kernel] [k] kmem_cache_alloc
0.68% [kernel] [k] validate_xmit_skb.isra.142
0.66% [kernel] [k] ip_rcv_core.isra.20.constprop.25
0.58% [kernel] [k] swiotlb_map_page
0.57% [kernel] [k] __qdisc_run
0.56% [kernel] [k] tasklet_action
0.54% [kernel] [k] __get_xps_queue_idx
0.54% [kernel] [k] inet_lookup_ifaddr_rcu
0.50% [kernel] [k] tcp4_gro_receive
0.49% [kernel] [k] skb_release_data
0.47% [kernel] [k] eth_type_trans
0.40% [kernel] [k] sch_direct_xmit
0.40% [kernel] [k] net_rx_action
0.39% [kernel] [k] __local_bh_enable_ip
And perf record/report:
https://ufile.io/zguq0
So now I know what was causing CPU load for some processes like:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2913 root      20   0       0      0      0 I  10.3   0.0   6:58.29 kworker/u112:1-
    7 root      20   0       0      0      0 I   8.6   0.0   6:17.18 kworker/u112:0-
10289 root      20   0       0      0      0 I   6.6   0.0   6:33.90 kworker/u112:4-
 2939 root      20   0       0      0      0 R   3.6   0.0   7:37.68 kworker/u112:2-
After disabling adaptive TX coalescing, all of these processes are gone.
Load average drops from 40 to 1.
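
For reference (the exact command isn't shown above), adaptive TX coalescing
is turned off with something like:

  ethtool -C enp175s0 adaptive-tx off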
Current coalescing settings:
ethtool -c enp175s0
Coalesce parameters for enp175s0:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
dmac: 32548
rx-usecs: 24
rx-frames: 256
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 64
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
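
Reproducing the whole coalescing configuration above should be roughly a
single -C invocation (untested here; some driver versions may reject
tx-usecs 0):

  ethtool -C enp175s0 adaptive-rx off adaptive-tx off rx-usecs 24 \
          rx-frames 256 tx-usecs 0 tx-frames 64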
And currently, at those traffic levels, there is no packet loss (CPU is
~60% avg. across all 28 cores).