Message-ID: <61e30474-b5e9-4dc8-a8a6-90cdd17d2a66@gmail.com>
Date: Wed, 31 Oct 2018 15:09:05 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Paweł Staszewski <pstaszewski@...are.pl>,
netdev <netdev@...r.kernel.org>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users
traffic
On 10/31/2018 02:57 PM, Paweł Staszewski wrote:
> Hi
>
> So maybe someone will be interested in how the Linux kernel handles normal traffic (not pktgen :) )
>
>
> Server HW configuration:
>
> CPU : Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
>
> NIC's: 2x 100G Mellanox ConnectX-4 (connected to x16 pcie 8GT)
>
>
> Server software:
>
> FRR - as routing daemon
>
> enp175s0f0 (100G) - 16 vlans from upstreams (28 RSS queues bound to the local NUMA node)
>
> enp175s0f1 (100G) - 343 vlans to clients (28 RSS queues bound to the local NUMA node)
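
(For reference, a minimal sketch of how such a binding is usually set up - the
channel count matches the 28 CPUs of NUMA node 1, and the Mellanox helper
script name is an assumption; any IRQ-affinity tool that pins the NIC IRQs to
the listed CPUs does the same job:)

    # 28 combined channels per port
    ethtool -L enp175s0f0 combined 28
    ethtool -L enp175s0f1 combined 28
    # pin the ports' IRQs to the NUMA-local CPUs (node 1: 14-27,42-55)
    set_irq_affinity_cpulist.sh 14-27,42-55 enp175s0f0
    set_irq_affinity_cpulist.sh 14-27,42-55 enp175s0f1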
>
>
> Maximum traffic that server can handle:
>
> Bandwidth
>
> bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
> input: /proc/net/dev type: rate
> \ iface Rx Tx Total
> ==============================================================================
> enp175s0f1: 28.51 Gb/s 37.24 Gb/s 65.74 Gb/s
> enp175s0f0: 38.07 Gb/s 28.44 Gb/s 66.51 Gb/s
> ------------------------------------------------------------------------------
> total: 66.58 Gb/s 65.67 Gb/s 132.25 Gb/s
>
>
> Packets per second:
>
> bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
> input: /proc/net/dev type: rate
> - iface Rx Tx Total
> ==============================================================================
> enp175s0f1: 5248589.00 P/s 3486617.75 P/s 8735207.00 P/s
> enp175s0f0: 3557944.25 P/s 5232516.00 P/s 8790460.00 P/s
> ------------------------------------------------------------------------------
> total: 8806533.00 P/s 8719134.00 P/s 17525668.00 P/s
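
(A quick back-of-the-envelope check of the traffic mix from the two tables
above - my arithmetic, not part of the original report: 66.58 Gbit/s RX over
~8.8 Mpps RX is 66.58e9 / 8.8e6 ≈ 7560 bits ≈ 945 bytes per packet on average,
i.e. fairly large real-world packets rather than the small fixed-size frames
typically used with pktgen.)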
>
>
> After reaching those limits, the NICs on the upstream side (which carry more RX traffic) start to drop packets.
>
>
> I just don't understand why the server can't handle more bandwidth (~40Gbit/s is the limit at which all CPUs reach 100% utilization), while pps on the RX side keeps increasing.
>
> I was thinking that maybe I had hit some PCIe x16 limit - but x16 at 8GT/s is 126Gbit/s - and also when testing with pktgen I can reach more bandwidth and pps (about 4x more compared to normal internet traffic).
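
(The 126 Gbit figure matches the raw link rate; a short worked check, plus a
way to confirm the negotiated link width/speed - the af:00.0 PCI address is an
assumption derived from the enp175s0f0 name, i.e. bus 175 = 0xaf:)

    PCIe 3.0 x16: 16 lanes x 8 GT/s x 128/130 encoding ≈ 126 Gbit/s per direction
    (TLP/DLLP framing overhead takes a further cut, very roughly 10% depending
     on MaxPayload, so usable payload bandwidth is lower, but still well above
     the ~66 Gbit/s per direction seen here)

    lspci -s af:00.0 -vv | grep -E 'LnkCap|LnkSta|MaxPayload'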
>
> And I am wondering if there is something that can be improved here.
>
>
>
> Some more information / counters / stats and perf top below:
>
> Perf top flame graph:
>
> https://uploadfiles.io/7zo6u
>
>
>
> System configuration(long):
>
>
> cat /sys/devices/system/node/node1/cpulist
> 14-27,42-55
> cat /sys/class/net/enp175s0f0/device/numa_node
> 1
> cat /sys/class/net/enp175s0f1/device/numa_node
> 1
>
>
>
>
>
> ip -s -d link ls dev enp175s0f0
> 6: enp175s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 8192
> link/ether 0c:c4:7a:d8:5d:1c brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 448 numrxqueues 56 gso_max_size 65536 gso_max_segs 65535
> RX: bytes packets errors dropped overrun mcast
> 184142375840858 141347715974 2 2806325 0 85050528
> TX: bytes packets errors dropped carrier collsns
> 99270697277430 172227994003 0 0 0 0
>
> ip -s -d link ls dev enp175s0f1
> 7: enp175s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 8192
> link/ether 0c:c4:7a:d8:5d:1d brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 448 numrxqueues 56 gso_max_size 65536 gso_max_segs 65535
> RX: bytes packets errors dropped overrun mcast
> 99686284170801 173507590134 61 669685 0 100304421
> TX: bytes packets errors dropped carrier collsns
> 184435107970545 142383178304 0 0 0 0
>
>
> ./softnet.sh
> cpu total dropped squeezed collision rps flow_limit
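
(For reference, output like the above can be reproduced straight from
/proc/net/softnet_stat - a minimal bash sketch, assuming the 4.19 column
layout where the hex fields are processed, dropped, time_squeeze, five unused
fields, collision, received_rps, flow_limit_count:)

    #!/bin/bash
    printf "%-6s %12s %12s %12s %12s %12s %12s\n" \
        cpu total dropped squeezed collision rps flow_limit
    cpu=0
    while read -r line; do
        set -- $line
        printf "cpu%-3d %12d %12d %12d %12d %12d %12d\n" \
            "$cpu" "0x$1" "0x$2" "0x$3" "0x$9" "0x${10}" "0x${11}"
        cpu=$((cpu + 1))
    done < /proc/net/softnet_stat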
>
>
>
>
> PerfTop: 108490 irqs/sec kernel:99.6% exact: 0.0% [4000Hz cycles], (all, 56 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 26.78% [kernel] [k] queued_spin_lock_slowpath
This is highly suspect.
A call graph (perf record -a -g sleep 1; perf report --stdio) would tell what is going on.
With that many TX/RX queues, I would expect you to not use RPS/RFS, and to have a 1/1 RX/TX mapping,
so I do not know what could cause spinlock contention.
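
(To confirm that, a couple of quick checks - a sketch using the interface
names from the report above:)

    # RPS should be off: all-zero masks expected
    grep . /sys/class/net/enp175s0f0/queues/rx-*/rps_cpus | sort -u
    grep . /sys/class/net/enp175s0f1/queues/rx-*/rps_cpus | sort -u
    # per-queue TX CPU mapping (XPS)
    grep . /sys/class/net/enp175s0f1/queues/tx-*/xps_cpus

    # and to see which code paths end up in the contended lock:
    perf record -a -g -- sleep 1
    perf report --stdio --no-children --symbols=queued_spin_lock_slowpath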