netdev - Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

Message-ID: <8e10bf68-f3b3-98f2-91a5-25b151756dd6@itcare.pl>
Date:   Wed, 31 Oct 2018 23:20:01 +0100
From:   Paweł Staszewski <pstaszewski@...are.pl>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users
 traffic



On 31.10.2018 at 23:09, Eric Dumazet wrote:
>
> On 10/31/2018 02:57 PM, Paweł Staszewski wrote:
>> Hi
>>
>> Maybe someone will be interested in how the Linux kernel handles normal traffic (not pktgen :) )
>>
>>
>> Server HW configuration:
>>
>> CPU : Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
>>
>> NICs: 2x 100G Mellanox ConnectX-4 (connected to PCIe x16, 8 GT/s)
>>
>>
>> Server software:
>>
>> FRR - as routing daemon
>>
>> enp175s0f0 (100G) - 16 VLANs from upstreams (28 RSS queues bound to the local NUMA node)
>>
>> enp175s0f1 (100G) - 343 VLANs to clients (28 RSS queues bound to the local NUMA node)
>>
>>
>> Maximum traffic that server can handle:
>>
>> Bandwidth
>>
>>   bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
>>    input: /proc/net/dev type: rate
>>    \         iface                   Rx                   Tx                Total
>> ==============================================================================
>>         enp175s0f1:          28.51 Gb/s           37.24 Gb/s           65.74 Gb/s
>>         enp175s0f0:          38.07 Gb/s           28.44 Gb/s           66.51 Gb/s
>> ------------------------------------------------------------------------------
>>              total:          66.58 Gb/s           65.67 Gb/s          132.25 Gb/s
>>
>>
>> Packets per second:
>>
>>   bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
>>    input: /proc/net/dev type: rate
>>    -         iface                   Rx                   Tx                Total
>> ==============================================================================
>>         enp175s0f1:      5248589.00 P/s       3486617.75 P/s       8735207.00 P/s
>>         enp175s0f0:      3557944.25 P/s       5232516.00 P/s       8790460.00 P/s
>> ------------------------------------------------------------------------------
>>              total:      8806533.00 P/s       8719134.00 P/s      17525668.00 P/s
>>
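The two bwm-ng tables above can be combined into a rough average packet size per direction, which helps when comparing this traffic with pktgen runs. This is only a sketch: it assumes bwm-ng reports Gb/s as 10^9 bits per second and that the two screens were captured at about the same load, neither of which is stated in the message. The function name `avg_packet_bytes` is made up for illustration.

```python
def avg_packet_bytes(gbit_per_s, packets_per_s):
    """Average bytes per packet from a bit rate and a packet rate.

    Assumes 1 Gb/s = 1e9 bit/s (an assumption about bwm-ng's units).
    """
    return gbit_per_s * 1e9 / 8 / packets_per_s


# (interface, direction) -> (Gb/s, packets/s), copied from the tables above.
rates = {
    ("enp175s0f1", "rx"): (28.51, 5248589.00),
    ("enp175s0f1", "tx"): (37.24, 3486617.75),
    ("enp175s0f0", "rx"): (38.07, 3557944.25),
    ("enp175s0f0", "tx"): (28.44, 5232516.00),
}

for (iface, direction), (gbit, pps) in rates.items():
    print(f"{iface} {direction}: ~{avg_packet_bytes(gbit, pps):.0f} bytes/packet")
```

Under those assumptions the download direction (upstream RX, client TX) comes out at roughly 1300 bytes per packet and the upload direction at roughly 700 bytes per packet.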
>>
>> After reaching those limits, the NICs on the upstream side (more RX traffic) start to drop packets.
>>
>>
>> I just don't understand why the server can't handle more bandwidth (~40 Gbit/s is the limit, at which point all CPUs are at 100% utilization) while pps on the RX side is still increasing.
>>
>> I was thinking that maybe I had reached some PCIe x16 limit, but x16 at 8 GT/s is 126 Gbit/s, and when testing with pktgen I can reach more bandwidth and pps (about 4x more compared to normal internet traffic).
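For reference, the 126 Gbit/s figure follows from the PCIe 3.0 line rate and its 128b/130b encoding. The sketch below reproduces that arithmetic. It gives the raw per-direction link rate only and ignores TLP/DLLP protocol overhead, so the usable rate is lower. The function name `pcie_raw_gbit` is made up for illustration.

```python
def pcie_raw_gbit(lanes, gt_per_s, enc_payload=128, enc_total=130):
    """Raw per-direction PCIe bandwidth in Gbit/s after line encoding.

    Defaults are the 128b/130b encoding used at 8 GT/s (PCIe 3.0).
    Packet (TLP/DLLP) overhead is not included.
    """
    return lanes * gt_per_s * enc_payload / enc_total


x16_gen3 = pcie_raw_gbit(lanes=16, gt_per_s=8.0)
print(f"x16 @ 8 GT/s: {x16_gen3:.2f} Gbit/s per direction")
```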
>>
>> I am wondering if there is something that can be improved here.
>>
>>
>>
>> Some more information, counters, stats, and perf top output below:
>>
>> Perf top flame graph:
>>
>> https://uploadfiles.io/7zo6u
>>
>>
>>
>> System configuration (long):
>>
>>
>> cat /sys/devices/system/node/node1/cpulist
>> 14-27,42-55
>> cat /sys/class/net/enp175s0f0/device/numa_node
>> 1
>> cat /sys/class/net/enp175s0f1/device/numa_node
>> 1
>>
>>
>>
>>
>>
>> ip -s -d link ls dev enp175s0f0
>> 6: enp175s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 8192
>>      link/ether 0c:c4:7a:d8:5d:1c brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 448 numrxqueues 56 gso_max_size 65536 gso_max_segs 65535
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      184142375840858 141347715974 2       2806325 0       85050528
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      99270697277430 172227994003 0       0       0       0
>>
>>   ip -s -d link ls dev enp175s0f1
>> 7: enp175s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 8192
>>      link/ether 0c:c4:7a:d8:5d:1d brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 448 numrxqueues 56 gso_max_size 65536 gso_max_segs 65535
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      99686284170801 173507590134 61      669685  0       100304421
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      184435107970545 142383178304 0       0       0       0
>>
>>
>> ./softnet.sh
>> cpu      total    dropped   squeezed  collision        rps flow_limit
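The softnet.sh script itself is not shown, so the following is not that script. It is a minimal sketch of how the same columns could be read from /proc/net/softnet_stat, assuming the layout used by kernels around 4.19: one row of hexadecimal counters per online CPU, with the column positions given in `COLUMNS`. Newer kernels append more columns, which this sketch ignores. The sample row is invented for illustration.

```python
# Column index of each counter in a /proc/net/softnet_stat row.
# Assumption: layout of kernels around 4.19 (11 hex fields per row).
COLUMNS = {
    "total": 0,        # packets processed
    "dropped": 1,      # dropped because the backlog queue was full
    "squeezed": 2,     # net_rx_action ran out of budget or time
    "collision": 8,    # cpu_collision
    "rps": 9,          # received_rps
    "flow_limit": 10,  # flow_limit_count
}


def parse_softnet_stat(text):
    """Return one dict of counters per row (rows follow online-CPU order)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or unexpectedly short rows
        values = [int(field, 16) for field in fields]
        rows.append({name: values[idx] for name, idx in COLUMNS.items()})
    return rows


# Invented sample row; on a real system pass the contents of
# /proc/net/softnet_stat instead.
sample = "0000a1b2 00000003 00000010 " + "00000000 " * 7 + "00000000"
for n, row in enumerate(parse_softnet_stat(sample)):
    print(n, row)
```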
>>
>>
>>
>>
>>     PerfTop:  108490 irqs/sec  kernel:99.6%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>      26.78%  [kernel]       [k] queued_spin_lock_slowpath
> This is highly suspect.
>
> A call graph (perf record -a -g sleep 1; perf report --stdio) would tell what is going on.
perf report:
https://ufile.io/rqp0h



>
> With that many TX/RX queues, I would expect you to not use RPS/RFS, and have a 1/1 RX/TX mapping,
> so I do not know what could request a spinlock contention.
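Whether RPS is in use can be checked from sysfs: each RX queue has an rps_cpus file holding a hexadecimal CPU bitmap in comma-separated groups, and an all-zero mask means RPS is off for that queue. A minimal sketch of that check follows. The helper names are made up for illustration, and nothing here is taken from the reported system.

```python
from pathlib import Path


def rps_mask_is_empty(mask_text):
    """True if an rps_cpus bitmap such as '00000000,00000000' has no CPU set."""
    return int(mask_text.strip().replace(",", ""), 16) == 0


def queues_with_rps(dev, sysfs_root="/sys/class/net"):
    """Names of RX queues (e.g. 'rx-0') on `dev` with a non-empty RPS mask."""
    enabled = []
    queue_dir = Path(sysfs_root, dev, "queues")
    for path in sorted(queue_dir.glob("rx-*/rps_cpus")):
        if not rps_mask_is_empty(path.read_text()):
            enabled.append(path.parent.name)
    return enabled
```

Calling `queues_with_rps("enp175s0f0")` on the router would return an empty list if RPS is disabled on every RX queue of that interface.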
>
>
>