Message-ID: <ce33b7b3-ae00-b6f0-e82a-6df3d5a5e995@itcare.pl>
Date:   Sun, 13 Aug 2017 18:58:58 +0200
From:   Paweł Staszewski <pstaszewski@...are.pl>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding
 performance vs Core/RSS number / HT on

To show the difference, below is a comparison of VLAN vs. no-VLAN traffic:

10Mpps forwarded traffic with no VLAN vs 6.9Mpps with VLAN

(in-kernel ixgbe driver, kernel 4.13.0-rc4-next-20170811)

ethtool settings for both tests:

ethtool -K $ifc gro off tso off gso off sg on l2-fwd-offload off tx-nocache-copy off ntuple off

ethtool -L $ifc combined 16

ethtool -C $ifc rx-usecs 2

ethtool -G $ifc rx 4096 tx 1024

16 CORES / 16 RSS QUEUES


Tx traffic on vlan:

RX Interface:

enp216s0f0

TX Interface

vlan1000 added to enp216s0f1 interface (with vlan 1000 ip address assigned)

ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
0;16;64;6939008;416325120;6938696;402411192
1;16;64;6941952;416444160;6941745;402558918
2;16;64;6960576;417584640;6960707;403698718
3;16;64;6940736;416486400;6941820;402503876
4;16;64;6927680;415741440;6927420;401853870
5;16;64;6929792;415687680;6929917;401839196
6;16;64;6950400;416989440;6950661;403026166
7;16;64;6953664;417216000;6953454;403260544
8;16;64;6948480;416851200;6948800;403023266
9;16;64;6924160;415422720;6924092;401542468

100% load on all 16 Cores.


vs

RX interface from traffic generator:

enp216s0f0

TX interface to the sink:

enp216s0f1

No vlan used

ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX

0;16;64;10280176;793608540;10298496;596796568
1;16;64;10046928;600978780;10046022;582527002
2;16;64;10032956;601827420;10026097;581515656
3;16;64;10051503;602252460;10067880;582420804
4;16;64;10016204;602725140;10017358;582644800
5;16;64;10035575;602437620;10059504;582067294
6;16;64;10041667;603069780;10057865;582477412
7;16;64;10044448;600027420;10046526;581022018
8;16;64;10022436;601121100;10025946;581904314
9;16;64;10036231;602514960;10058724;582180684
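
To put a number on the gap between the two result sets above, a small helper can average one column of the semicolon-separated rows and compute the relative drop. This is only a sketch: the function names `avg_col` and `drop_pct` are made up for illustration, and it assumes the rows arrive on stdin in the ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX layout used above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original test scripts.

# avg_col N: print the integer average of column N from ';'-separated
# rows on stdin. Rows whose first field is not a number (the header
# line, blank lines) are skipped.
avg_col() {
    awk -F';' -v col="$1" '
        $1 ~ /^[0-9]+$/ { sum += $col; n++ }
        END { if (n) printf "%d\n", sum / n }'
}

# drop_pct BASE CUR: print how much lower CUR is than BASE, in percent.
drop_pct() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.1f\n", (base - cur) * 100 / base }'
}
```

Feeding one of the result blocks to `avg_col 6` gives the average PPS_TX. With the rounded figures from this mail, `drop_pct 10000000 6900000` prints 31.0, i.e. roughly a 31% drop with the VLAN.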


So we have 10Mpps forwarded

- I have problems getting pktgen on my traffic generator to push more than
10M, but this is low-budget hardware so.. :)


And there are still free CPU cycles, so it can probably forward at 10G line
rate (14Mpps)
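
For reference, the theoretical packet rate of a link follows from the frame size plus the fixed per-frame overhead on the wire (7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap, 20 bytes in total). The sketch below shows that arithmetic; the helper name `line_rate_pps` is invented here, and it assumes that PKT_SIZE 64 means a 64-byte Ethernet frame including the FCS.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original test setup.
# line_rate_pps LINK_BPS FRAME_BYTES
# Each frame occupies FRAME_BYTES + 20 bytes on the wire
# (7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap).
line_rate_pps() {
    awk -v bps="$1" -v frame="$2" \
        'BEGIN { printf "%d\n", bps / ((frame + 20) * 8) }'
}

# 10 Gbit/s link, 64-byte frames
line_rate_pps 10000000000 64
```

For a 10 Gbit/s link and 64-byte frames this prints 14880952, so the line-rate target is about 14.88Mpps.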

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    0.00    0.00    0.00    0.00    0.00   20.91    0.00    0.00    0.00   79.09
Average:       0    0.00    0.00    0.00    0.00    0.00    0.09    0.00    0.00    0.00   99.91
Average:       1    0.03    0.00    0.03    0.00    0.00    0.00    0.00    0.00    0.00   99.94
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       5    0.00    0.00    0.18    0.00    0.00    0.00    0.00    0.00    0.00   99.82
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      10    0.00    0.00    0.03    0.24    0.00    0.00    0.00    0.00    0.00   99.74
Average:      11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      13    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      14    0.00    0.00    0.00    0.00    0.00   92.38    0.00    0.00    0.00    7.62
Average:      15    0.00    0.00    0.00    0.00    0.00   85.88    0.00    0.00    0.00   14.12
Average:      16    0.00    0.00    0.00    0.00    0.00   64.91    0.00    0.00    0.00   35.09
Average:      17    0.00    0.00    0.00    0.00    0.00   66.76    0.00    0.00    0.00   33.24
Average:      18    0.00    0.00    0.00    0.00    0.00   65.57    0.00    0.00    0.00   34.43
Average:      19    0.00    0.00    0.00    0.00    0.00   66.38    0.00    0.00    0.00   33.62
Average:      20    0.00    0.00    0.00    0.00    0.00   72.97    0.00    0.00    0.00   27.03
Average:      21    0.00    0.00    0.00    0.00    0.00   70.80    0.00    0.00    0.00   29.20
Average:      22    0.00    0.00    0.00    0.00    0.00   66.44    0.00    0.00    0.00   33.56
Average:      23    0.00    0.00    0.00    0.00    0.00   66.12    0.00    0.00    0.00   33.88
Average:      24    0.00    0.00    0.00    0.00    0.00   68.35    0.00    0.00    0.00   31.65
Average:      25    0.00    0.00    0.00    0.00    0.00   71.79    0.00    0.00    0.00   28.21
Average:      26    0.00    0.00    0.00    0.00    0.00   70.24    0.00    0.00    0.00   29.76
Average:      27    0.00    0.00    0.00    0.00    0.00   73.24    0.00    0.00    0.00   26.76
Average:      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      29    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      30    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      32    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      33    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      34    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      35    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      36    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      37    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      38    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      40    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      41    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      42    0.00    0.00    0.00    0.00    0.00   84.27    0.00    0.00    0.00   15.73
Average:      43    0.00    0.00    0.00    0.00    0.00   84.50    0.00    0.00    0.00   15.50
Average:      44    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      45    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      46    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      47    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      48    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      49    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      50    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      51    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      52    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      53    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      54    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      55    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
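
The per-CPU rows above are easier to read when filtered down to the CPUs that are actually doing softirq work. A possible sketch, assuming the mpstat "Average:" rows are available one per line (not wrapped) with %soft as the 8th whitespace-separated field; the function name `busy_cpus` and the threshold argument are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original test setup.
# busy_cpus MIN_SOFT: read "Average:" rows of mpstat -P ALL output on
# stdin and print a comma-separated list of the CPUs whose %soft value
# is at least MIN_SOFT. The header row and the "all" row are skipped
# because their CPU field is not a number.
busy_cpus() {
    awk -v min="$1" '
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && ($8 + 0) >= (min + 0) {
            sep = (list == "") ? "" : ","
            list = list sep $2
        }
        END { print list }'
}
```

Run against the table above with a threshold of 50, this should list CPUs 14 to 27 plus 42 and 43, the same set that the quoted setup script passes to set_irq_affinity.sh.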

Average:     CPU    intr/s
Average:     all 3559661.68
Average:       0    628.53
Average:       1    537.62
Average:       2    525.00
Average:       3    558.29
Average:       4    546.79
Average:       5    522.85
Average:       6    508.06
Average:       7    568.88
Average:       8    529.56
Average:       9    535.29
Average:      10    530.09
Average:      11    539.53
Average:      12    520.82
Average:      13    531.32
Average:      14  73315.68
Average:      15 115983.15
Average:      16 254446.09
Average:      17 253067.79
Average:      18 254446.35
Average:      19 252457.29
Average:      20 213928.18
Average:      21 232770.32
Average:      22 263906.85
Average:      23 260065.09
Average:      24 243609.74
Average:      25 218122.53
Average:      26 237405.38
Average:      27 217582.76
Average:      28    548.29
Average:      29    569.12
Average:      30    540.74
Average:      31    517.50
Average:      32    521.59
Average:      33    544.85
Average:      34    520.91
Average:      35    553.29
Average:      36    545.32
Average:      37    518.44
Average:      38    557.26
Average:      39    541.71
Average:      40    515.21
Average:      41    520.82
Average:      42 137722.38
Average:      43 135737.59
Average:      44    524.97
Average:      45    538.24
Average:      46    580.38
Average:      47    567.62
Average:      48    555.53
Average:      49    561.50
Average:      50    537.65
Average:      51    565.09
Average:      52    536.12
Average:      53    570.44
Average:      54    535.38
Average:      55    567.88

Average:     CPU       HI/s    TIMER/s   NET_TX/s   NET_RX/s    BLOCK/s IRQ_POLL/s  TASKLET/s    SCHED/s  HRTIMER/s      RCU/s
Average:       0       0.00     250.03       0.18      69.09       0.00       0.00       6.59     250.03       0.00      52.62
Average:       1       0.00     228.94       0.00       0.00       0.00       0.00      10.82     249.79       0.00      48.06
Average:       2       0.00     244.53       0.00       0.00       0.00       0.00       0.00     249.79       0.00      30.68
Average:       3       0.00     249.53       0.00       0.00       0.00       0.00       0.00     249.91       0.00      58.85
Average:       4       0.00     243.71       0.00       0.00       0.00       0.00       0.00     249.94       0.00      53.15
Average:       5       0.00     247.06       0.00       0.00       0.00       0.00       0.00     249.44       0.00      26.35
Average:       6       0.00     249.38       0.00       0.00       0.00       0.00       0.00     249.91       0.00       8.76
Average:       7       0.00     226.03       0.00       0.00       0.00       0.00       0.00     249.88       0.00      92.97
Average:       8       0.00     244.29       0.00       0.00       0.00       0.00       0.00     249.79       0.00      35.47
Average:       9       0.00     247.71       0.00       0.00       0.00       0.00       0.00     249.94       0.00      37.65
Average:      10       0.00     241.85       0.00       0.00       0.00       0.00       0.00     249.65       0.00      38.59
Average:      11       0.00     249.00       0.00       0.00       0.00       0.00       0.00     249.94       0.00      40.59
Average:      12       0.00     248.85       0.00       0.00       0.00       0.00       0.00     249.88       0.00      22.09
Average:      13       0.00     249.12       0.00       0.00       0.00       0.00       0.00     249.88       0.00      32.32
Average:      14       0.00     212.97     147.44   72760.47       0.00       0.00       0.00     189.35       0.00       5.44
Average:      15       0.00     233.94     139.21  115367.26       0.00       0.00       0.00     225.03       0.00      17.71
Average:      16       0.00     245.41     158.94  253784.06       0.00       0.00       0.00     244.65       0.00      13.03
Average:      17       0.00     245.74     164.09  252402.41       0.00       0.00       0.00     244.85       0.00      10.71
Average:      18       0.00     245.56     161.32  253778.00       0.00       0.00       0.00     244.97       0.00      16.50
Average:      19       0.00     245.38     161.35  251789.41       0.00       0.00       0.00     244.53       0.00      16.62
Average:      20       0.00     244.97     201.59  213226.74       0.00       0.00       0.00     244.24       0.00      10.65
Average:      21       0.00     245.59     174.82  232072.85       0.00       0.00       0.00     244.82       0.00      32.24
Average:      22       0.00     245.44     157.47  263244.59       0.00       0.00       0.00     244.74       0.00      14.62
Average:      23       0.00     245.53     165.09  259398.41       0.00       0.00       0.00     244.94       0.00      11.12
Average:      24       0.00     245.38     179.47  242922.85       0.00       0.00       0.00     244.85       0.00      17.18
Average:      25       0.00     245.47     195.15  217419.79       0.00       0.00       0.00     244.50       0.00      17.62
Average:      26       0.00     245.41     182.18  236714.38       0.00       0.00       0.00     244.62       0.00      18.79
Average:      27       0.00     244.94     196.65  216886.68       0.00       0.00       0.00     244.44       0.00      10.06
Average:      28       0.00     247.35       0.00       0.00       0.00       0.00       0.00     249.91       0.00      51.03
Average:      29       0.00     228.94       0.00       0.00       0.00       0.00       0.00     249.79       0.00      90.38
Average:      30       0.00     244.50       0.00       0.00       0.00       0.00       0.00     249.82       0.00      46.41
Average:      31       0.00     249.53       0.00       0.00       0.00       0.00       0.00     249.97       0.00      18.00
Average:      32       0.00     243.82       0.00       0.00       0.00       0.00       0.00     249.97       0.00      27.79
Average:      33       0.00     247.03       0.00       0.00       0.00       0.00       0.00     249.47       0.00      48.35
Average:      34       0.00     249.38       0.00       0.00       0.00       0.00       0.00     249.94       0.00      21.59
Average:      35       0.00     226.00       0.00       0.00       0.00       0.00       0.00     249.88       0.00      77.41
Average:      36       0.00     244.29       0.00       0.00       0.00       0.00       0.00     249.85       0.00      51.18
Average:      37       0.00     247.71       0.00       0.00       0.00       0.00       0.00     249.97       0.00      20.76
Average:      38       0.00     241.85       0.00       0.00       0.00       0.00       0.00     249.68       0.00      65.74
Average:      39       0.00     249.00       0.00       0.00       0.00       0.00       0.00     249.94       0.00      42.76
Average:      40       0.00     248.85       0.00       0.00       0.00       0.00       0.00     249.85       0.00      16.50
Average:      41       0.00     249.12       0.00       0.00       0.00       0.00       0.00     249.94       0.00      21.76
Average:      42       0.00     240.26     112.85  137160.06       0.00       0.00       0.00     203.44       0.00       5.76
Average:      43       0.00     237.91     122.91  135139.38       0.00       0.00       0.00     231.79       0.00       5.59
Average:      44       0.00     249.15       0.00       0.00       0.00       0.00       0.00     248.26       0.00      27.56
Average:      45       0.00     249.26       0.00       0.00       0.00       0.00       0.00     248.62       0.00      40.35
Average:      46       0.00     249.26       0.00       0.00       0.00       0.00       0.00     248.47       0.00      82.65
Average:      47       0.00     249.26       0.00       0.00       0.00       0.00       0.00     248.50       0.00      69.85
Average:      48       0.00     249.09       0.00       0.00       0.00       0.00       0.00     248.38       0.00      58.06
Average:      49       0.00     249.32       0.00       0.00       0.00       0.00       0.00     248.59       0.00      63.59
Average:      50       0.00     249.35       0.00       0.00       0.00       0.00       0.00     248.32       0.00      39.97
Average:      51       0.00     249.24       0.00       0.00       0.00       0.00       0.00     248.29       0.00      67.56
Average:      52       0.00     249.18       0.00       0.00       0.00       0.00       0.00     247.91       0.00      39.03
Average:      53       0.00     249.18       0.00       0.00       0.00       0.00       0.00     248.41       0.00      72.85
Average:      54       0.00     249.18       0.00       0.00       0.00       0.00       0.00     248.06       0.00      38.15
Average:      55       0.00     249.41       0.18       0.00       0.00       0.00       0.00     248.71       0.00      69.59



On 2017-08-12 at 19:27, Paweł Staszewski wrote:
> Hi, and thanks for the reply
>
>
>
> On 2017-08-12 at 14:23, Jesper Dangaard Brouer wrote:
>> On Fri, 11 Aug 2017 19:51:10 +0200 Paweł Staszewski 
>> <pstaszewski@...are.pl> wrote:
>>
>>> Hi
>>>
>>> I made some tests for performance comparison.
>> Thanks for doing this. Feel free to Cc me, if you do more of these
>> tests (so I don't miss them on the mailing list).
>>
>> I don't understand if you are reporting a potential problem?
>>
>> It would be good if you can provide a short summary section (of the
>> issue) in the _start_ of the email, and then provide all this nice data
>> afterwards, to back your case.
>>
>> My understanding is, you report:
>>
>> 1. VLANs on ixgbe show a 30-40% slowdown
>> 2. System stopped scaling after 7+ CPUs
> This is not only a problem/bug report - it is also a kind of comparison
> plus some thoughts about possible problems :)
> It can also help somebody searching the net for what to expect :)
> Also - I don't know of a better list with the smartest people, who know
> what is going on in the kernel with networking :)
>
> Next time I will place a summary on top - sorry :)
>
>>
>>> Tested HW (FORWARDING HOST):
>>>
>>> Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
>> Interesting, I've not heard about an Intel CPU called "Gold" before now,
>> but it does exist:
>> https://ark.intel.com/products/123541/Intel-Xeon-Gold-6132-Processor-19_25M-Cache-2_60-GHz
>>
>>
>>> Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>> This is one of my all time favorite NICs!
> Yes, this is a good NIC - I will have a ConnectX-4 2x100G by Monday, so I
> will also do some tests
>
>>> Test diagram:
>>>
>>>
>>> TRAFFIC GENERATOR (ethX) -> (enp216s0f0 - RX Traffic) FORWARDING HOST
>>> (enp216s0f1(vlan1000) - TX Traffic) -> (ethY) SINK
>>>
>>> Forwarder traffic: UDP random ports from 9 to 19 with random hosts from
>>> 172.16.0.1 to 172.16.0.255
>>>
>>> TRAFFIC GENERATOR TX is stable 9.9Mpps (in kernel pktgen)
>> What kind of traffic flow?  E.g. distribution, many/few source IPs...
>
> The traffic generator is pktgen, so UDP flows - better to paste the
> parameters from pktgen:
>     UDP_MIN=9
>     UDP_MAX=19
>
>     pg_set $dev "dst_min 172.16.0.1"
>     pg_set $dev "dst_max 172.16.0.100"
>
>     # Setup random UDP port src range
>     #pg_set $dev "flag UDPSRC_RND"
>     pg_set $dev "flag UDPSRC_RND"
>     pg_set $dev "udp_src_min $UDP_MIN"
>     pg_set $dev "udp_src_max $UDP_MAX"
>
>
>>
>>> Settings used for FORWARDING HOST (the only changed parameter was the
>>> number of RSS combined queues, plus the affinity assignment for them,
>>> set to fit the first NUMA node, where the 2x10G port card is installed)
>>>
>>> ixgbe driver used from kernel (in-kernel build - not a module)
>>>
>> Nice with a script showing your setup, thanks. It would be good if it had
>> comments, telling why you think this is a needed setup adjustment.
>>
>>> #!/bin/sh
>>> ifc='enp216s0f0 enp216s0f1'
>>> for i in $ifc
>>>           do
>>>           ip link set up dev $i
>>>           ethtool -A $i autoneg off rx off tx off
>> Good:
>>   Turning off Ethernet flow control, to avoid receiver being the
>>   bottleneck via pause-frames.
> Yes - enabled flow control is really bad :)
>>>           ethtool -G $i rx 4096 tx 1024
>> You adjust the RX and TX ring queue sizes, this has effects that you
>> don't realize.  Especially for the ixgbe driver, which has a page
>> recycle trick tied to the RX ring queue size.
> rx ring 4096 and tx ring 1024
> - this is because I get the best performance that way with average
> packet sizes from 64 to 1500 bytes
>
> Performance can be a little better for smaller frames like 64 bytes
> with the rx ring set to 1024.
> Below: 1 core / 1 RSS queue with rx ring set to 1024
>
> 0;1;64;1530112;91772160;1529919;88724208
> 1;1;64;1531584;91872000;1531520;88813196
> 2;1;64;1531392;91895040;1531262;88831930
> 3;1;64;1530880;91875840;1531201;88783558
> 4;1;64;1530688;91829760;1530688;88768826
> 5;1;64;1530432;91810560;1530624;88764940
> 6;1;64;1530880;91868160;1530878;88787328
> 7;1;64;1530496;91845120;1530560;88765114
> 8;1;64;1530496;91837440;1530687;88772538
> 9;1;64;1530176;91795200;1530496;88735360
>
> so from 1.47Mpps to 1.53Mpps
>
> But with bigger packets (> 200 bytes) performance is better when rx is
> set to 4096
>
>
>>
>>>           ip link set $i txqueuelen 1000
>> Setting tx queue len to the default 1000 seems redundant.
> Yes, because I'm also changing this parameter to see if it has any
> impact on performance
>>
>>>           ethtool -C $i rx-usecs 10
>> Adjusting this also has effects you might not realize.  This actually
>> also affects the page recycle scheme of ixgbe.  And can sometimes be
>> used to solve stalling on DMA TX completions, which could be your issue
>> here.
> Same here - setting rx-usecs to 10 was a kind of compromise to have
> good performance with both big and small packet sizes
>
> Same test as above with rx ring 1024, tx ring 1024 and rx-usecs set to
> 256 (1 core / 1 RSS queue):
> 0;1;64;1506304;90424320;1506626;87402868
> 1;1;64;1505536;90343680;1504830;87321088
> 2;1;64;1506880;90416640;1507522;87388120
> 3;1;64;1511040;90700800;1511682;87684864
> 4;1;64;1511040;90681600;1511102;87662476
> 5;1;64;1511488;90712320;1511614;87673728
> 6;1;64;1511296;90700800;1511038;87669900
> 7;1;64;1513344;90773760;1513280;87751680
> 8;1;64;1513536;90850560;1513470;87807360
> 9;1;64;1512128;90696960;1512000;87696000
>
> And rx-usecs set to 1
> 0;1;64;1533632;92037120;1533504;88954368
> 1;1;64;1533632;92006400;1533570;88943348
> 2;1;64;1533504;91994880;1533504;88931980
> 3;1;64;1532864;91979520;1532674;88902516
> 4;1;64;1533952;92044800;1534080;88961792
> 5;1;64;1533888;92048640;1534270;88969100
> 6;1;64;1533952;92037120;1534082;88969216
> 7;1;64;1533952;92021760;1534208;88969332
> 8;1;64;1533056;91983360;1532930;88883724
> 9;1;64;1533760;92021760;1533886;88946828
>
> rx-usecs set to 2
> 0;1;64;1522432;91334400;1522304;88301056
> 1;1;64;1521920;91330560;1522496;88286208
> 2;1;64;1522496;91322880;1522432;88304768
> 3;1;64;1523456;91422720;1523649;88382762
> 4;1;64;1527680;91676160;1527424;88601728
> 5;1;64;1527104;91626240;1526912;88572032
> 6;1;64;1527424;91641600;1527424;88590592
> 7;1;64;1526336;91572480;1526912;88523776
> 8;1;64;1527040;91637760;1526912;88579456
> 9;1;64;1527040;91595520;1526784;88553472
>
> rx-usecs set to 3
> 0;1;64;1526272;91549440;1526592;88527488
> 1;1;64;1526528;91560960;1526272;88516352
> 2;1;64;1525952;91580160;1525888;88527488
> 3;1;64;1525504;91511040;1524864;88456960
> 4;1;64;1526272;91568640;1526208;88494080
> 5;1;64;1525568;91545600;1525312;88494080
> 6;1;64;1526144;91584000;1526080;88512640
> 7;1;64;1525376;91530240;1525376;88482944
> 8;1;64;1526784;91607040;1526592;88549760
> 9;1;64;1526208;91560960;1526528;88512640
>
>
>>
>>>           ethtool -L $i combined 16
>>>           ethtool -K $i gro on tso on gso off sg on l2-fwd-offload off tx-nocache-copy on ntuple on
>> There are many settings above.
> Yes, mostly NIC defaults besides ntuple, which is on (for testing
> some nfc drop filters - and also trying to test tc-offload)
>
>> GRO/GSO/TSO for _forwarding_ is actually bad... in my tests, enabling
>> this results in approx 10% slowdown.
> OK, let's give it a try :)
> gro off tso off gso off sg on l2-fwd-offload off tx-nocache-copy on 
> ntuple on
> rx-usecs 10
> 1 CPU / 1 RSS QUEUE
>
> 0;1;64;1609344;96537600;1609279;93327104
> 1;1;64;1608320;96514560;1608256;93293812
> 2;1;64;1608000;96487680;1608125;93267770
> 3;1;64;1608320;96522240;1608576;93297524
> 4;1;64;1605888;96387840;1606211;93148986
> 5;1;64;1601472;96072960;1601600;92870644
> 6;1;64;1602624;96180480;1602243;92959674
> 7;1;64;1601728;96107520;1602113;92907764
> 8;1;64;1602176;96122880;1602176;92933806
> 9;1;64;1603904;96253440;1603777;93045208
>
> A little better performance: 1.6Mpps
> But I wonder whether disabling tso will have a performance impact on
> tcp traffic ...
> I will try to get a packet generator like pktgen-dpdk that can also
> generate tcp traffic - to compare this.
>
>
>>
>> AFAIK "tx-nocache-copy on" was also determined to be a bad option.
> I set this to on because I get better performance (a little, 10kpps
> for this test)
> Below is the same test as above with tx-nocache-copy off
>
> 0;1;64;1591552;95496960;1591230;92313654
> 1;1;64;1596224;95738880;1595842;92555066
> 2;1;64;1595456;95700480;1595201;92521774
> 3;1;64;1595456;95723520;1595072;92528966
> 4;1;64;1595136;95692800;1595457;92503040
> 5;1;64;1594624;95631360;1594496;92473402
> 6;1;64;1596224;95761920;1595778;92551180
> 7;1;64;1595200;95700480;1595331;92521542
> 8;1;64;1595584;95692800;1595457;92521426
> 9;1;64;1594624;95662080;1594048;92469574
>
>
>
>
>>
>> The "ntuple on" AFAIK disables the flow-director in the NIC.  I thought
>> this would actually help VLAN traffic, but I guess not.
> Yes, I enabled this because I thought it could help with traffic on vlans
>
> Below is the same test with ntuple off
> so all settings for ixgbe:
> gro off tso off gso off sg on l2-fwd-offload off tx-nocache-copy off 
> ntuple off
> rx-usecs 10
> rx-flow-hash udp4 sdfn
>
> 0;1;64;1611840;96691200;1611905;93460794
> 1;1;64;1610688;96645120;1610818;93427328
> 2;1;64;1610752;96668160;1610497;93442176
> 3;1;64;1610624;96664320;1610817;93427212
> 4;1;64;1610752;96652800;1610623;93412480
> 5;1;64;1610048;96614400;1610112;93404940
> 6;1;64;1611264;96641280;1611390;93427212
> 7;1;64;1611008;96691200;1610942;93468160
> 8;1;64;1610048;96652800;1609984;93408652
> 9;1;64;1611136;96641280;1610690;93434636
>
> Performance is a little better
> and now with tx-nocache-copy on
>
> 0;1;64;1597248;95834880;1597311;92644096
> 1;1;64;1597888;95865600;1597824;92677446
> 2;1;64;1597952;95834880;1597822;92644038
> 3;1;64;1597568;95877120;1597375;92685044
> 4;1;64;1597184;95827200;1597314;92629190
> 5;1;64;1597696;95842560;1597565;92625652
> 6;1;64;1597312;95834880;1597376;92644038
> 7;1;64;1597568;95873280;1597634;92647924
> 8;1;64;1598400;95919360;1598849;92699602
> 9;1;64;1597824;95873280;1598208;92684928
>
>
> That is weird - so enabling tx-nocache-copy with ntuple disabled has a
> bad performance impact - but with ntuple enabled there is no
> performance impact
>
>
>
>>
>>
>>>           ethtool -N $i rx-flow-hash udp4 sdfn
>> Why do you change the NICs flow-hash?
> When 16 cores / 16 RSS queues were used, load distribution over all
> cores was better with the sdfn rx-flow-hash enabled
>
>>
>>>           done
>>>
>>> ip link set up dev enp216s0f0
>>> ip link set up dev enp216s0f1
>>>
>>> ip a a 10.0.0.1/30 dev enp216s0f0
>>>
>>> ip link add link enp216s0f1 name vlan1000 type vlan id 1000
>>> ip link set up dev vlan1000
>>> ip a a 10.0.0.5/30 dev vlan1000
>>>
>>>
>>> ip route add 172.16.0.0/12 via 10.0.0.6
>>>
>>> ./set_irq_affinity.sh -x 14-27,42-43 enp216s0f0
>>> ./set_irq_affinity.sh -x 14-27,42-43 enp216s0f1
>>> #cat  /sys/devices/system/node/node1/cpulist
>>> #14-27,42-55
>>> #cat  /sys/devices/system/node/node0/cpulist
>>> #0-13,28-41
>> Is this a NUMA system?
> This is 2x CPU 6132 - so there are two separate PCIe paths to the NIC -
> I need to check which CPU the PCIe slot with the network card is
> attached to, so that the card sits on the local CPU to which all the
> IRQs are bound
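
One way to do that check is to read the NUMA node of the NIC's PCI device from sysfs and then look up that node's CPU list. A minimal sketch, assuming the standard sysfs files `/sys/class/net/<ifc>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist`; the function name `nic_numa_cpus` is made up, and the optional second argument exists only so the sketch can be pointed at a fake tree for testing.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original setup script.
# nic_numa_cpus IFC [SYSFS_ROOT]
# Print the NUMA node the NIC is attached to and the CPUs of that node.
nic_numa_cpus() {
    ifc="$1"
    root="${2:-/sys}"
    node=$(cat "$root/class/net/$ifc/device/numa_node") || return 1
    # A value of -1 means the kernel has no NUMA information for the device.
    if [ "$node" -lt 0 ]; then
        echo "no NUMA node reported for $ifc" >&2
        return 1
    fi
    echo "node$node: $(cat "$root/devices/system/node/node$node/cpulist")"
}
```

Calling `nic_numa_cpus enp216s0f0` on the forwarding host would then show which of the two cpulist ranges quoted in the script comments belongs to the card.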
>
>>
>>> #################################################
>>>
>>>
>>> Looks like forwarding performance when using vlans on ixgbe is about
>>> 30-40% lower than without vlans (wondering if this is some vlan
>>> offloading problem with ixgbe)
>> I would see this as a problem/bug that enabling VLANs costs this much.
> Yes - I was thinking that with tx/rx vlan offloading there would not be
> much performance impact when vlans are used.
>
>>> settings below:
>>>
>>> ethtool -k enp216s0f0
>>> Features for enp216s0f0:
>>> Cannot get device udp-fragmentation-offload settings: Operation not supported
>>> rx-checksumming: on
>>> tx-checksumming: on
>>>           tx-checksum-ipv4: off [fixed]
>>>           tx-checksum-ip-generic: on
>>>           tx-checksum-ipv6: off [fixed]
>>>           tx-checksum-fcoe-crc: off [fixed]
>>>           tx-checksum-sctp: on
>>> scatter-gather: on
>>>           tx-scatter-gather: on
>>>           tx-scatter-gather-fraglist: off [fixed]
>>> tcp-segmentation-offload: on
>>>           tx-tcp-segmentation: on
>>>           tx-tcp-ecn-segmentation: off [fixed]
>>>           tx-tcp-mangleid-segmentation: on
>>>           tx-tcp6-segmentation: on
>>> udp-fragmentation-offload: off
>>> generic-segmentation-offload: off
>>> generic-receive-offload: on
>>> large-receive-offload: off
>>> rx-vlan-offload: on
>>> tx-vlan-offload: on
>>> ntuple-filters: on
>>> receive-hashing: on
>>> highdma: on [fixed]
>>> rx-vlan-filter: on
>>> vlan-challenged: off [fixed]
>>> tx-lockless: off [fixed]
>>> netns-local: off [fixed]
>>> tx-gso-robust: off [fixed]
>>> tx-fcoe-segmentation: off [fixed]
>>> tx-gre-segmentation: on
>>> tx-gre-csum-segmentation: on
>>> tx-ipxip4-segmentation: on
>>> tx-ipxip6-segmentation: on
>>> tx-udp_tnl-segmentation: on
>>> tx-udp_tnl-csum-segmentation: on
>>> tx-gso-partial: on
>>> tx-sctp-segmentation: off [fixed]
>>> tx-esp-segmentation: off [fixed]
>>> fcoe-mtu: off [fixed]
>>> tx-nocache-copy: on
>>> loopback: off [fixed]
>>> rx-fcs: off [fixed]
>>> rx-all: off
>>> tx-vlan-stag-hw-insert: off [fixed]
>>> rx-vlan-stag-hw-parse: off [fixed]
>>> rx-vlan-stag-filter: off [fixed]
>>> l2-fwd-offload: off
>>> hw-tc-offload: off
>>> esp-hw-offload: off [fixed]
>>> esp-tx-csum-hw-offload: off [fixed]
>>> rx-udp_tunnel-port-offload: on
>>>
>>>
>>> Another thing is that forwarding performance does not scale with number
>>> of cores when 7+ cores are reached
>> I've seen problems with using Hyper-Threading CPUs.  Could it be that
>> above 7 CPUs you are starting to use sibling-cores ?
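
Whether an IRQ affinity list contains both hyper-threads of the same physical core can be checked from sysfs. The sketch below assumes the standard `/sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list` file; the function names `expand_cpulist` and `shared_cores` are invented for illustration, and the optional SYSFS_ROOT argument is only there for testing against a fake tree.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original setup script.

# expand_cpulist "14-27,42-43": print one CPU id per line.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | awk -F'-' '{
        lo = $1 + 0
        hi = (NF > 1) ? $2 + 0 : lo
        for (c = lo; c <= hi; c++) print c
    }'
}

# shared_cores CPULIST [SYSFS_ROOT]
# Print each sibling group that has more than one member in CPULIST,
# i.e. physical cores where two listed CPUs are hyper-thread siblings.
shared_cores() {
    root="${2:-/sys}"
    for cpu in $(expand_cpulist "$1"); do
        cat "$root/devices/system/cpu/cpu$cpu/topology/thread_siblings_list"
    done | sort | uniq -d
}
```

On the forwarding host this could be run as `shared_cores 14-27,42-43`. In the turbostat output from the reply, each core on package 0 pairs CPU N with CPU N+28; if the same pattern holds on the second package, 42 and 43 would be the siblings of 14 and 15.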
> Turbostat output can help here:
> Package Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     C1      C2      C1%     C2%     CPU%c1  CPU%c6  CoreTmp PkgTmp  PkgWatt RAMWatt PKG_%   RAM_%
> -       -       -       72      2.27    3188    2600    194844  0       64      69282   0.07    97.83   18.38   79.36   -4      54      123.49  16.08   0.00    0.00
> 0       0       0       8       0.74    1028    2600    1513    0       32      1462    1.50    97.99   10.92   88.34   47      51      58.34   5.34    0.00    0.00
> 0       0       28      7       0.67    1015    2600    1255    0       12      1249    0.96    98.61   10.99
> 0       1       1       7       0.68    1019    2600    1260    0       0       1260    0.00    99.54   8.44    90.88   49
> 0       1       29      9       0.71    1208    2600    1252    0       0       1253    0.00    99.48   8.41
> 0       2       2       7       0.67    1019    2600    1261    0       0       1260    0.00    99.54   8.44    90.89   48
> 0       2       30      7       0.67    1017    2600    1255    0       0       1255    0.00    99.55   8.44
> 0       3       3       7       0.68    1019    2600    1260    0       0       1259    0.00    99.53   8.46    90.86   -4
> 0       3       31      7       0.67    1017    2600    1256    0       0       1256    0.00    99.55   8.46
> 0       4       4       7       0.67    1027    2600    1260    0       0       1260    0.00    99.54   8.43    90.90   -4
> 0       4       32      7       0.66    1018    2600    1255    0       0       1255    0.00    99.55   8.44
> 0       5       5       7       0.68    1020    2600    1260    0       0       1257    0.00    99.54   8.44    90.89   50
> 0       5       33      7       0.68    1019    2600    1255    0       0       1255    0.00    99.55   8.43
> 0       6       6       7       0.70    1019    2600    1260    0       0       1259    0.00    99.53   8.43    90.87   -4
> 0       6       34      7       0.70    1019    2600    1255    0       0       1255    0.00    99.54   8.43
> 0       8       7       7       0.68    1019    2600    1262    0       0       1261    0.00    99.52   8.42    90.90   50
> 0       8       35      7       0.67    1019    2600    1255    0       0       1255    0.00    99.55   8.43
> 0       9       8       7       0.68    1019    2600    1260    0       0       1257    0.00    99.54   8.40    90.92   49
> 0       9       36      7       0.66    1017    2600    1255    0       0       1255    0.00    99.55   8.41
> 0       10      9       7       0.66    1018    2600    1257    0       0       1257    0.00    99.54   8.40    90.94   -4
> 0       10      37      7       0.66    1018    2600    1255    0       0       1255    0.00    99.55   8.41
> 0       11      10      7       0.66    1019    2600    1257    0       0       1259    0.00    99.54   8.56    90.77   -4
> 0       11      38      7       0.66    1018    2600    1255    0       3       1252    0.19    99.36   8.57
> 0       12      11      7       0.67    1019    2600    1260    0       0       1260    0.00    99.54   8.44    90.88   -4
> 0       12      39      7       0.67    1019    2600    1255    0       0       1256    0.00    99.55   8.44
> 0       13      12      7       0.68    1019    2600    1257    0       4       1254    0.32    99.22   8.67    90.65   -4
> 0       13      40      7       0.69    1019    2600    1256 0       
> 4       1253    0.24    99.31   8.66
> 0       14      13      7       0.71    1020    2600    1260 0       
> 0       1259    0.00    99.53   8.41    90.88   -4
> 0       14      41      7       0.72    1020    2600    1255 0       
> 0       1255    0.00    99.54   8.40
> 1       0       14      3564    99.19   3594    2600    125472 0       
> 0       0       0.00    0.00    0.81    0.00    54 54 65.15   10.74   
> 0.00    0.00
> 1       0       42      3       0.07    3701    2600    1255 0       
> 0       1255    0.00    99.95   99.93
> 1       1       15      11      0.32    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.37   73.31   42
> 1       1       43      10      0.31    3301    2600    1255 0       
> 0       1255    0.00    99.82   26.38
> 1       2       16      10      0.31    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.37   73.32   39
> 1       2       44      10      0.32    3301    2600    1255 0       
> 0       1255    0.00    99.82   26.36
> 1       3       17      10      0.32    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.40   73.28   39
> 1       3       45      11      0.32    3301    2600    1255 0       
> 0       1255    0.00    99.81   26.40
> 1       4       18      10      0.32    3301    2600    1257 0       
> 0       1257    0.00    99.82   26.40   73.28   40
> 1       4       46      11      0.32    3301    2600    1255 0       
> 0       1255    0.00    99.82   26.40
> 1       5       19      11      0.33    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.40   73.27   39
> 1       5       47      11      0.33    3300    2600    1255 0       
> 0       1255    0.00    99.82   26.40
> 1       6       20      12      0.35    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.38   73.27   42
> 1       6       48      12      0.36    3301    2600    1255 0       
> 0       1255    0.00    99.81   26.37
> 1       8       21      11      0.33    3301    2600    1257 0       
> 0       1257    0.00    99.82   26.37   73.29   42
> 1       8       49      11      0.33    3301    2600    1255 0       
> 0       1255    0.00    99.82   26.38
> 1       9       22      10      0.32    3300    2600    1257 0       
> 0       1257    0.00    99.82   26.35   73.34   41
> 1       9       50      10      0.30    3301    2600    1255 0       
> 0       1255    0.00    99.82   26.36
> 1       10      23      10      0.31    3301    2600    1257 0       
> 0       1257    0.00    99.82   26.37   73.33   41
> 1       10      51      10      0.31    3301    2600    1255 0       
> 0       1255    0.00    99.82   26.36
> 1       11      24      10      0.32    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.62   73.06   41
> 1       11      52      10      0.32    3301    2600    1255 0       
> 4       1251    0.32    99.50   26.62
> 1       12      25      11      0.33    3301    2600    1257 0       
> 0       1257    0.00    99.81   26.39   73.28   41
> 1       12      53      11      0.33    3301    2600    1258 0       
> 0       1254    0.00    99.82   26.38
> 1       13      26      12      0.36    3317    2600    1259 0       
> 0       1258    0.00    99.79   26.41   73.23   39
> 1       13      54      11      0.34    3301    2600    1255 0       
> 0       1254    0.00    99.82   26.42
> 1       14      27      12      0.36    3301    2600    1257 0       
> 5       1251    0.24    99.58   26.54   73.10   41
> 1       14      55      12      0.36    3300    2600    1255 0       
> 0       1254    0.00    99.82   26.54
>
>
> So it looks like I'm using core + sibling in all tests.
> But a side effect of this is that:
> 33 * 100.0 = 3300.0 MHz max turbo 28 active cores
> 33 * 100.0 = 3300.0 MHz max turbo 24 active cores
> 33 * 100.0 = 3300.0 MHz max turbo 20 active cores
> 33 * 100.0 = 3300.0 MHz max turbo 14 active cores
> 34 * 100.0 = 3400.0 MHz max turbo 12 active cores
> 34 * 100.0 = 3400.0 MHz max turbo 8 active cores
> 35 * 100.0 = 3500.0 MHz max turbo 4 active cores
> 37 * 100.0 = 3700.0 MHz max turbo 2 active cores
>
> So more cores = less MHz per core/sibling
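To put a number on the turbo limits listed above, here is a minimal Python sketch. It assumes that each "max turbo N active cores" line gives the clock ceiling while up to N cores are active; the names `TURBO_LIMITS_MHZ`, `max_turbo_mhz` and `clock_loss` are made up for this illustration and are not part of turbostat.

```python
# Turbo limits copied from the list above: (max active cores, MHz).
# Assumption: each entry applies while up to that many cores are active.
TURBO_LIMITS_MHZ = [
    (2, 3700), (4, 3500), (8, 3400), (12, 3400),
    (14, 3300), (20, 3300), (24, 3300), (28, 3300),
]


def max_turbo_mhz(active_cores):
    """Return the turbo ceiling in MHz for a given number of active cores."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    for limit, mhz in TURBO_LIMITS_MHZ:
        if active_cores <= limit:
            return mhz
    raise ValueError("more active cores than the table covers")


def clock_loss(from_cores, to_cores):
    """Fraction of the clock ceiling lost when going from one core count to another."""
    return 1.0 - max_turbo_mhz(to_cores) / max_turbo_mhz(from_cores)


print(f"1 -> 16 active cores: {clock_loss(1, 16):.1%} lower clock ceiling")
```

Under that assumption the ceiling drops from 3700 MHz to 3300 MHz, roughly 11%, so the turbo limits alone would not account for a flat throughput curve.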
>
>>
>>> perf top:
>>>
>>>    PerfTop:   77835 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
>>> ------------------------------------------------------------------------
>>>
>>>
>>>       16.32%  [kernel]       [k] skb_dst_force
>>>       16.30%  [kernel]       [k] dst_release
>>>       15.11%  [kernel]       [k] rt_cache_valid
>>>       12.62%  [kernel]       [k] ipv4_mtu
>> It seems a little strange that these 4 functions are on the top
> Yes, I don't know why ipv4_mtu is called and takes so many cycles.
>
>>
>>>        5.60%  [kernel]       [k] do_raw_spin_lock
>> What is calling/taking this lock? (Use perf call-graph recording).
> It would be hard to paste it here :)
> See the attached file.
>
>>
>>>        3.03%  [kernel]       [k] fib_table_lookup
>>>        2.70%  [kernel]       [k] ip_finish_output2
>>>        2.10%  [kernel]       [k] dev_gro_receive
>>>        1.89%  [kernel]       [k] eth_type_trans
>>>        1.81%  [kernel]       [k] ixgbe_poll
>>>        1.15%  [kernel]       [k] ixgbe_xmit_frame_ring
>>>        1.06%  [kernel]       [k] __build_skb
>>>        1.04%  [kernel]       [k] __dev_queue_xmit
>>>        0.97%  [kernel]       [k] ip_rcv
>>>        0.78%  [kernel]       [k] netif_skb_features
>>>        0.74%  [kernel]       [k] ipt_do_table
>> Unloading netfilter modules will give more performance, but it is
>> semi-fake to do so.
> Compiled into the kernel - filter only - with IPv4 + IPv6 - no conntrack
> or other modules.
>>>        0.70%  [kernel]       [k] acpi_processor_ffh_cstate_enter
>>>        0.64%  [kernel]       [k] ip_forward
>>>        0.59%  [kernel]       [k] __netif_receive_skb_core
>>>        0.55%  [kernel]       [k] dev_hard_start_xmit
>>>        0.53%  [kernel]       [k] ip_route_input_rcu
>>>        0.53%  [kernel]       [k] ip_rcv_finish
>>>        0.51%  [kernel]       [k] page_frag_free
>>>        0.50%  [kernel]       [k] kmem_cache_alloc
>>>        0.50%  [kernel]       [k] udp_v4_early_demux
>>>        0.44%  [kernel]       [k] skb_release_data
>>>        0.42%  [kernel]       [k] inet_gro_receive
>>>        0.40%  [kernel]       [k] sch_direct_xmit
>>>        0.39%  [kernel]       [k] __local_bh_enable_ip
>>>        0.33%  [kernel]       [k] netdev_pick_tx
>>>        0.33%  [kernel]       [k] validate_xmit_skb
>>>        0.28%  [kernel]       [k] fib_validate_source
>>>        0.27%  [kernel]       [k] deliver_ptype_list_skb
>>>        0.25%  [kernel]       [k] eth_header
>>>        0.23%  [kernel]       [k] get_dma_ops
>>>        0.22%  [kernel]       [k] skb_network_protocol
>>>        0.21%  [kernel]       [k] ip_output
>>>        0.21%  [kernel]       [k] vlan_dev_hard_start_xmit
>>>        0.20%  [kernel]       [k] ixgbe_alloc_rx_buffers
>>>        0.18%  [kernel]       [k] nf_hook_slow
>>>        0.18%  [kernel]       [k] apic_timer_interrupt
>>>        0.18%  [kernel]       [k] virt_to_head_page
>>>        0.18%  [kernel]       [k] build_skb
>>>        0.16%  [kernel]       [k] swiotlb_map_page
>>>        0.16%  [kernel]       [k] ip_finish_output
>>>        0.16%  [kernel]       [k] udp4_gro_receive
>>>
>>>
>>> RESULTS:
>>>
>>> CSV format - delimiter ";"
>>>
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;1;64;1470912;88247040;1470720;85305530
>>> 1;1;64;1470912;88285440;1470977;85335110
>>> 2;1;64;1470464;88247040;1470402;85290508
>>> 3;1;64;1471424;88262400;1471230;85353728
>>> 4;1;64;1468736;88166400;1468672;85201652
>>> 5;1;64;1470016;88181760;1469949;85234944
>>> 6;1;64;1470720;88247040;1470466;85290624
>>> 7;1;64;1471232;88277760;1471167;85346246
>>> 8;1;64;1469184;88170240;1469249;85216326
>>> 9;1;64;1470592;88227840;1470847;85294394
>> Single core 1.47Mpps seems a little low, I would expect 2Mpps.
>>
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;2;64;2413120;144802560;2413245;139975924
>>> 1;2;64;2415296;144913920;2415356;140098188
>>> 2;2;64;2416768;144898560;2416573;140105670
>>> 3;2;64;2418176;145056000;2418110;140261806
>>> 4;2;64;2416512;144990720;2416509;140172950
>>> 5;2;64;2415168;144860160;2414466;140064780
>>> 6;2;64;2416960;144983040;2416833;140190930
>>> 7;2;64;2413632;144768000;2413568;140001734
>>> 8;2;64;2415296;144898560;2414589;140087168
>>> 9;2;64;2416576;144963840;2416892;140190930
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;3;64;3419008;205155840;3418882;198239244
>>> 1;3;64;3428032;205585920;3427971;198744234
>>> 2;3;64;3425472;205536000;3425344;198677260
>>> 3;3;64;3425088;205470720;3425156;198603136
>>> 4;3;64;3427648;205693440;3426883;198773888
>>> 5;3;64;3426880;205670400;3427392;198796044
>>> 6;3;64;3429120;205678080;3430140;198848186
>>> 7;3;64;3422976;205355520;3423490;198458136
>>> 8;3;64;3423168;205336320;3423486;198495372
>>> 9;3;64;3424384;205493760;3425538;198617868
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;4;64;4406464;264364800;4405244;255560296
>>> 1;4;64;4404672;264349440;4405122;255541504
>>> 2;4;64;4402368;264049920;4403326;255188864
>>> 3;4;64;4401344;264076800;4400702;255207134
>>> 4;4;64;4385536;263074560;4386620;254312716
>>> 5;4;64;4386560;263189760;4385404;254379532
>>> 6;4;64;4398784;263857920;4399031;255025288
>>> 7;4;64;4407232;264445440;4407998;255637900
>>> 8;4;64;4413184;264698880;4413758;255875816
>>> 9;4;64;4411328;264526080;4411906;255712372
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;5;64;5094464;305871360;5094464;295657262
>>> 1;5;64;5090816;305514240;5091201;295274810
>>> 2;5;64;5088384;305387520;5089792;295175108
>>> 3;5;64;5079296;304869120;5079484;294680368
>>> 4;5;64;5092992;305544960;5094207;295349166
>>> 5;5;64;5092416;305502720;5093372;295334260
>>> 6;5;64;5080896;304896000;5081090;294677004
>>> 7;5;64;5085376;305114880;5086401;294933058
>>> 8;5;64;5092544;305575680;5092036;295356938
>>> 9;5;64;5093056;305652480;5093832;295449506
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;6;64;5705088;342351360;5705784;330965110
>>> 1;6;64;5710272;342743040;5707591;331373952
>>> 2;6;64;5703424;342182400;5701826;330776552
>>> 3;6;64;5708736;342604800;5707963;331147462
>>> 4;6;64;5710144;342654720;5712067;331202910
>>> 5;6;64;5712064;342777600;5711361;331292288
>>> 6;6;64;5710144;342585600;5708607;331144272
>>> 7;6;64;5699840;342021120;5697853;330609222
>>> 8;6;64;5701184;342124800;5702909;330653592
>>> 9;6;64;5711360;342735360;5713283;331247686
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;7;64;6244416;374603520;6243591;362180072
>>> 1;7;64;6230912;374016000;6231490;361534126
>>> 2;7;64;6244800;374776320;6244866;362224326
>>> 3;7;64;6238720;374376960;6238261;361838510
>>> 4;7;64;6218816;373079040;6220413;360683962
>>> 5;7;64;6224320;373566720;6225086;361017404
>>> 6;7;64;6224000;373570560;6221370;360936088
>>> 7;7;64;6210048;372741120;6210627;360212654
>>> 8;7;64;6231616;374035200;6231537;361445502
>>> 9;7;64;6227840;373724160;6228802;361162752
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;8;64;6251840;375144960;6251849;362609678
>>> 1;8;64;6250816;375014400;6250881;362547038
>>> 2;8;64;6257728;375432960;6257160;362911104
>>> 3;8;64;6255552;375325440;6255622;362822074
>>> 4;8;64;6243776;374576640;6243270;362120622
>>> 5;8;64;6237184;374296320;6237690;361790080
>>> 6;8;64;6240960;374415360;6240714;361927366
>>> 7;8;64;6222784;373317120;6223746;360854424
>>> 8;8;64;6225920;373593600;6227014;361154980
>>> 9;8;64;6238528;374304000;6237701;361845238
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;14;64;6486144;389184000;6486135;376236488
>>> 1;14;64;6454912;387390720;6454222;374466734
>>> 2;14;64;6441152;386480640;6440431;373572780
>>> 3;14;64;6450240;386972160;6450870;374070014
>>> 4;14;64;6465600;387997440;6467221;375089654
>>> 5;14;64;6448384;386860800;6448000;373980230
>>> 6;14;64;6452352;387095040;6452148;374168904
>>> 7;14;64;6441984;386507520;6443203;373665058
>>> 8;14;64;6456704;387340800;6455744;374429092
>>> 9;14;64;6464640;387901440;6465218;374949004
>>> ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
>>> 0;16;64;6939008;416325120;6938696;402411192
>>> 1;16;64;6941952;416444160;6941745;402558918
>>> 2;16;64;6960576;417584640;6960707;403698718
>>> 3;16;64;6940736;416486400;6941820;402503876
>>> 4;16;64;6927680;415741440;6927420;401853870
>>> 5;16;64;6929792;415687680;6929917;401839196
>>> 6;16;64;6950400;416989440;6950661;403026166
>>> 7;16;64;6953664;417216000;6953454;403260544
>>> 8;16;64;6948480;416851200;6948800;403023266
>>> 9;16;64;6924160;415422720;6924092;401542468
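The scaling in the results above can be summarised per core count. The following Python sketch parses the same ";"-delimited format, skips the repeated header lines, and reports the mean PPS_TX together with the fraction of ideal linear scaling relative to the single-core run. `SAMPLE` holds only the first row (ID 0) of a few of the runs above, and the function name `scaling` is hypothetical.

```python
import csv
import io
from statistics import mean

# First row (ID 0) of selected runs above; the header is repeated on purpose,
# as it is in the full results.
SAMPLE = """ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
0;1;64;1470912;88247040;1470720;85305530
0;2;64;2413120;144802560;2413245;139975924
0;7;64;6244416;374603520;6243591;362180072
ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
0;8;64;6251840;375144960;6251849;362609678
0;16;64;6939008;416325120;6938696;402411192
"""


def scaling(text):
    """Map core count -> (mean PPS_TX, fraction of linear scaling vs. 1 core)."""
    pps = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        if not row["ID"].isdigit():  # repeated header line
            continue
        cores = int(row["CPU_CORES / RSS QUEUES"])
        pps.setdefault(cores, []).append(int(row["PPS_TX"]))
    base = mean(pps[1])  # requires a 1-core run as the baseline
    return {c: (mean(v), mean(v) / (c * base)) for c, v in sorted(pps.items())}


for cores, (rate, eff) in scaling(SAMPLE).items():
    print(f"{cores:2d} cores: {rate / 1e6:.2f} Mpps, {eff:.0%} of linear")
```

Feeding the complete results into `scaling` averages all ten rows of each run instead of only the first one.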
>> I've seen Linux scale beyond 6.9Mpps, thus I also see this as an
>> issue/bug.  You could be stalling on DMA TX completion being too slow,
>> but you already increased the interval and increased the TX ring queue
>> size.  You could play with those settings and see if it changes this?
>>
>> Could you try my napi_monitor tool in:
>> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/samples/bpf
>>
>> Also provide the output from:
>>   mpstat -P ALL -u -I SCPU -I SUM 2
> With 16 cores / 16 RSS queues:
> Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft 
> %steal  %guest  %gnice   %idle
> Average:     all    0.00    0.00    0.01    0.00    0.00   28.57 
> 0.00    0.00    0.00   71.42
> Average:       0    0.00    0.00    0.04    0.00    0.00    0.08 
> 0.00    0.00    0.00   99.88
> Average:       1    0.00    0.00    0.12    0.00    0.00    0.00 
> 0.00    0.00    0.00   99.88
> Average:       2    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       3    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       4    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       5    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       6    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       7    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       8    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:       9    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      10    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      11    0.08    0.00    0.04    0.00    0.00    0.00 
> 0.00    0.00    0.00   99.88
> Average:      12    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      13    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      14    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      15    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      16    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      17    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      18    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      19    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      20    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      21    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      22    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      23    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      24    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      25    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      26    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      27    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      28    0.00    0.00    0.04    0.00    0.00    0.00 
> 0.00    0.00    0.00   99.96
> Average:      29    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      30    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      31    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      32    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      33    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      34    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      35    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      36    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      37    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      38    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      39    0.04    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00   99.96
> Average:      40    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      41    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      42    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      43    0.00    0.00    0.00    0.00    0.00  100.00 
> 0.00    0.00    0.00    0.00
> Average:      44    0.00    0.00    0.04    0.17    0.00    0.00 
> 0.00    0.00    0.00   99.79
> Average:      45    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      46    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      47    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      48    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      49    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      50    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      51    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      52    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      53    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      54    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
> Average:      55    0.00    0.00    0.00    0.00    0.00    0.00 
> 0.00    0.00    0.00  100.00
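The busy CPUs in the mpstat output above can be mapped back to physical cores. This Python sketch assumes the sibling layout seen in the turbostat output earlier in the thread (CPU n and CPU n+28 share a core, e.g. 14/42 and 15/43); `SIBLING_OFFSET`, `core_load` and `doubled_up` are names invented for this illustration.

```python
from collections import Counter

# Assumption taken from the turbostat output: CPU n and CPU n + 28 are
# hyper-thread siblings on the same physical core.
SIBLING_OFFSET = 28

# CPUs showing 100% %soft in the mpstat output above.
BUSY_CPUS = list(range(14, 28)) + [42, 43]


def core_load(cpus, offset=SIBLING_OFFSET):
    """Count busy logical CPUs per physical core (keyed by the lower CPU id)."""
    return Counter(cpu % offset for cpu in cpus)


def doubled_up(cpus, offset=SIBLING_OFFSET):
    """Return the physical cores that have both siblings busy."""
    return sorted(core for core, n in core_load(cpus, offset).items() if n > 1)


print("physical cores in use:", len(core_load(BUSY_CPUS)))
print("cores with both siblings busy:", doubled_up(BUSY_CPUS))
```

If the assumed layout holds, the 16 busy logical CPUs sit on 14 physical cores, with the cores of CPU 14 and CPU 15 each running two queues on sibling threads.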
>
> Average:     CPU    intr/s
> Average:     all 123596.08
> Average:       0    646.38
> Average:       1    500.54
> Average:       2    511.67
> Average:       3    534.25
> Average:       4    542.21
> Average:       5    531.54
> Average:       6    554.58
> Average:       7    535.88
> Average:       8    544.58
> Average:       9    536.42
> Average:      10    575.46
> Average:      11    601.12
> Average:      12    502.08
> Average:      13    575.46
> Average:      14   5917.92
> Average:      15   5949.58
> Average:      16   7021.29
> Average:      17   7299.71
> Average:      18   7391.67
> Average:      19   7354.25
> Average:      20   7543.42
> Average:      21   7354.25
> Average:      22   7322.33
> Average:      23   7368.71
> Average:      24   7429.00
> Average:      25   7406.46
> Average:      26   7400.67
> Average:      27   7447.21
> Average:      28    517.00
> Average:      29    549.54
> Average:      30    529.33
> Average:      31    533.83
> Average:      32    541.25
> Average:      33    541.17
> Average:      34    532.50
> Average:      35    545.17
> Average:      36    528.96
> Average:      37    509.92
> Average:      38    520.12
> Average:      39    523.29
> Average:      40    530.75
> Average:      41    542.33
> Average:      42   5921.71
> Average:      43   5949.42
> Average:      44    503.04
> Average:      45    542.75
> Average:      46    582.50
> Average:      47    581.71
> Average:      48    495.29
> Average:      49    524.38
> Average:      50    527.92
> Average:      51    528.12
> Average:      52    456.38
> Average:      53    477.00
> Average:      54    440.92
> Average:      55    568.83
>
> Average:     CPU       HI/s    TIMER/s   NET_TX/s   NET_RX/s BLOCK/s 
> IRQ_POLL/s  TASKLET/s    SCHED/s  HRTIMER/s      RCU/s
> Average:       0       0.00     250.00       0.17      87.00 
> 0.00       0.00      45.46     250.00       0.00      13.75
> Average:       1       0.00     233.42       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      17.21
> Average:       2       0.00     249.04       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      12.67
> Average:       3       0.00     249.92       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      34.42
> Average:       4       0.00     248.67       0.17       0.00 
> 0.00       0.00       0.00     249.96       0.00      43.42
> Average:       5       0.00     249.46       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      32.17
> Average:       6       0.00     249.79       0.00       0.00 
> 0.00       0.00       0.00     249.87       0.00      54.92
> Average:       7       0.00     240.12       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      45.79
> Average:       8       0.00     247.42       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      47.25
> Average:       9       0.00     249.29       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      37.17
> Average:      10       0.00     248.75       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      76.79
> Average:      11       0.00     249.29       0.00       0.00 
> 0.00       0.00      42.79     249.83       0.00      59.21
> Average:      12       0.00     249.83       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00       2.29
> Average:      13       0.00     249.92       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      75.62
> Average:      14       0.00     148.21       0.17    5758.04 
> 0.00       0.00       0.00       8.42       0.00       3.08
> Average:      15       0.00     148.42       0.46    5789.25 
> 0.00       0.00       0.00       8.33       0.00       3.12
> Average:      16       0.00     142.62       0.79    6866.46 
> 0.00       0.00       0.00       8.29       0.00       3.12
> Average:      17       0.00     143.17       0.42    7145.00 
> 0.00       0.00       0.00       8.08       0.00       3.04
> Average:      18       0.00     153.62       0.42    7226.42 
> 0.00       0.00       0.00       8.04       0.00       3.17
> Average:      19       0.00     150.46       0.46    7192.21 
> 0.00       0.00       0.00       8.04       0.00       3.08
> Average:      20       0.00     145.21       0.17    7386.50 
> 0.00       0.00       0.00       8.29       0.00       3.25
> Average:      21       0.00     150.96       0.46    7191.37 
> 0.00       0.00       0.00       8.25       0.00       3.21
> Average:      22       0.00     146.67       0.54    7163.96 
> 0.00       0.00       0.00       8.04       0.00       3.12
> Average:      23       0.00     151.38       0.42    7205.75 
> 0.00       0.00       0.00       8.00       0.00       3.17
> Average:      24       0.00     153.33       0.17    7264.12 
> 0.00       0.00       0.00       8.08       0.00       3.29
> Average:      25       0.00     153.21       0.17    7241.83 
> 0.00       0.00       0.00       7.96       0.00       3.29
> Average:      26       0.00     153.96       0.17    7234.88 
> 0.00       0.00       0.00       8.38       0.00       3.29
> Average:      27       0.00     151.71       0.79    7283.25 
> 0.00       0.00       0.00       8.04       0.00       3.42
> Average:      28       0.00     245.71       0.00       0.00 
> 0.00       0.00       0.00     249.50       0.00      21.79
> Average:      29       0.00     233.21       0.00       0.00 
> 0.00       0.00       0.00     249.87       0.00      66.46
> Average:      30       0.00     248.92       0.00       0.00 
> 0.00       0.00       0.00     250.00       0.00      30.42
> Average:      31       0.00     249.92       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      33.96
> Average:      32       0.00     248.67       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      42.62
> Average:      33       0.00     249.46       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      41.79
> Average:      34       0.00     249.79       0.00       0.00 
> 0.00       0.00       0.00     249.87       0.00      32.83
> Average:      35       0.00     240.12       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      55.08
> Average:      36       0.00     247.42       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      31.58
> Average:      37       0.00     249.29       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      10.71
> Average:      38       0.00     248.75       0.00       0.00 
> 0.00       0.00       0.00     249.87       0.00      21.50
> Average:      39       0.00     249.50       0.00       0.00 
> 0.00       0.00       0.00     249.83       0.00      23.96
> Average:      40       0.00     249.83       0.00       0.00 
> 0.00       0.00       0.00     249.96       0.00      30.96
> Average:      41       0.00     249.92       0.00       0.00 
> 0.00       0.00       0.00     249.92       0.00      42.50
> Average:      42       0.00     148.38       0.71    5761.00 
> 0.00       0.00       0.00       8.25       0.00       3.38
> Average:      43       0.00     147.21       0.50    5790.33 
> 0.00       0.00       0.00       8.00       0.00       3.38
> Average:      44       0.00     248.96       0.00       0.00 
> 0.00       0.00       0.00     248.13       0.00       5.96
> Average:      45       0.00     249.04       0.00       0.00 
> 0.00       0.00       0.00     248.88       0.00      44.83
> Average:      46       0.00     248.96       0.00       0.00 
> 0.00       0.00       0.00     248.58       0.00      84.96
> Average:      47       0.00     249.00       0.00       0.00 
> 0.00       0.00       0.00     248.75       0.00      83.96
> Average:      48       0.00     249.12       0.00       0.00 
> 0.00       0.00       0.00     132.83       0.00     113.33
> Average:      49       0.00     249.12       0.00       0.00 
> 0.00       0.00       0.00     248.62       0.00      26.62
> Average:      50       0.00     248.92       0.00       0.00 
> 0.00       0.00       0.00     248.58       0.00      30.42
> Average:      51       0.00     249.08       0.00       0.00 
> 0.00       0.00       0.00     248.42       0.00      30.63
> Average:      52       0.00     249.21       0.00       0.00 
> 0.00       0.00       0.00     131.96       0.00      75.21
> Average:      53       0.00     249.08       0.00       0.00 
> 0.00       0.00       0.00     136.12       0.00      91.79
> Average:      54       0.00     249.00       0.00       0.00 
> 0.00       0.00       0.00     136.79       0.00      55.12
> Average:      55       0.00     249.04       0.00       0.00 
> 0.00       0.00       0.00     248.71       0.00      71.08
>
>
>
