Message-ID: <abcb6fde-4b08-73ff-02c9-01609a41a087@itcare.pl>
Date: Sat, 10 Nov 2018 23:04:22 +0100
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Saeed Mahameed <saeedm@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
On 10.11.2018 at 22:53, Paweł Staszewski wrote:
>
>
> On 10.11.2018 at 22:01, Jesper Dangaard Brouer wrote:
>> On Sat, 10 Nov 2018 21:02:10 +0100
>> Paweł Staszewski <pstaszewski@...are.pl> wrote:
>>
>>> On 10.11.2018 at 20:34, Jesper Dangaard Brouer wrote:
>>>> I want you to experiment with:
>>>>
>>>> ethtool --set-priv-flags DEVICE rx_striding_rq off
>>> Just checked that the ConnectX-4 previously had this disabled:
>>> ethtool --show-priv-flags enp175s0f0
>>>
>>> Private flags for enp175s0f0:
>>> rx_cqe_moder : on
>>> tx_cqe_moder : off
>>> rx_cqe_compress : off
>>> rx_striding_rq : off
>>> rx_no_csum_complete: off
>>>
>> The CX4 hardware does not have this feature (p.s. the CX4-Lx does).
>>
>>> So now we are on ConnectX-5 and have it enabled. The ConnectX-5 for
>>> sure changed the CPU load: I now see at most 50-60% CPU, where with
>>> the ConnectX-4 it was sometimes near 100% with the same configuration.
>> I (strongly) believe the CPU load was related to the page-allocator
>> lock contention that Aaron fixed.
>>
> Yes, I think it was both - most of the CPU problems were due to the
> page-allocator.
> But there is also a CPU load difference after changing from ConnectX-4
> to ConnectX-5 - about 10% in total - though yes, most of it, around
> 40%, comes from Aaron's patch :) - really good job :)
>
>
> Now I'm experimenting with the ring configuration for the ConnectX-5 NICs.
> After reading this paper:
> https://netdevconf.org/2.1/slides/apr6/network-performance/04-amir-RX_and_TX_bulking_v2.pdf
>
> I changed from RX: 8192 / TX: 4096 to RX: 8192 / TX: 256.
>
> After this I gained about 5 Gbit/s of RX and TX traffic, with less CPU load.
> Before the change there was 59/59 Gbit/s.
>
> After the change there is 64/64 Gbit/s.
>
> bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
>   input: /proc/net/dev type: rate
>   iface                     Rx                Tx             Total
> ==============================================================================
>   enp175s0:         44.45 Gb/s        19.69 Gb/s        64.14 Gb/s
>   enp216s0:         19.69 Gb/s        44.49 Gb/s        64.19 Gb/s
> ------------------------------------------------------------------------------
>   total:            64.14 Gb/s        64.18 Gb/s       128.33 Gb/s
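The ring resize described above can be applied with ethtool; a minimal sketch, assuming the interface name enp175s0 from the output above stands in for whichever NIC is being tuned:

```shell
# Show current ring sizes (preset maximums and current settings)
ethtool -g enp175s0

# Shrink the TX ring to 256 descriptors while keeping RX at 8192,
# as in the test above; a smaller TX ring keeps fewer descriptors
# queued in the NIC, which can improve cache locality on the TX
# completion path
ethtool -G enp175s0 rx 8192 tx 256
```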
>
>
Also, after this change the kernel freed some memory - about 500MB.
Still seeing squeezed events, but fewer, even with more traffic:
CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
CPU:00             0            0             0              0           0               0
CPU:01             0            0             0              0           0               0
CPU:02             0            0             0              0           0               0
CPU:03             0            0             0              0           0               0
CPU:04             0            0             0              0           0               0
CPU:05             0            0             0              0           0               0
CPU:06             0            0             0              0           0               0
CPU:07             0            0             0              0           0               0
CPU:08             0            0             0              0           0               0
CPU:09             0            0             0              0           0               0
CPU:10             0            0             0              0           0               0
CPU:11             0            0             0              0           0               0
CPU:12             0            0             0              0           0               0
CPU:13             0            0             0              0           0               0
CPU:14        389270            0            41              0           0               0
CPU:15        375543            0            32              0           0               0
CPU:16        385847            0            22              0           0               0
CPU:17        412293            0            34              0           0               0
CPU:18        401287            0            30              0           0               0
CPU:19        368345            0            30              0           0               0
CPU:20        395452            0            28              0           0               0
CPU:21        374032            0            38              0           0               0
CPU:22        342036            0            32              0           0               0
CPU:23        374773            0            34              0           0               0
CPU:24        356139            0            31              0           0               0
CPU:25        392725            0            32              0           0               0
CPU:26        385937            0            37              0           0               0
CPU:27        385282            0            37              0           0               0
CPU:28             0            0             0              0           0               0
CPU:29             0            0             0              0           0               0
CPU:30             0            0             0              0           0               0
CPU:31             0            0             0              0           0               0
CPU:32             0            0             0              0           0               0
CPU:33             0            0             0              0           0               0
CPU:34             0            0             0              0           0               0
CPU:35             0            0             0              0           0               0
CPU:36             0            0             0              0           0               0
CPU:37             0            0             0              0           0               0
CPU:38             0            0             0              0           0               0
CPU:39             0            0             0              0           0               0
CPU:40             0            0             0              0           0               0
CPU:41             0            0             0              0           0               0
CPU:42        340817            0            33              0           0               0
CPU:43        364805            0            42              0           0               0
CPU:44        298484            0            29              0           0               0
CPU:45        292798            0            30              0           0               0
CPU:46        301739            0            24              0           0               0
CPU:47        275116            0            20              0           0               0
CPU:48        319237            0            34              0           0               0
CPU:49        290350            0            29              0           0               0
CPU:50        307084            0            30              0           0               0
CPU:51        332908            0            24              0           0               0
CPU:52        300151            0            24              0           0               0
CPU:53        310140            0            28              0           0               0
CPU:54        341788            0            28              0           0               0
CPU:55        320344            0            28              0           0               0
Summed:      9734722            0           860              0           0               0
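The squeezed column above comes from the time_squeeze field of /proc/net/softnet_stat. A minimal sketch of a parser for that file (assuming the usual field layout for kernels of this era: column 0 = packets processed, column 1 = dropped, column 2 = time_squeeze, all values hexadecimal; the function name is made up for illustration):

```python
def parse_softnet_stat(text):
    """Parse /proc/net/softnet_stat contents into per-CPU counters.

    Each line corresponds to one online CPU; fields are hexadecimal.
    Column 0 is packets processed, column 1 is packets dropped, and
    column 2 is time_squeeze (the NAPI budget or time slice ran out
    before the backlog was fully drained).
    """
    stats = []
    for cpu, line in enumerate(text.splitlines()):
        fields = [int(f, 16) for f in line.split()]
        stats.append({
            "cpu": cpu,
            "processed": fields[0],
            "dropped": fields[1],
            "squeezed": fields[2],
        })
    return stats
```

Against the live system this would be driven with parse_softnet_stat(open("/proc/net/softnet_stat").read()), filtering for rows with a non-zero processed count as in the table above.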