Message-ID: <38f3e518-f41d-8e03-a6b9-7337566b8e04@itcare.pl>
Date: Mon, 2 Dec 2019 17:23:27 +0100
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Paolo Abeni <pabeni@...hat.com>, David Ahern <dsahern@...il.com>,
netdev@...r.kernel.org, Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network
performance
On 02.12.2019 at 11:53, Paolo Abeni wrote:
> On Mon, 2019-12-02 at 11:09 +0100, Paweł Staszewski wrote:
>> On 01.12.2019 at 17:05, David Ahern wrote:
>>> On 11/29/19 4:00 PM, Paweł Staszewski wrote:
>>>> As always - each year I need to summarize network performance for
>>>> routing applications like a Linux router on the native Linux kernel
>>>> (without xdp/dpdk/vpp etc.) :)
>>>>
>>> Do you keep past profiles? How does this profile (and traffic rates)
>>> compare to older kernels - e.g., 5.0 or 4.19?
>>>
>>>
>> Yes - so for 4.19:
>>
>> Max bandwidth was about 40-42Gbit/s RX / 40-42Gbit/s TX of
>> forwarded (routed) traffic.
>>
>> And after the "order-0 pages" patches, the max was 50Gbit/s RX + 50Gbit/s TX
>> (maximum forwarding bandwidth).
>>
>> (the current kernel almost doubled this)
> Looks like we are on the right track ;)
>
> [...]
>> After "order-0 pages" patch
>>
>>    PerfTop: 104692 irqs/sec  kernel:99.5%  exact: 0.0%  [4000Hz cycles]  (all, 56 CPUs)
>> ------------------------------------------------------------------------------------------
>>
>> 9.06% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
>> 6.43% [kernel] [k] tasklet_action_common.isra.21
>> 5.68% [kernel] [k] fib_table_lookup
>> 4.89% [kernel] [k] irq_entries_start
>> 4.53% [kernel] [k] mlx5_eq_int
>> 4.10% [kernel] [k] build_skb
>> 3.39% [kernel] [k] mlx5e_poll_tx_cq
>> 3.38% [kernel] [k] mlx5e_sq_xmit
>> 2.73% [kernel] [k] mlx5e_poll_rx_cq
> Compared to the current kernel perf figures, it looks like most of the
> gains come from driver changes.
>
> [... current perf figures follow ...]
>> ------------------------------------------------------------------------------------------
>>
>> 7.56% [kernel] [k] __dev_queue_xmit
> This is a bit surprising to me. I guess this is due to
> '__dev_queue_xmit()' being called twice per packet (team, NIC) and to
> the retpoline overhead.
>
>> 1.74% [kernel] [k] tcp_gro_receive
> If the reference use-case involves a fairly large number of concurrent
> flows, I guess you can try disabling GRO.
Disabling GRO with teamed interfaces is not good: after disabling GRO on
the physical interfaces, CPU load is about 10% higher on all cores.
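
For reference, a minimal sketch of how GRO can be flipped per physical
interface with "ethtool -K" (the interface names below are placeholders,
not the actual slave NICs used here):

#!/usr/bin/env python3
# Minimal sketch: toggle GRO on the physical NICs enslaved to team0 via
# "ethtool -K <iface> gro on|off".  The interface names are hypothetical
# placeholders, not the actual NICs in this setup.
import subprocess
import sys

PHYS_IFACES = ["enp1s0f0", "enp1s0f1"]  # hypothetical team0 slave NICs

def set_gro(iface, enabled):
    state = "on" if enabled else "off"
    subprocess.run(["ethtool", "-K", iface, "gro", state], check=True)

if __name__ == "__main__":
    # usage: ./gro_toggle.py on|off
    want = (sys.argv[1] if len(sys.argv) > 1 else "on") == "on"
    for iface in PHYS_IFACES:
        set_gro(iface, want)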
And one observation - with GRO enabled on the interfaces, team0 traffic:
iface                  Rx                  Tx                  Total
==============================================================================
team0:       5952483.50 KB/s     6028436.50 KB/s     11980919.00 KB/s
------------------------------------------------------------------------------
And the softnet stats:
CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
CPU:00       1014977            0            35              0           0               0
CPU:01       1074461            0            30              0           0               0
CPU:02       1020460            0            34              0           0               0
CPU:03       1077624            0            34              0           0               0
CPU:04       1005102            0            32              0           0               0
CPU:05       1097107            0            46              0           0               0
CPU:06        997877            0            24              0           0               0
CPU:07       1056216            0            34              0           0               0
CPU:08        856567            0            34              0           0               0
CPU:09        862527            0            23              0           0               0
CPU:10        876107            0            34              0           0               0
CPU:11        759275            0            27              0           0               0
CPU:12        817307            0            27              0           0               0
CPU:13        868073            0            21              0           0               0
CPU:14        837783            0            34              0           0               0
CPU:15        817946            0            27              0           0               0
CPU:16        785500            0            25              0           0               0
CPU:17        851276            0            28              0           0               0
CPU:18        843888            0            29              0           0               0
CPU:19        924840            0            34              0           0               0
CPU:20        884879            0            37              0           0               0
CPU:21        841461            0            28              0           0               0
CPU:22        819436            0            32              0           0               0
CPU:23        872843            0            32              0           0               0
Summed:      21863531            0           740              0           0               0
And with GRO disabled on the interfaces, team0 traffic:
iface                  Rx                  Tx                  Total
==============================================================================
team0:       5952483.50 KB/s     6028436.50 KB/s     11980919.00 KB/s
------------------------------------------------------------------------------
And the softnet stats:
CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
CPU:00        625288            0            23              0           0               0
CPU:01        605239            0            24              0           0               0
CPU:02        644965            0            26              0           0               0
CPU:03        620264            0            30              0           0               0
CPU:04        603416            0            25              0           0               0
CPU:05        597838            0            23              0           0               0
CPU:06        580028            0            22              0           0               0
CPU:07        604274            0            23              0           0               0
CPU:08        556119            0            26              0           0               0
CPU:09        494997            0            23              0           0               0
CPU:10        514759            0            23              0           0               0
CPU:11        500333            0            22              0           0               0
CPU:12        497956            0            23              0           0               0
CPU:13        535194            0            14              0           0               0
CPU:14        504304            0            24              0           0               0
CPU:15        489015            0            18              0           0               0
CPU:16        487249            0            24              0           0               0
CPU:17        472023            0            23              0           0               0
CPU:18        539454            0            24              0           0               0
CPU:19        499901            0            19              0           0               0
CPU:20        479945            0            26              0           0               0
CPU:21        486800            0            29              0           0               0
CPU:22        466916            0            26              0           0               0
CPU:23        559730            0            34              0           0               0
Summed:      12966008            0           573              0           0               0
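
For reference, the per-second softnet numbers above are derived from
/proc/net/softnet_stat. A minimal sampler sketch (assuming the classic
column layout; the field offsets are an assumption and newer kernels may
append extra columns, which are ignored here):

#!/usr/bin/env python3
# Minimal sketch: sample /proc/net/softnet_stat twice and print per-CPU
# per-second rates in roughly the layout used above.  Assumes the classic
# column order (hex fields: processed, dropped, time_squeeze, 5 unused,
# cpu_collision, received_rps, flow_limit_count).
import time

FIELDS = {"total": 0, "dropped": 1, "squeezed": 2,
          "collision": 8, "rx_rps": 9, "flow_limit": 10}

def snapshot():
    with open("/proc/net/softnet_stat") as f:
        return [[int(v, 16) for v in line.split()] for line in f]

def main(interval=1.0):
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    row = "{:<10}" + "{:>15}" * len(FIELDS)
    print(row.format("CPU", *(name + "/sec" for name in FIELDS)))
    totals = [0] * len(FIELDS)
    for cpu, (b, a) in enumerate(zip(before, after)):
        rates = []
        for i, col in enumerate(FIELDS.values()):
            delta = a[col] - b[col] if col < len(a) and col < len(b) else 0
            rates.append(int(delta / interval))
            totals[i] += rates[-1]
        print(row.format("CPU:%02d" % cpu, *rates))
    print(row.format("Summed:", *totals))

if __name__ == "__main__":
    main()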
Maybe without the team device it will be better.
>
> Cheers,
>
> Paolo
>
--
Paweł Staszewski