Date: Mon, 2 Dec 2019 17:23:27 +0100
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Paolo Abeni <pabeni@...hat.com>, David Ahern <dsahern@...il.com>,
    netdev@...r.kernel.org, Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance

W dniu 02.12.2019 o 11:53, Paolo Abeni pisze:
> On Mon, 2019-12-02 at 11:09 +0100, Paweł Staszewski wrote:
>> W dniu 01.12.2019 o 17:05, David Ahern pisze:
>>> On 11/29/19 4:00 PM, Paweł Staszewski wrote:
>>>> As always - each year I need to summarize network performance for
>>>> routing applications like a Linux router on the native Linux kernel
>>>> (without xdp/dpdk/vpp etc.) :)
>>>>
>>> Do you keep past profiles? How does this profile (and traffic rates)
>>> compare to older kernels - e.g., 5.0 or 4.19?
>>>
>>>
>> Yes - so for 4.19:
>>
>> Max bandwidth was about 40-42Gbit/s RX / 40-42Gbit/s TX of
>> forwarded (routed) traffic.
>>
>> And after the "order-0 pages" patches - max was 50Gbit/s RX + 50Gbit/s TX
>> (forwarding - bandwidth max).
>>
>> (the current kernel almost doubled this)
> Looks like we are on the right track ;)
>
> [...]
>> After the "order-0 pages" patch:
>>
>>    PerfTop: 104692 irqs/sec  kernel:99.5%  exact: 0.0%  [4000Hz cycles], (all, 56 CPUs)
>> -------------------------------------------------------------------------------------
>>
>>      9.06%  [kernel]  [k] mlx5e_skb_from_cqe_mpwrq_linear
>>      6.43%  [kernel]  [k] tasklet_action_common.isra.21
>>      5.68%  [kernel]  [k] fib_table_lookup
>>      4.89%  [kernel]  [k] irq_entries_start
>>      4.53%  [kernel]  [k] mlx5_eq_int
>>      4.10%  [kernel]  [k] build_skb
>>      3.39%  [kernel]  [k] mlx5e_poll_tx_cq
>>      3.38%  [kernel]  [k] mlx5e_sq_xmit
>>      2.73%  [kernel]  [k] mlx5e_poll_rx_cq
> Compared to the current kernel perf figures, it looks like most of the
> gains come from driver changes.
>
> [... current perf figures follow ...]
>> -------------------------------------------------------------------------------------
>>
>>      7.56%  [kernel]  [k] __dev_queue_xmit
> This is a bit surprising to me. I guess this is due to
> '__dev_queue_xmit()' being called twice per packet (team, NIC) and due
> to the retpoline overhead.
>
>>      1.74%  [kernel]  [k] tcp_gro_receive
> If the reference use case has a quite large number of concurrent
> flows, I guess you can try disabling GRO.

Disabling GRO with teamed interfaces is not a good option: after disabling GRO
on the physical interfaces, CPU load is about 10% higher on all cores.
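For anyone reproducing the GRO on/off comparison discussed above, a minimal
sketch of the toggle step, assuming hypothetical team member port names
(enp1s0f0/enp1s0f1 - the quoted text does not name the actual NIC ports); it
only wraps the standard `ethtool -K <iface> gro on|off` knob:

#!/usr/bin/env python3
# Sketch (not from the thread): flip GRO on the physical team member ports
# for an A/B comparison. The port names are placeholders; adjust to the setup.
import subprocess
import sys

TEAM_PORTS = ["enp1s0f0", "enp1s0f1"]  # hypothetical team member NICs

def set_gro(state: str) -> None:
    """Run `ethtool -K <iface> gro on|off` on every team member port."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    for iface in TEAM_PORTS:
        # ethtool -K changes offload settings at runtime, no reboot needed
        subprocess.run(["ethtool", "-K", iface, "gro", state], check=True)

if __name__ == "__main__":
    set_gro(sys.argv[1] if len(sys.argv) > 1 else "on")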
And an observation:

Enabled GRO on interfaces vs team0 packets per second:

 iface                       Rx               Tx             Total
==============================================================================
 team0:            5952483.50 KB/s  6028436.50 KB/s  11980919.00 KB/s
------------------------------------------------------------------------------

And softnet stats:

 CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
 CPU:00       1014977            0            35              0           0               0
 CPU:01       1074461            0            30              0           0               0
 CPU:02       1020460            0            34              0           0               0
 CPU:03       1077624            0            34              0           0               0
 CPU:04       1005102            0            32              0           0               0
 CPU:05       1097107            0            46              0           0               0
 CPU:06        997877            0            24              0           0               0
 CPU:07       1056216            0            34              0           0               0
 CPU:08        856567            0            34              0           0               0
 CPU:09        862527            0            23              0           0               0
 CPU:10        876107            0            34              0           0               0
 CPU:11        759275            0            27              0           0               0
 CPU:12        817307            0            27              0           0               0
 CPU:13        868073            0            21              0           0               0
 CPU:14        837783            0            34              0           0               0
 CPU:15        817946            0            27              0           0               0
 CPU:16        785500            0            25              0           0               0
 CPU:17        851276            0            28              0           0               0
 CPU:18        843888            0            29              0           0               0
 CPU:19        924840            0            34              0           0               0
 CPU:20        884879            0            37              0           0               0
 CPU:21        841461            0            28              0           0               0
 CPU:22        819436            0            32              0           0               0
 CPU:23        872843            0            32              0           0               0
 Summed:      21863531            0           740              0           0               0

Disabled GRO on interfaces vs team0 packets per second:

 iface                       Rx               Tx             Total
==============================================================================
 team0:            5952483.50 KB/s  6028436.50 KB/s  11980919.00 KB/s
------------------------------------------------------------------------------

And softnet stats:

 CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
 CPU:00        625288            0            23              0           0               0
 CPU:01        605239            0            24              0           0               0
 CPU:02        644965            0            26              0           0               0
 CPU:03        620264            0            30              0           0               0
 CPU:04        603416            0            25              0           0               0
 CPU:05        597838            0            23              0           0               0
 CPU:06        580028            0            22              0           0               0
 CPU:07        604274            0            23              0           0               0
 CPU:08        556119            0            26              0           0               0
 CPU:09        494997            0            23              0           0               0
 CPU:10        514759            0            23              0           0               0
 CPU:11        500333            0            22              0           0               0
 CPU:12        497956            0            23              0           0               0
 CPU:13        535194            0            14              0           0               0
 CPU:14        504304            0            24              0           0               0
 CPU:15        489015            0            18              0           0               0
 CPU:16        487249            0            24              0           0               0
 CPU:17        472023            0            23              0           0               0
 CPU:18        539454            0            24              0           0               0
 CPU:19        499901            0            19              0           0               0
 CPU:20        479945            0            26              0           0               0
 CPU:21        486800            0            29              0           0               0
 CPU:22        466916            0            26              0           0               0
 CPU:23        559730            0            34              0           0               0
 Summed:      12966008            0           573              0           0               0

Maybe without team it would be better.

>
> Cheers,
>
> Paolo
>

--
Paweł Staszewski
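For context on where per-CPU softnet tables like the ones above come from:
they are cumulative counters exported in /proc/net/softnet_stat, sampled twice
and differenced to get per-second rates. Below is a minimal sketch of that
calculation, an illustration under stated assumptions rather than the exact
tool used in the thread; it relies only on the first three hex columns
(packets processed, dropped, time_squeeze), whose positions are stable across
kernel versions, and ignores the rest:

#!/usr/bin/env python3
# Sketch: derive per-second, per-CPU softnet figures from /proc/net/softnet_stat.
import time

def read_softnet():
    # One line per CPU; keep the first three hex fields:
    # packets processed, dropped, time_squeeze.
    with open("/proc/net/softnet_stat") as f:
        return [[int(v, 16) for v in line.split()[:3]] for line in f]

def per_second(interval: float = 1.0) -> None:
    before = read_softnet()
    time.sleep(interval)
    after = read_softnet()
    sums = [0.0, 0.0, 0.0]
    print(" CPU        total/sec  dropped/sec  squeezed/sec")
    for cpu, (b, a) in enumerate(zip(before, after)):
        rates = [(x - y) / interval for x, y in zip(a, b)]
        sums = [s + r for s, r in zip(sums, rates)]
        print(f" CPU:{cpu:02d} {rates[0]:12.0f} {rates[1]:12.0f} {rates[2]:13.0f}")
    print(f" Summed {sums[0]:12.0f} {sums[1]:12.0f} {sums[2]:13.0f}")

if __name__ == "__main__":
    per_second()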