Message-ID: <CANn89iKY58YSknzOzkEHxFu=C=1_p=pXGAHGo9ZkAfAGon9ayw@mail.gmail.com>
Date: Mon, 9 Oct 2023 21:10:55 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Stefan Wahren <wahrenst@....net>
Cc: Jakub Kicinski <kuba@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
Fabio Estevam <festevam@...il.com>, linux-imx@....com,
Stefan Wahren <stefan.wahren@...rgebyte.com>, Michael Heimpold <mhei@...mpold.de>, netdev@...r.kernel.org
Subject: Re: iperf performance regression since Linux 5.18
On Mon, Oct 9, 2023 at 8:58 PM Stefan Wahren <wahrenst@....net> wrote:
>
> Hi,
> we recently switched our ARM NXP i.MX6ULL based embedded device
> (Tarragon Master [1]) from an older kernel version to Linux 6.1. After
> that we noticed a measurable performance regression on the Ethernet
> interface (driver: fec, 100 Mbit link) while running the iperf client on
> the device:
>
> BAD
>
> # iperf -t 10 -i 1 -c 192.168.1.129
> ------------------------------------------------------------
> Client connecting to 192.168.1.129, TCP port 5001
> TCP window size: 96.2 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.12 port 56022 connected with 192.168.1.129 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 9.88 MBytes 82.8 Mbits/sec
> [ 3] 1.0- 2.0 sec 9.62 MBytes 80.7 Mbits/sec
> [ 3] 2.0- 3.0 sec 9.75 MBytes 81.8 Mbits/sec
> [ 3] 3.0- 4.0 sec 9.62 MBytes 80.7 Mbits/sec
> [ 3] 4.0- 5.0 sec 9.62 MBytes 80.7 Mbits/sec
> [ 3] 5.0- 6.0 sec 9.62 MBytes 80.7 Mbits/sec
> [ 3] 6.0- 7.0 sec 9.50 MBytes 79.7 Mbits/sec
> [ 3] 7.0- 8.0 sec 9.75 MBytes 81.8 Mbits/sec
> [ 3] 8.0- 9.0 sec 9.62 MBytes 80.7 Mbits/sec
> [ 3] 9.0-10.0 sec 9.50 MBytes 79.7 Mbits/sec
> [ 3] 0.0-10.0 sec 96.5 MBytes 80.9 Mbits/sec
>
> GOOD
>
> # iperf -t 10 -i 1 -c 192.168.1.129
> ------------------------------------------------------------
> Client connecting to 192.168.1.129, TCP port 5001
> TCP window size: 96.2 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.12 port 54898 connected with 192.168.1.129 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 11.2 MBytes 94.4 Mbits/sec
> [ 3] 1.0- 2.0 sec 11.0 MBytes 92.3 Mbits/sec
> [ 3] 2.0- 3.0 sec 10.8 MBytes 90.2 Mbits/sec
> [ 3] 3.0- 4.0 sec 11.0 MBytes 92.3 Mbits/sec
> [ 3] 4.0- 5.0 sec 10.9 MBytes 91.2 Mbits/sec
> [ 3] 5.0- 6.0 sec 10.9 MBytes 91.2 Mbits/sec
> [ 3] 6.0- 7.0 sec 10.8 MBytes 90.2 Mbits/sec
> [ 3] 7.0- 8.0 sec 10.9 MBytes 91.2 Mbits/sec
> [ 3] 8.0- 9.0 sec 10.9 MBytes 91.2 Mbits/sec
> [ 3] 9.0-10.0 sec 10.9 MBytes 91.2 Mbits/sec
> [ 3] 0.0-10.0 sec 109 MBytes 91.4 Mbits/sec
>
> We were able to bisect this down to this commit:
>
> first bad commit: [65466904b015f6eeb9225b51aeb29b01a1d4b59c] tcp: adjust
> TSO packet sizes based on min_rtt
>
> Disabling this new setting via:
>
> echo 0 > /proc/sys/net/ipv4/tcp_tso_rtt_log
>
> confirms that this was the cause of the performance regression.
>
> Is it expected that the new default setting has such a performance impact?
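
For context, the sizing rule behind tcp_tso_rtt_log can be sketched roughly as below. This is a simplified reading of the kernel's ip-sysctl documentation, not the kernel code itself: the function name tso_bytes is hypothetical, the pacing rate and RTT values are illustrative rather than measured on this board, and treating a value of 0 as "old autosizing only" is a simplification.

```shell
# Rough sketch of TSO autosizing with tcp_tso_rtt_log (assumed from the
# documented rule, simplified; not the kernel implementation).
# Usage: tso_bytes PACING_RATE_BYTES_PER_SEC MIN_RTT_USEC TSO_RTT_LOG GSO_MAX_SIZE
tso_bytes() {
    rate=$1
    rtt=$2
    log=$3
    gso=$4

    # Old rule: budget for roughly 1/1024 of a second of pacing.
    bytes=$(( rate >> 10 ))

    # New rule: add a bonus that halves for every 2^log usec of min_rtt.
    # Simplification: log = 0 is treated as "old autosizing only".
    if [ "$log" -gt 0 ]; then
        dist=$(( rtt >> log ))
        if [ "$dist" -lt 32 ]; then
            bytes=$(( bytes + (gso >> dist) ))
        fi
    fi

    # Never exceed the device GSO limit.
    if [ "$bytes" -gt "$gso" ]; then
        bytes=$gso
    fi
    echo "$bytes"
}
```

With an illustrative pacing rate of 12500000 bytes/sec and a 300 usec min_rtt, `tso_bytes 12500000 300 9 65536` gives the full 65536 bytes, while `tso_bytes 12500000 300 0 65536` gives 12207 bytes. On a LAN the new default can therefore produce much larger TSO packets than the old rule.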
Thanks for the report.
Normally no, but we need more details to tell:
the qdisc in use, the MTU in use, the congestion control in use, and the
output of "ss -temoi dst 192.168.1.129" from the sender side while the
flow is running.
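
One possible way to gather the details requested above in a single pass is sketched here. The helper names run and collect_diag and the DRY_RUN switch are hypothetical; the device name and destination address are supplied by the caller. The tc, ip, sysctl and ss invocations are standard ones, but the exact set of commands is an assumption about what is useful.

```shell
# Hypothetical helper: with DRY_RUN=1 only print the command, otherwise
# print a header line and execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        echo "### $*"
        "$@"
    fi
}

# Usage: collect_diag DEV DST   (e.g. collect_diag eth0 192.168.1.129)
# Call it on the sender while the iperf flow is still running.
collect_diag() {
    dev=$1
    dst=$2
    run tc qdisc show dev "$dev"                 # qdisc in use
    run ip link show dev "$dev"                  # MTU is part of this output
    run sysctl net.ipv4.tcp_congestion_control   # congestion control in use
    run sysctl net.ipv4.tcp_tso_rtt_log          # setting under suspicion
    run ss -temoi dst "$dst"                     # per-flow TCP state
}
```

Setting DRY_RUN=1 first shows what would be executed without touching the system.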
Note that reaching line rate on a TCP flow is always tricky,
regardless of what 'line rate' is.
I suspect an issue on the receiving side, perhaps with larger GRO packets?
You could try limiting GRO or TSO packet sizes to determine whether this
is a driver issue:
(ip link set dev ethX gro_max_size XXXXX gso_max_size YYYYY)
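
A sweep over TSO sizes on the sender could look roughly like the sketch below. The function names and the candidate sizes (halving from 65536 down to 4096) are assumptions chosen for illustration. The script only prints the commands so they can be reviewed before being run as root.

```shell
# Candidate sizes are an arbitrary assumption: halve from 65536 down to 4096.
sweep_sizes() {
    size=65536
    while [ "$size" -ge 4096 ]; do
        echo "$size"
        size=$(( size / 2 ))
    done
}

# Print, rather than run, one "limit GSO, then measure" pair per size.
# Usage: sweep_cmds DEV DST   (e.g. sweep_cmds eth0 192.168.1.129)
sweep_cmds() {
    dev=$1
    dst=$2
    for s in $(sweep_sizes); do
        echo "ip link set dev $dev gso_max_size $s"
        echo "iperf -t 10 -i 1 -c $dst"
    done
}
```

Piping the output into a shell, for example `sweep_cmds eth0 192.168.1.129 | sh -x`, would run the sweep. The same approach with gro_max_size on the receiving host would test the GRO side.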
>
> More information about the platform ...
>
> # ethtool -k eth0
> Features for eth0:
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: on
> tx-checksum-ip-generic: off [fixed]
> tx-checksum-ipv6: on
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp-ecn-segmentation: off [fixed]
> tx-tcp-mangleid-segmentation: off
> tx-tcp6-segmentation: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: on
> tx-vlan-offload: off [fixed]
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: off [fixed]
> rx-vlan-filter: off [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-gre-csum-segmentation: off [fixed]
> tx-ipxip4-segmentation: off [fixed]
> tx-ipxip6-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-udp_tnl-csum-segmentation: off [fixed]
> tx-gso-partial: off [fixed]
> tx-tunnel-remcsum-segmentation: off [fixed]
> tx-sctp-segmentation: off [fixed]
> tx-esp-segmentation: off [fixed]
> tx-udp-segmentation: off [fixed]
> tx-gso-list: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> hw-tc-offload: off [fixed]
> esp-hw-offload: off [fixed]
> esp-tx-csum-hw-offload: off [fixed]
> rx-udp_tunnel-port-offload: off [fixed]
> tls-hw-tx-offload: off [fixed]
> tls-hw-rx-offload: off [fixed]
> rx-gro-hw: off [fixed]
> tls-hw-record: off [fixed]
> rx-gro-list: off
> macsec-hw-offload: off [fixed]
> rx-udp-gro-forwarding: off
> hsr-tag-ins-offload: off [fixed]
> hsr-tag-rm-offload: off [fixed]
> hsr-fwd-offload: off [fixed]
> hsr-dup-offload: off [fixed]
>
> [1] -
> https://elixir.bootlin.com/linux/latest/source/arch/arm/boot/dts/nxp/imx/imx6ull-tarragon-master.dts