Date: Mon, 16 Oct 2023 11:49:47 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Stefan Wahren <wahrenst@....net>
Cc: Neal Cardwell <ncardwell@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Fabio Estevam <festevam@...il.com>, linux-imx@....com, 
	Stefan Wahren <stefan.wahren@...rgebyte.com>, Michael Heimpold <mhei@...mpold.de>, netdev@...r.kernel.org, 
	Yuchung Cheng <ycheng@...gle.com>
Subject: Re: iperf performance regression since Linux 5.18

On Sun, Oct 15, 2023 at 12:23 PM Stefan Wahren <wahrenst@....net> wrote:
>
> Hi,
>
> Am 15.10.23 um 01:26 schrieb Stefan Wahren:
> > Hi Eric,
> >
> > Am 15.10.23 um 00:51 schrieb Eric Dumazet:
> >> On Sat, Oct 14, 2023 at 9:40 PM Neal Cardwell <ncardwell@...gle.com>
> >> wrote:
> ...
> >> Hmm, we receive ~3200 ACKs per second; I am not sure the
> >> tcp_tso_should_defer() logic would hurt?
> >>
> >> Also, the ss binary on the client seems very old, or perhaps its
> >> output has been mangled?
> > this binary is from Yocto kirkstone:
> >
> > # ss --version
> > ss utility, iproute2-5.17.0
> >
> > This shouldn't be too old. Maybe some missing kernel settings?
> >
> I think I was able to fix the issue by enabling the proper kernel
> settings. I reran the initial bad and good cases and overwrote the log
> files:
>
> https://github.com/lategoodbye/tcp_tso_rtt_log_regress/commit/93615c94ba1bf36bd47cc2b91dd44a3f58c601bc

Excellent, thanks.

I see your kernel uses HZ=100, have you tried HZ=1000 by any chance ?

CONFIG_HZ_1000=y
CONFIG_HZ=1000
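
(Side note, and only if your image enables CONFIG_IKCONFIG_PROC: you can
double-check what the running kernel was actually built with via

  zcat /proc/config.gz | grep '^CONFIG_HZ'

otherwise grep the same symbols in the .config used for the build.)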

I see that the bad run seems to be stuck for a while with cwnd=66, but
a smaller number of packets in flight (26 in the following ss extract):

ESTAB 0 315664 192.168.1.12:60542 192.168.1.129:5001
timer:(on,030ms,0) ino:13011 sk:2 <->
skmem:(r0,rb131072,t48488,tb295680,f3696,w319888,o0,bl0,d0) ts sack
cubic wscale:7,6 rto:210 rtt:3.418/1.117 mss:1448 pmtu:1500 rcvmss:536
advmss:1448 cwnd:66 ssthresh:20 bytes_sent:43874400
bytes_acked:43836753 segs_out:30302 segs_in:14110 data_segs_out:30300
send 223681685bps lastsnd:10 lastrcv:4310 pacing_rate 268408200bps
delivery_rate 46336000bps delivered:30275 busy:4310ms unacked:26
rcv_space:14480 rcv_ssthresh:64088 notsent:278016 minrtt:0.744
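
(For reference, snapshots like the one above can be sampled during the
iperf run with something along these lines, where the destination
address, interval and file name are just examples for your setup:

  while sleep 0.1; do ss -tinm dst 192.168.1.129; done > ss_bad.log
)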

I wonder if the fec pseudo-TSO code is adding some kind of artifact,
maybe interacting with the TCP Small Queues (TSQ) logic.
(TX completion might be delayed too much on the fec driver side.)
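
(One quick way to test that theory, purely as a diagnostic: TSQ's
per-socket budget is controlled by net.ipv4.tcp_limit_output_bytes, so
temporarily raising it should make a TSQ-induced stall disappear.
Something like:

  sysctl net.ipv4.tcp_limit_output_bytes
  sysctl -w net.ipv4.tcp_limit_output_bytes=4194304

The 4 MB value is arbitrary, just for the experiment, not a tuning
recommendation.)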

Can you try:

ethtool -K eth0 tso off
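
(You can confirm the feature actually toggled with

  ethtool -k eth0 | grep tcp-segmentation-offload

some drivers report it as "[fixed]", in which case it cannot be turned
off.)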

Alternatively, I think I mentioned earlier that you could try reducing
gso_max_size on a 100Mbit link:

ip link set dev eth0 gso_max_size 16384

(Presumably a driver could do this tuning automatically)
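
(On reasonably recent iproute2 you can read the value back with

  ip -d link show dev eth0 | grep -o 'gso_max_size [0-9]*'

just to confirm the setting took effect.)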
