Message-ID: <CANn89iJUBujG2AOBYsr0V7qyC5WTgzx0GucO=2ES69tTDJRziw@mail.gmail.com>
Date: Mon, 16 Oct 2023 12:35:40 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Stefan Wahren <wahrenst@....net>
Cc: Neal Cardwell <ncardwell@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Fabio Estevam <festevam@...il.com>, linux-imx@....com,
Stefan Wahren <stefan.wahren@...rgebyte.com>, Michael Heimpold <mhei@...mpold.de>, netdev@...r.kernel.org,
Yuchung Cheng <ycheng@...gle.com>
Subject: Re: iperf performance regression since Linux 5.18
On Mon, Oct 16, 2023 at 11:49 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sun, Oct 15, 2023 at 12:23 PM Stefan Wahren <wahrenst@....net> wrote:
> >
> > Hi,
> >
> > > On 15.10.23 at 01:26, Stefan Wahren wrote:
> > > Hi Eric,
> > >
> > > On 15.10.23 at 00:51, Eric Dumazet wrote:
> > >> On Sat, Oct 14, 2023 at 9:40 PM Neal Cardwell <ncardwell@...gle.com>
> > >> wrote:
> > ...
> > >> Hmm, we receive ~3200 ACKs per second; I am not sure the
> > >> tcp_tso_should_defer() logic would hurt?
> > >>
> > >> Also, the ss binary on the client seems very old, or perhaps its
> > >> output has been mangled?
> > > This binary is from Yocto kirkstone:
> > >
> > > # ss --version
> > > ss utility, iproute2-5.17.0
> > >
> > > This shouldn't be too old. Maybe some kernel settings are missing?
> > >
> > I think I was able to fix the issue by enabling the proper kernel
> > settings. I reran the initial bad and good cases and overwrote the log
> > files:
> >
> > https://github.com/lategoodbye/tcp_tso_rtt_log_regress/commit/93615c94ba1bf36bd47cc2b91dd44a3f58c601bc
>
> Excellent, thanks.
>
> I see your kernel uses HZ=100; have you tried HZ=1000 by any chance?
> (With HZ=100, jiffies-based TCP timers only have 10 ms granularity.)
>
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> I see that the bad run seems to be stuck for a while with cwnd=66, but
> a smaller number of packets in flight (26 in the following ss extract).
> Note 26 * 1448 is only ~38 KB in flight versus the ~96 KB that cwnd=66
> would allow, so something other than cwnd is limiting the sender.
>
> ESTAB 0 315664 192.168.1.12:60542 192.168.1.129:5001
> timer:(on,030ms,0) ino:13011 sk:2 <->
> skmem:(r0,rb131072,t48488,tb295680,f3696,w319888,o0,bl0,d0) ts sack
> cubic wscale:7,6 rto:210 rtt:3.418/1.117 mss:1448 pmtu:1500 rcvmss:536
> advmss:1448 cwnd:66 ssthresh:20 bytes_sent:43874400
> bytes_acked:43836753 segs_out:30302 segs_in:14110 data_segs_out:30300
> send 223681685bps lastsnd:10 lastrcv:4310 pacing_rate 268408200bps
> delivery_rate 46336000bps delivered:30275 busy:4310ms unacked:26
> rcv_space:14480 rcv_ssthresh:64088 notsent:278016 minrtt:0.744
>
> I wonder if the fec pseudo-TSO code is adding some kind of artifact,
> maybe interacting with the TCP Small Queues (TSQ) logic.
> (TX completion might be delayed too much on the fec driver side.)

Speaking of TSQ: it seems an old change (commit 75eefc6c59fd "tcp:
tsq: add a shortcut in tcp_small_queue_check()") was accidentally
removed in 2017 by commit 75c119afe14f ("tcp: implement rb-tree based
retransmit queue").
Could you try this fix:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9c8c42c280b7638f0f4d94d68cd2c73e3c6c2bcc..e61a3a381d51b554ec8440928e22a290712f0b6b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2542,6 +2542,18 @@ static bool tcp_pacing_check(struct sock *sk)
 	return true;
 }
 
+static bool tcp_rtx_queue_empty_or_single_skb(const struct sock *sk)
+{
+	const struct rb_node *node = sk->tcp_rtx_queue.rb_node;
+
+	/* No skb in the rtx queue. */
+	if (!node)
+		return true;
+
+	/* Only one skb in rtx queue. */
+	return !node->rb_left && !node->rb_right;
+}
+
 /* TCP Small Queues :
  * Control number of packets in qdisc/devices to two packets / or ~1 ms.
  * (These limits are doubled for retransmits)
@@ -2579,12 +2591,12 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		limit += extra_bytes;
 	}
 	if (refcount_read(&sk->sk_wmem_alloc) > limit) {
-		/* Always send skb if rtx queue is empty.
+		/* Always send skb if rtx queue is empty or has one skb.
 		 * No need to wait for TX completion to call us back,
 		 * after softirq/tasklet schedule.
 		 * This helps when TX completions are delayed too much.
 		 */
-		if (tcp_rtx_queue_empty(sk))
+		if (tcp_rtx_queue_empty_or_single_skb(sk))
 			return false;
 
 		set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
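
The helper stays O(1): a non-empty rb-tree whose root has no children
contains exactly one node, so looking at the root's two child pointers
is enough; no tree walk or counter is needed. A toy userspace model of
just that check (toy_rb_node only mimics the child pointers of the
kernel's struct rb_node):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_rb_node {
	struct toy_rb_node *rb_left;
	struct toy_rb_node *rb_right;
};

static bool empty_or_single(const struct toy_rb_node *root)
{
	if (!root)
		return true;	/* empty tree: no skb in flight */
	/* A root without children is the only node in the tree. */
	return !root->rb_left && !root->rb_right;
}

int main(void)
{
	struct toy_rb_node leaf = { NULL, NULL };
	struct toy_rb_node root = { &leaf, NULL };

	printf("%d %d %d\n",
	       empty_or_single(NULL),	/* 1: empty */
	       empty_or_single(&leaf),	/* 1: exactly one node */
	       empty_or_single(&root));	/* 0: two or more nodes */
	return 0;
}

The rationale mirrors the updated comment in the diff: with at most one
skb in flight, the (possibly coalesced or delayed) TX completion from
the driver may arrive too late to keep the pipe busy, so it is better
to let the next skb out than to stall the flow waiting for it.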