[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211129181705.318199709@linuxfoundation.org>
Date: Mon, 29 Nov 2021 19:18:33 +0100
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-kernel@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable@...r.kernel.org, Neal Cardwell <ncardwell@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Yuchung Cheng <ycheng@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Sasha Levin <sashal@...nel.org>
Subject: [PATCH 4.19 51/69] tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
From: Eric Dumazet <edumazet@...gle.com>
[ Upstream commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001 ]
While testing BIG TCP patch series, I was expecting that TCP_RR workloads
with 80KB requests/answers would send one 80KB TSO packet,
then being received as a single GRO packet.
It turns out this was not happening, and the root cause was that
cubic Hystart ACK train was triggering after a few (2 or 3) rounds of RPC.
Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC
needed a budget of ~20 segments.
Ideally these TCP_RR flows should not exit slow start.
Cubic Hystart should reset itself at each round, instead of assuming
every TCP flow is a bulk one.
Note that even after this patch, Hystart can still trigger, depending
on scheduling artifacts, but at a higher CWND/SSTHRESH threshold,
keeping optimal TSO packet sizes.
Tested:
ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072
nstat -n; netperf -H ... -t TCP_RR -l 5 -- -r 80000,80000 -K cubic; nstat|egrep "Ip6InReceives|Hystart|Ip6OutRequests"
Before:
8605
Ip6InReceives 87541 0.0
Ip6OutRequests 129496 0.0
TcpExtTCPHystartTrainDetect 1 0.0
TcpExtTCPHystartTrainCwnd 30 0.0
After:
8760
Ip6InReceives 88514 0.0
Ip6OutRequests 87975 0.0
Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3")
Co-developed-by: Neal Cardwell <ncardwell@...gle.com>
Signed-off-by: Neal Cardwell <ncardwell@...gle.com>
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Cc: Stephen Hemminger <stephen@...workplumber.org>
Cc: Yuchung Cheng <ycheng@...gle.com>
Cc: Soheil Hassas Yeganeh <soheil@...gle.com>
Link: https://lore.kernel.org/r/20211123202535.1843771-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@...nel.org>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
net/ipv4/tcp_cubic.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 8b5ba0a5cd386..93530bd332470 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -340,8 +340,6 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
return;
if (tcp_in_slow_start(tp)) {
- if (hystart && after(ack, ca->end_seq))
- bictcp_hystart_reset(sk);
acked = tcp_slow_start(tp, acked);
if (!acked)
return;
@@ -383,6 +381,9 @@ static void hystart_update(struct sock *sk, u32 delay)
if (ca->found & hystart_detect)
return;
+ if (after(tp->snd_una, ca->end_seq))
+ bictcp_hystart_reset(sk);
+
if (hystart_detect & HYSTART_ACK_TRAIN) {
u32 now = bictcp_clock();
--
2.33.0
Powered by blists - more mailing lists