Message-ID: <20180627015222.3269067-1-brakmo@fb.com>
Date: Tue, 26 Jun 2018 18:52:22 -0700
From: Lawrence Brakmo <brakmo@...com>
To: netdev <netdev@...r.kernel.org>
CC: Kernel Team <kernel-team@...com>, Blake Matheny <bmatheny@...com>,
Alexei Starovoitov <ast@...com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction
When using dctcp and doing RPCs, if the last packet of a request is
ECN marked as having seen congestion (CE), the sender can decrease its
cwnd to 1. As a result, it will only send one packet when a new request
is sent. In some instances this results in high tail latencies.
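The cwnd = 1 outcome described above can be sketched with a toy model of the last two statements of tcp_cwnd_reduction. This is only an illustration, not kernel code: the helper names toy_reduced_cwnd and toy_sendable are hypothetical, and the inputs (nothing in flight, sndcnt of 0, prr_out of 0) are assumed values chosen to reproduce the case where the final ACK of a request carries the congestion signal.

```c
#include <stdio.h>

/*
 * Toy stand-in (hypothetical) for the end of tcp_cwnd_reduction():
 *   sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
 *   tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
 */
static unsigned int toy_reduced_cwnd(unsigned int in_flight, int sndcnt,
				     unsigned int prr_out)
{
	int floor = prr_out ? 0 : 1;

	if (sndcnt < floor)
		sndcnt = floor;
	return in_flight + (unsigned int)sndcnt;
}

/* Packets that may be sent immediately given cwnd and packets in flight. */
static unsigned int toy_sendable(unsigned int cwnd, unsigned int in_flight)
{
	return cwnd > in_flight ? cwnd - in_flight : 0;
}

/* Print the window the next request would start with (assumed inputs). */
static void toy_report(void)
{
	unsigned int cwnd = toy_reduced_cwnd(0, 0, 0);

	printf("cwnd=%u sendable=%u\n", cwnd, toy_sendable(cwnd, 0));
}
```

With these assumed inputs the model leaves a window of one packet, so the next request starts by sending a single packet and then has to wait for its ACK.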
For example, in one setup there are 3 hosts sending to a 4th one, with
each sender having 3 flows (one stream, one issuing back-to-back 1MB
RPCs and one issuing back-to-back 10KB RPCs). The following table shows
the 99th and 99.9th percentile latencies for both Cubic and dctcp:

             Cubic 99%  Cubic 99.9%  dctcp 99%  dctcp 99.9%
  1MB RPCs     3.5ms      6.0ms        43ms       208ms
  10KB RPCs    1.0ms      2.5ms        53ms       212ms

On 4.11, pcap traces indicate that in some instances the 1st packet of
the RPC is received but no ACK is sent before the packet is
retransmitted. On 4.11 netstat shows TCP timeouts, with some of them
spurious.
On 4.16, we don't see retransmits in netstat but the high tail latencies
are still there. Forcing cwnd to be at least 2 in tcp_cwnd_reduction
fixes the problem with the high tail latencies. The latencies now look
like this:

             dctcp 99%  dctcp 99.9%
  1MB RPCs     3.8ms      4.4ms
  10KB RPCs    168us      211us

Another group working with dctcp saw the same issue with production
traffic and it was solved with this patch.
The only open question is whether it is safe to always use 2, or whether
it is better to use min(2, snd_ssthresh) (which could still trigger the
problem).
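The difference between the two candidate floors can be sketched as follows. This is a standalone illustration rather than the patch itself: floor_fixed and floor_ssthresh are hypothetical helpers, and the example assumes snd_ssthresh can be below 2, which this message does not establish.

```c
/* Hypothetical helpers, written out so the sketch has no kernel dependencies. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Floor used by this patch: never let cwnd drop below 2. */
static unsigned int floor_fixed(unsigned int cwnd)
{
	return max_uint(cwnd, 2);
}

/*
 * Alternative floor: min(2, snd_ssthresh).
 * If ssthresh is 1 (assumed here to be possible), the floor is 1 again.
 */
static unsigned int floor_ssthresh(unsigned int cwnd, unsigned int ssthresh)
{
	return max_uint(cwnd, min_uint(2, ssthresh));
}
```

For a reduced window of 1, floor_fixed(1) gives 2, while floor_ssthresh(1, 1) stays at 1, which is the case where the alternative would not help.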
Signed-off-by: Lawrence Brakmo <brakmo@...com>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 76ca88f63b70..a9255c424761 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2477,7 +2477,7 @@ void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int flag)
 	}
 	/* Force a fast retransmit upon entering fast recovery */
 	sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
-	tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
+	tp->snd_cwnd = max(tcp_packets_in_flight(tp) + sndcnt, 2);
 }
 
 static inline void tcp_end_cwnd_reduction(struct sock *sk)
--
2.17.1