netdev - [PATCH net-next 7/8] tcp: retry more conservatively on local congestion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20190116230535.162758-8-ycheng@google.com>
Date:   Wed, 16 Jan 2019 15:05:34 -0800
From:   Yuchung Cheng <ycheng@...gle.com>
To:     davem@...emloft.net, edumazet@...gle.com
Cc:     netdev@...r.kernel.org, ncardwell@...gle.com, soheil@...gle.com,
        Yuchung Cheng <ycheng@...gle.com>
Subject: [PATCH net-next 7/8] tcp: retry more conservatively on local congestion

Previously when the sender fails to retransmit a data packet on
timeout due to congestion in the local host (e.g. throttling in
qdisc), it'll retry within an RTO up to 500ms.

In low-RTT networks such as data-centers, RTO is often far
below the default minimum 200ms (and the cap 500ms). Then local
host congestion could trigger a retry storm pouring gas to the
fire. Worse yet, the retry counter (icsk_retransmits) is not
properly updated so the aggressive retry may exceed the system
limit (15 rounds) until the packet finally slips through.

On such rare events, it's wise to retry more conservatively (500ms)
and update the stats properly to reflect these incidents and follow
the system limit. Note that this is consistent with the behavior
when a keep-alive probe is dropped due to local congestion.

Signed-off-by: Yuchung Cheng <ycheng@...gle.com>
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Reviewed-by: Neal Cardwell <ncardwell@...gle.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@...gle.com>
---
 net/ipv4/tcp_timer.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c36089aa3515..d7399a89469d 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -500,14 +500,13 @@ void tcp_retransmit_timer(struct sock *sk)

 	tcp_enter_loss(sk);

+	icsk->icsk_retransmits++;
 	if (tcp_retransmit_skb(sk, tcp_rtx_queue_head(sk), 1) > 0) {
 		/* Retransmission failed because of local congestion,
-		 * do not backoff.
+		 * Let senders fight for local resources conservatively.
 		 */
-		if (!icsk->icsk_retransmits)
-			icsk->icsk_retransmits = 1;
 		inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-					  min(icsk->icsk_rto, TCP_RESOURCE_PROBE_INTERVAL),
+					  TCP_RESOURCE_PROBE_INTERVAL,
 					  TCP_RTO_MAX);
 		goto out;
 	}
@@ -528,7 +527,6 @@ void tcp_retransmit_timer(struct sock *sk)
 	 * the 120 second clamps though!
 	 */
 	icsk->icsk_backoff++;
-	icsk->icsk_retransmits++;

 out_reset_timer:
 	/* If stream is thin, use linear timeouts. Since 'icsk_backoff' is
-- 
2.20.1.97.g81188d93c3-goog