lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140616153515.GA4907@unicorn.suse.cz>
Date:	Mon, 16 Jun 2014 17:35:15 +0200
From:	Michal Kubecek <mkubecek@...e.cz>
To:	netdev@...r.kernel.org
Subject: tcp: multiple ssthresh reductions before all packets are
 retransmitted

Hello,

while investigating a problem with TCP loss recovery, I noticed that if
large window is lost and another loss happens before the recovery is
finished, ssthresh can drop to very low values. This is especially
harmful with Reno congestion control where cwnd growth in congestion
avoidance phase is linear and rather slow. This is 100% reproducible
on older (3.0) kernels but I was able to reproduce it on 3.15 as well.

RFC 5681 says that ssthresh reduction in response to RTO should be done
only once and should not be repeated until all packets from the first
loss are retransmitted. RFC 6582 (as well as its predecessor RFC 3782)
is even more specific and says that when loss is detected, one should
mark current SND.NXT and ssthresh shouldn't be reduced again due to
a loss until SND.UNA reaches this remembered value.

Linux implementation does exactly that but in TCP_CA_Loss state,
tcp_enter_loss() also takes icsk_retransmits into account:

	/* Reduce ssthresh if it has not yet been made inside this window. */
	if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
	    !after(tp->high_seq, tp->snd_una) ||
	    (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
		new_recovery = true;
		tp->prior_ssthresh = tcp_current_ssthresh(sk);
		tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
		tcp_ca_event(sk, CA_EVENT_LOSS);
	}

This seems correct as icsk_retransmits is supposed to mean "we still
have packets to retransmit". But icsk_retransmits is reset in
tcp_process_loss() if one of two conditions is satisfied:

	bool recovered = !before(tp->snd_una, tp->high_seq);
...
	if (recovered) {
		/* F-RTO RFC5682 sec 3.1 step 2.a and 1st part of step 3.a */
		icsk->icsk_retransmits = 0;
		tcp_try_undo_recovery(sk);
		return;
	}
	if (flag & FLAG_DATA_ACKED)
		icsk->icsk_retransmits = 0;

First part implements the condition from RFC. But the second part can
reset icsk_retransmits much earlier than all previously lost data are
retransmitted so that ssthresh can be reduced more than once.

The first is recent and comes from commits ab42d9ee3 ("tcp: refactor
CA_Loss state processing") and e33099f96 ("tcp: implement RFC5682
F-RTO"). Second test is much older and predates the start of git tree.

I believe this code is actually wrong and we should remove those two
lines:

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 40661fc..f534723 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2710,8 +2710,6 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)
 		tcp_try_undo_recovery(sk);
 		return;
 	}
-	if (flag & FLAG_DATA_ACKED)
-		icsk->icsk_retransmits = 0;
 	if (tcp_is_reno(tp)) {
 		/* A Reno DUPACK means new data in F-RTO step 2.b above are
 		 * delivered. Lower inflight to clock out (re)tranmissions.

But as I don't know where do they come from, I can't be sure they don't
serve some purpose so I wanted to ask first if someone knows the logic
behind them.

Thanks in advance,
Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ