lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1520936711-16784-6-git-send-email-ilpo.jarvinen@helsinki.fi>
Date:   Tue, 13 Mar 2018 12:25:11 +0200
From:   Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
To:     netdev@...r.kernel.org
Cc:     Yuchung Cheng <ycheng@...gle.com>,
        Neal Cardwell <ncardwell@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Sergei Shtylyov <sergei.shtylyov@...entembedded.com>
Subject: [PATCH v3 net 5/5] tcp: send real dupACKs by locking advertized window for non-SACK flows

Currently, the TCP code is overly eager to increase window on
almost every ACK. It makes those ACKs that the receiver should
sent as dupACKs look like they update window that is not
considered a real dupACK by the non-SACK sender-side code.
Therefore the sender needs to resort to RTO to recover
losses as fast retransmit/fast recovery cannot be triggered
by such masked duplicate ACKs.

This change makes sure that when an ofo segment is received,
no change to window is applied if we are going to send a dupACK.
Even with this change, the window updates keep being maintained
but that occurs "behind the scenes". That is, this change does
not interfere with memory management of the flow which could
have long-term impact for the progress of the flow but only
prevents those updates being seen on the wire on short-term.
It's ok to change the window for non-dupACKs such as the first
ACK after ofo arrivals start if that ACK was using delayed ACKs
and also whenever the ack field advances. As ack field typically
advances once per RTT as the first hole is retransmitted, the
window updates are not delayed entirely during long recoveries.

Even before this fix, tcp_select_window did not allow ACK
shrinking the window for duplicate ACKs (that was previously
even called "treason" but the old charmy message is gone now).
The advertized window can only shrink when also ack field
changes which will not be blocked by this change as it is not
duplicate ACK.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
---
 include/linux/tcp.h   |  3 ++-
 net/ipv4/tcp_input.c  |  5 ++++-
 net/ipv4/tcp_output.c | 43 +++++++++++++++++++++++++------------------
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 8f4c549..e239662 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -225,7 +225,8 @@ struct tcp_sock {
 		fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
 		fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */
 		is_sack_reneg:1,    /* in recovery from loss with SACK reneg? */
-		unused:2;
+		dupack_wnd_lock :1, /* Non-SACK constant rwnd dupacks needed? */
+		unused:1;
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
 		unused1	    : 1,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 270aa48..4ff192b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4626,6 +4626,7 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	bool fragstolen;
 	int eaten;
 
@@ -4669,7 +4670,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			 * gap in queue is filled.
 			 */
 			if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
-				inet_csk(sk)->icsk_ack.pingpong = 0;
+				icsk->icsk_ack.pingpong = 0;
 		}
 
 		if (tp->rx_opt.num_sacks)
@@ -4719,6 +4720,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		goto queue_and_out;
 	}
 
+	if (tcp_is_reno(tp) && !(icsk->icsk_ack.pending & ICSK_ACK_TIMER))
+		tp->dupack_wnd_lock = 1;
 	tcp_data_queue_ofo(sk, skb);
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6818042..45fbe92 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -249,25 +249,32 @@ static u16 tcp_select_window(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 old_win = tp->rcv_wnd;
-	u32 cur_win = tcp_receive_window(tp);
-	u32 new_win = __tcp_select_window(sk);
-
-	/* Never shrink the offered window */
-	if (new_win < cur_win) {
-		/* Danger Will Robinson!
-		 * Don't update rcv_wup/rcv_wnd here or else
-		 * we will not be able to advertise a zero
-		 * window in time.  --DaveM
-		 *
-		 * Relax Will Robinson.
-		 */
-		if (new_win == 0)
-			NET_INC_STATS(sock_net(sk),
-				      LINUX_MIB_TCPWANTZEROWINDOWADV);
-		new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
+	u32 cur_win;
+	u32 new_win;
+
+	if (tp->dupack_wnd_lock) {
+		new_win = old_win;
+		tp->dupack_wnd_lock = 0;
+	} else {
+		cur_win = tcp_receive_window(tp);
+		new_win = __tcp_select_window(sk);
+		/* Never shrink the offered window */
+		if (new_win < cur_win) {
+			/* Danger Will Robinson!
+			 * Don't update rcv_wup/rcv_wnd here or else
+			 * we will not be able to advertise a zero
+			 * window in time.  --DaveM
+			 *
+			 * Relax Will Robinson.
+			 */
+			if (new_win == 0)
+				NET_INC_STATS(sock_net(sk),
+					      LINUX_MIB_TCPWANTZEROWINDOWADV);
+			new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
+		}
+		tp->rcv_wnd = new_win;
+		tp->rcv_wup = tp->rcv_nxt;
 	}
-	tp->rcv_wnd = new_win;
-	tp->rcv_wup = tp->rcv_nxt;
 
 	/* Make sure we do not exceed the maximum possible
 	 * scaled window.
-- 
2.7.4

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ