netdev - [PATCH net] tcp: fix keepalive when data remain undelivered

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210220005447.GA93678@localhost.localdomain>
Date:   Fri, 19 Feb 2021 16:54:47 -0800
From:   Enke Chen <enkechen2020@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Cc:     Neal Cardwell <ncardwell@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>, enkechen2020@...i.com
Subject: [PATCH net] tcp: fix keepalive when data remain undelivered

From: Enke Chen <enchen@...oaltonetworks.com>

TCP keepalive does not timeout under the condition that network connection
is lost and data remain undelivered (incl. retransmit). A very simple
scenarios of the failure is to write data to a tcp socket after the network
connection is lost.

Under the specified condition the keepalive timeout is not evaluated in
the keepalive timer. That is the primary cause of the failure. In addition,
the keepalive probe is not sent out in the keepalive timer. Although packet
retransmit or 0-window probe can serve a similar purpose, they have their
own timers and backoffs that are generally not aligned with the keepalive
parameters for probes and timeout.

As the timing and conditions of the events involved are random, the tcp
keepalive can fail randomly. Given the randomness of the failures, fixing
the issue would not cause any backward compatibility issues. As was well
said, "Determinism is a special case of randomness".

The fix in this patch consists of the following:

  a. Always evaluate the keepalive timeout in the keepalive timer.

  b. Always send out the keepalive probe in the keepalive timer (post the
     keepalive idle time). Given that the keepalive intervals are usually
     in the range of 30 - 60 seconds, there is no need for an optimization
     to further reduce the number of keepalive probes in the presence of
     packet retransmit.

  c. Use the elapsed time (instead of the 0-window probe counter) in
     evaluating tcp keepalive timeout.

Thanks to Eric Dumazet, Neal Cardwell, and Yuchung Cheng for helpful
discussions about the issue and options for fixing it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2 Initial git repository build")
Signed-off-by: Enke Chen <enchen@...oaltonetworks.com>
---
 net/ipv4/tcp_timer.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 4ef08079ccfa..16a044da20db 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t)
 	    ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))
 		goto out;
 
-	elapsed = keepalive_time_when(tp);
-
-	/* It is alive without keepalive 8) */
-	if (tp->packets_out || !tcp_write_queue_empty(sk))
-		goto resched;
-
 	elapsed = keepalive_time_elapsed(tp);
 
 	if (elapsed >= keepalive_time_when(tp)) {
 		/* If the TCP_USER_TIMEOUT option is enabled, use that
 		 * to determine when to timeout instead.
 		 */
-		if ((icsk->icsk_user_timeout != 0 &&
-		    elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
-		    icsk->icsk_probes_out > 0) ||
-		    (icsk->icsk_user_timeout == 0 &&
-		    icsk->icsk_probes_out >= keepalive_probes(tp))) {
+		u32 timeout = icsk->icsk_user_timeout ?
+		  msecs_to_jiffies(icsk->icsk_user_timeout) :
+		  keepalive_intvl_when(tp) * keepalive_probes(tp) +
+		  keepalive_time_when(tp);
+
+		if (elapsed >= timeout) {
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			tcp_write_err(sk);
 			goto out;
 		}
 		if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
-			icsk->icsk_probes_out++;
 			elapsed = keepalive_intvl_when(tp);
 		} else {
 			/* If keepalive was lost due to local congestion,
@@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t)
 	}
 
 	sk_mem_reclaim(sk);
-
-resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
 	goto out;
 
-- 
2.29.2