lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <fc8b00b9978b4f956fa705badfaa138854abf919.1319595687.git.luto@amacapital.net>
Date:	Tue, 25 Oct 2011 19:25:27 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	netdev@...r.kernel.org
Cc:	Andy Lutomirski <luto@...capital.net>
Subject: [PATCH] Add TCP_NO_DELAYED_ACK socket option

When talking to an unfixable interactive peer that fails to set
TCP_NODELAY, disabling delayed ACKs can help mitigate the problem.
This is an evil thing to do, but if the entire network is private,
it's not that evil.

This works around a problem with the remote *application*, so make
it a socket option instead of a sysctl or a per-route option.

Signed-off-by: Andy Lutomirski <luto@...capital.net>
---

This patch is a bit embarrassing.  We talk to remote applications over
TCP that are very much interactive but don't set TCP_NODELAY.  These
applications apparently cannot be fixed.  As a partial workaround, if we
ACK every incoming segment, then as long as they don't transmit two
segments per rtt, we do pretty well.

Windows can do something similar, but it's per interface instead of per
socket:

http://support.microsoft.com/kb/328890

 include/linux/tcp.h                |    1 +
 include/net/inet_connection_sock.h |    3 ++-
 net/ipv4/tcp.c                     |   11 +++++++++++
 net/ipv4/tcp_input.c               |    3 ++-
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 531ede8..2116f31 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -106,6 +106,7 @@ enum {
 #define TCP_THIN_LINEAR_TIMEOUTS 16      /* Use linear timeouts for thin streams*/
 #define TCP_THIN_DUPACK         17      /* Fast retrans. after 1 dupack */
 #define TCP_USER_TIMEOUT	18	/* How long for loss retry before timeout */
+#define TCP_NO_DELAYED_ACK	19	/* Do not delay ACKs.  */
 
 /* for TCP_INFO socket option */
 #define TCPI_OPT_TIMESTAMPS	1
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index e6db62e..1ad91bf 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -106,8 +106,9 @@ struct inet_connection_sock {
 	struct {
 		__u8		  pending;	 /* ACK is pending			   */
 		__u8		  quick;	 /* Scheduled number of quick acks	   */
-		__u8		  pingpong;	 /* The session is interactive		   */
 		__u8		  blocked;	 /* Delayed ACK was blocked by socket lock */
+		__u8		  pingpong:1;	 /* The session is interactive		   */
+		__u8		  nodelack:1;	 /* Delayed ACKs are disabled		   */
 		__u32		  ato;		 /* Predicted tick of soft clock	   */
 		unsigned long	  timeout;	 /* Currently scheduled timeout		   */
 		__u32		  lrcvtime;	 /* timestamp of last received data packet */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46febca..e8e98dc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2385,6 +2385,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		}
 		break;
 
+	case TCP_NO_DELAYED_ACK:
+		if (val == 0 || val == 1)
+			icsk->icsk_ack.nodelack = !!val;
+		else
+			err = -EINVAL;
+		break;
+
 #ifdef CONFIG_TCP_MD5SIG
 	case TCP_MD5SIG:
 		/* Read the IP->Key mappings from userspace */
@@ -2564,6 +2571,10 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		val = !icsk->icsk_ack.pingpong;
 		break;
 
+	case TCP_NO_DELAYED_ACK:
+		val = icsk->icsk_ack.nodelack;
+		break;
+
 	case TCP_CONGESTION:
 		if (get_user(len, optlen))
 			return -EFAULT;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 21fab3e..e7d7ee0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -197,7 +197,8 @@ static void tcp_enter_quickack_mode(struct sock *sk)
 static inline int tcp_in_quickack_mode(const struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	return icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong;
+	return (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong) ||
+		icsk->icsk_ack.nodelack;
 }
 
 static inline void TCP_ECN_queue_cwr(struct tcp_sock *tp)
-- 
1.7.6.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ