Message-Id: <20221020182242.503107-2-kamaljit.singh1@wdc.com>
Date:   Thu, 20 Oct 2022 11:22:41 -0700
From:   Kamaljit Singh <kamaljit.singh1@....com>
To:     edumazet@...gle.com, davem@...emloft.net, yoshfuji@...ux-ipv6.org,
        dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com
Cc:     netdev@...r.kernel.org, Niklas.Cassel@....com,
        Damien.LeMoal@....com, kamaljit.singh1@....com
Subject: [PATCH v1 1/2] tcp: Fix for stale host ACK when tgt window shrunk

Under certain congestion conditions, an NVMe/TCP target may be configured
to shrink the TCP window in an effort to slow the sender down prior to
issuing a more drastic L2 pause or PFC indication.  Although the TCP
standard discourages implementations from shrinking the TCP window, it also
states that TCP implementations must be robust to this occurring. The
current Linux TCP layer (in conjunction with the NVMe/TCP host driver) has
an issue when the TCP window is shrunk by a target: the host either
transmits ACK frames with a “stale” SEQ-NUM or retransmits data frames.
Processing of these “stale” ACKs and data retransmissions has been observed
to degrade NVMe/TCP write IOPS performance.

Network traffic analysis revealed that the SEQ-NUM used by the host to ACK
the frame that resized the TCP window was an older value, not the next
SEQ-NUM expected on that connection.

In such a case, the kernel was using the sequence number calculated by
tcp_wnd_end(), shown in the code segment below. Since tp->snd_wnd=0 in this
case, tcp_wnd_end(tp) returns tp->snd_una, which is incorrect for the
scenario.  The correct sequence number to return is tp->snd_nxt. With this
fix, the stale SEQ-NUM issue and its performance impact appear to be
resolved.

  static inline u32 tcp_wnd_end(const struct tcp_sock *tp)
  {
  	return tp->snd_una + tp->snd_wnd;
  }
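
To make the zero-window case concrete, here is a minimal user-space sketch
of the sequence selection before and after this change. It is a simplified
model, not the kernel code: struct tcp_model, seq_before(), wnd_end(),
acceptable_seq_old() and acceptable_seq_new() are hypothetical names, the
window-scaling branch is left out, and the socket state checks (not dead,
not in SYN_SENT or SYN_RECV) are collapsed into a single "established"
flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three struct tcp_sock fields used here. */
struct tcp_model {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to be sent */
	uint32_t snd_wnd;	/* window most recently advertised by the peer */
};

/* Wraparound-safe "a comes before b" comparison on 32-bit sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Same arithmetic as tcp_wnd_end(): right edge of the peer's window. */
static uint32_t wnd_end(const struct tcp_model *tp)
{
	return tp->snd_una + tp->snd_wnd;
}

/* Selection without the fix (window-scaling branch omitted). */
static uint32_t acceptable_seq_old(const struct tcp_model *tp)
{
	if (!seq_before(wnd_end(tp), tp->snd_nxt))
		return tp->snd_nxt;
	return wnd_end(tp);
}

/*
 * Selection with the fix. "established" is an assumption that stands in
 * for the SOCK_DEAD and SYN_SENT/SYN_RECV checks in the real patch.
 */
static uint32_t acceptable_seq_new(const struct tcp_model *tp, bool established)
{
	if (!seq_before(wnd_end(tp), tp->snd_nxt))
		return tp->snd_nxt;
	if (tp->snd_wnd == 0 && established)
		return tp->snd_nxt;
	return wnd_end(tp);
}
```

With snd_una=1000, snd_nxt=5000 and snd_wnd=0, the old selection falls
through to wnd_end() and yields 1000 (the stale value), while the new
selection yields 5000. When the window still covers snd_nxt, both versions
return snd_nxt.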

Signed-off-by: Kamaljit Singh <kamaljit.singh1@....com>
---
 net/ipv4/tcp_output.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 11aa0ab10bba..322e061edb72 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -100,6 +100,9 @@ static inline __u32 tcp_acceptable_seq(const struct sock *sk)
 	    (tp->rx_opt.wscale_ok &&
 	     ((tp->snd_nxt - tcp_wnd_end(tp)) < (1 << tp->rx_opt.rcv_wscale))))
 		return tp->snd_nxt;
+	else if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
+		 !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)))
+		return tp->snd_nxt;
 	else
 		return tcp_wnd_end(tp);
 }
-- 
2.25.1