Message-Id: <20221020182242.503107-2-kamaljit.singh1@wdc.com>
Date: Thu, 20 Oct 2022 11:22:41 -0700
From: Kamaljit Singh <kamaljit.singh1@....com>
To: edumazet@...gle.com, davem@...emloft.net, yoshfuji@...ux-ipv6.org,
dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com
Cc: netdev@...r.kernel.org, Niklas.Cassel@....com,
Damien.LeMoal@....com, kamaljit.singh1@....com
Subject: [PATCH v1 1/2] tcp: Fix for stale host ACK when tgt window shrunk
Under certain congestion conditions, an NVMe/TCP target may be configured
to shrink the TCP window in an effort to slow the sender down prior to
issuing a more drastic L2 pause or PFC indication. Although the TCP
standard discourages implementations from shrinking the TCP window, it also
states that TCP implementations must be robust to this occurring. The
current Linux TCP layer (in conjunction with the NVMe/TCP host driver) has
an issue when the TCP window is shrunk by a target: the host either
transmits ACK frames with a “stale” SEQ-NUM or retransmits data frames.
Processing of these “stale” ACKs or data retransmissions has been observed
to degrade NVMe/TCP Write IOPS performance.

Network traffic analysis revealed that the host ACKed the frame that
resized the TCP window with an older SEQ-NUM rather than the next SEQ-NUM
expected on that connection.

In such a case, the kernel uses the sequence number calculated by
tcp_wnd_end(), shown in the code segment below. Since tp->snd_wnd is 0 in
this case, tcp_wnd_end(tp) returns tp->snd_una, which is incorrect for
this scenario. The correct sequence number to return is tp->snd_nxt. This
change appears to resolve the stale SEQ-NUM issue along with its
performance impact.
static inline u32 tcp_wnd_end(const struct tcp_sock *tp)
{
	return tp->snd_una + tp->snd_wnd;
}
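To make the arithmetic concrete, here is a minimal user-space sketch of the
same calculation. The struct tcp_model type, the model_wnd_end() helper and
the field values used in the note below are hypothetical stand-ins chosen for
illustration; they are not kernel code and not taken from a capture.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct tcp_sock fields discussed above. */
struct tcp_model {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to be sent */
	uint32_t snd_wnd;	/* window advertised by the peer */
};

/* Same arithmetic as tcp_wnd_end(): right edge of the send window. */
static uint32_t model_wnd_end(const struct tcp_model *tp)
{
	return tp->snd_una + tp->snd_wnd;
}
```

With snd_una = 1000, snd_nxt = 4000 and snd_wnd = 0 (3000 bytes in flight
when the peer closes its window), model_wnd_end() yields 1000, which is
3000 bytes behind snd_nxt. That gap is the "stale" value described above.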
Signed-off-by: Kamaljit Singh <kamaljit.singh1@....com>
---
net/ipv4/tcp_output.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 11aa0ab10bba..322e061edb72 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -100,6 +100,9 @@ static inline __u32 tcp_acceptable_seq(const struct sock *sk)
 	    (tp->rx_opt.wscale_ok &&
 	     ((tp->snd_nxt - tcp_wnd_end(tp)) < (1 << tp->rx_opt.rcv_wscale))))
 		return tp->snd_nxt;
+	else if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
+		 !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)))
+		return tp->snd_nxt;
 	else
 		return tcp_wnd_end(tp);
 }
--
2.25.1
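
The selection logic touched by the hunk above can be modeled in user space
to compare behavior with and without the added branch. This is only a
sketch under stated assumptions: struct seq_model, its boolean fields and
the patched flag are hypothetical, and the first half of the leading
condition is not visible in the hunk, so the model assumes it is a
wraparound-safe "window end is not before snd_nxt" check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state read by tcp_acceptable_seq(). */
struct seq_model {
	uint32_t snd_una, snd_nxt, snd_wnd;
	bool wscale_ok;		/* window scaling negotiated */
	uint8_t rcv_wscale;	/* receive window scale shift */
	bool dead;		/* stands in for sock_flag(sk, SOCK_DEAD) */
	bool handshaking;	/* stands in for the SYN_SENT/SYN_RECV test */
};

/* Wraparound-safe "a is before b" comparison on 32-bit sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static uint32_t seq_wnd_end(const struct seq_model *tp)
{
	return tp->snd_una + tp->snd_wnd;
}

/* patched=false models the old behavior, patched=true the added branch. */
static uint32_t model_acceptable_seq(const struct seq_model *tp, bool patched)
{
	/* Assumed leading check: the hunk shows only its second half. */
	if (!seq_before(seq_wnd_end(tp), tp->snd_nxt) ||
	    (tp->wscale_ok &&
	     (tp->snd_nxt - seq_wnd_end(tp)) < (1u << tp->rcv_wscale)))
		return tp->snd_nxt;
	else if (patched && !tp->snd_wnd && !tp->dead && !tp->handshaking)
		return tp->snd_nxt;
	else
		return seq_wnd_end(tp);
}
```

For an established connection with snd_una = 1000, snd_nxt = 4000 and a
zero window, the model returns 1000 without the added branch and 4000 with
it. During the handshake or on a dead socket, both variants fall back to
the window end.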