Message-ID: <CANn89iKaEqkcOooXY0EpnBScNXY1HhwwgeZuivQYmN4jxLUcJA@mail.gmail.com>
Date: Thu, 20 Oct 2022 13:45:05 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Kamaljit Singh <kamaljit.singh1@....com>
Cc: davem@...emloft.net, yoshfuji@...ux-ipv6.org, dsahern@...nel.org,
kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org,
Niklas.Cassel@....com, Damien.LeMoal@....com
Subject: Re: [PATCH v1 1/2] tcp: Fix for stale host ACK when tgt window shrunk
On Thu, Oct 20, 2022 at 11:22 AM Kamaljit Singh <kamaljit.singh1@....com> wrote:
>
> Under certain congestion conditions, an NVMe/TCP target may be configured
> to shrink the TCP window in an effort to slow the sender down prior to
> issuing a more drastic L2 pause or PFC indication. Although the TCP
> standard discourages implementations from shrinking the TCP window, it also
> states that TCP implementations must be robust to this occurring. The
> current Linux TCP layer (in conjunction with the NVMe/TCP host driver) has
> an issue when the TCP window is shrunk by a target, which causes ACK frames
> to be transmitted with a “stale” SEQ_NUM or for data frames to be
> retransmitted by the host.
Linux sends ACK packets with a legal SEQ number.
The issue is with the receiver of such packets, right?
Because, as you said, receivers should be relaxed about this, especially
if _they_ decided not to respect the TCP standards.
You are proposing to send an old ACK that might be dropped by other stacks.
> It has been observed that processing of these
> “stale” ACKs or data retransmissions impacts NVMe/TCP Write IOPs
> performance.
>
> Network traffic analysis revealed that the SEQ-NUM used by the host to
> ACK the frame that resized the TCP window was an older SEQ-NUM and not the
> value corresponding to the next SEQ-NUM expected on that connection.
>
> In such a case, the kernel was using the seq number calculated by
> tcp_wnd_end() as per the code segment below. Since tp->snd_wnd=0 in this
> case, tcp_wnd_end(tp) returns tp->snd_una, which is incorrect for the
> scenario. The correct seq number to return is tp->snd_nxt. This change
> appears to fix the stale SEQ-NUM issue along with its performance impact.
>
>     static inline u32 tcp_wnd_end(const struct tcp_sock *tp)
>     {
>             return tp->snd_una + tp->snd_wnd;
>     }
>
> Signed-off-by: Kamaljit Singh <kamaljit.singh1@....com>
> ---
> net/ipv4/tcp_output.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 11aa0ab10bba..322e061edb72 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -100,6 +100,9 @@ static inline __u32 tcp_acceptable_seq(const struct sock *sk)
>              (tp->rx_opt.wscale_ok &&
>               ((tp->snd_nxt - tcp_wnd_end(tp)) < (1 << tp->rx_opt.rcv_wscale))))
>                  return tp->snd_nxt;
> +        else if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
> +                 !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)))
> +                return tp->snd_nxt;
>          else
>                  return tcp_wnd_end(tp);
>  }
> --
> 2.25.1
>
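For reference, the sequence-number selection discussed in the quoted commit message can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_tcp and every model_* function are hypothetical stand-ins, not kernel APIs. The first half of the condition is not visible in the quoted hunk and is assumed here to be a wraparound-safe check that the window end is not before snd_nxt. The patched variant omits the SOCK_DEAD and SYN-state checks and treats them as passing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct tcp_sock fields used here. */
struct model_tcp {
	uint32_t snd_una;    /* oldest unacknowledged sequence number */
	uint32_t snd_nxt;    /* next sequence number to be sent */
	uint32_t snd_wnd;    /* window advertised by the peer */
	bool     wscale_ok;  /* window scaling was negotiated */
	uint8_t  rcv_wscale; /* our receive window scale shift */
};

/* Wraparound-safe "seq1 is before seq2" comparison. */
static bool model_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Mirrors the quoted tcp_wnd_end(): right edge of the peer's window. */
static uint32_t model_wnd_end(const struct model_tcp *tp)
{
	return tp->snd_una + tp->snd_wnd;
}

/* True when snd_nxt is inside, or within one scale unit of, the window. */
static bool model_nxt_in_window(const struct model_tcp *tp)
{
	return !model_before(model_wnd_end(tp), tp->snd_nxt) ||
	       (tp->wscale_ok &&
		(tp->snd_nxt - model_wnd_end(tp)) < (1u << tp->rcv_wscale));
}

/* Selection without the proposed hunk (assumed shape, see lead-in). */
static uint32_t model_acceptable_seq(const struct model_tcp *tp)
{
	if (model_nxt_in_window(tp))
		return tp->snd_nxt;
	return model_wnd_end(tp);
}

/* Selection with the proposed hunk; socket-state checks assumed to pass. */
static uint32_t model_acceptable_seq_patched(const struct model_tcp *tp)
{
	if (model_nxt_in_window(tp))
		return tp->snd_nxt;
	if (!tp->snd_wnd)
		return tp->snd_nxt;
	return model_wnd_end(tp);
}

/* Scenario from the commit message: data in flight, window shrunk to 0. */
static void model_demo_zero_window(void)
{
	struct model_tcp tp = {
		.snd_una = 1000, .snd_nxt = 5000, .snd_wnd = 0,
		.wscale_ok = true, .rcv_wscale = 7,
	};

	assert(model_acceptable_seq(&tp) == 1000);         /* snd_una */
	assert(model_acceptable_seq_patched(&tp) == 5000); /* snd_nxt */
}
```

Calling model_demo_zero_window() exercises the zero-window case: the unpatched model picks snd_una and the patched model picks snd_nxt. The sketch does not model how a receiving stack treats either value, which is the point raised in the reply above.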