netdev - Re: [PATCH v1 1/2] tcp: Fix for stale host ACK when tgt window shrunk

Message-ID: <CANn89iKaEqkcOooXY0EpnBScNXY1HhwwgeZuivQYmN4jxLUcJA@mail.gmail.com>
Date:   Thu, 20 Oct 2022 13:45:05 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Kamaljit Singh <kamaljit.singh1@....com>
Cc:     davem@...emloft.net, yoshfuji@...ux-ipv6.org, dsahern@...nel.org,
        kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org,
        Niklas.Cassel@....com, Damien.LeMoal@....com
Subject: Re: [PATCH v1 1/2] tcp: Fix for stale host ACK when tgt window shrunk

On Thu, Oct 20, 2022 at 11:22 AM Kamaljit Singh <kamaljit.singh1@....com> wrote:
>
> Under certain congestion conditions, an NVMe/TCP target may be configured
> to shrink the TCP window in an effort to slow the sender down prior to
> issuing a more drastic L2 pause or PFC indication.  Although the TCP
> standard discourages implementations from shrinking the TCP window, it also
> states that TCP implementations must be robust to this occurring. The
> current Linux TCP layer (in conjunction with the NVMe/TCP host driver) has
> an issue when the TCP window is shrunk by a target, which causes ACK frames
> to be transmitted with a “stale” SEQ_NUM or for data frames to be
> retransmitted by the host.

Linux sends ACK packets, with a legal SEQ number.

The issue is the receiver of such packets, right ?

Because, as you said, receivers should be relaxed about this, especially
if _they_ decided not to respect the TCP standards.

You are proposing to send an old ACK, which might be dropped by other stacks.

> It has been observed that processing of these
> “stale” ACKs or data retransmissions impacts NVMe/TCP Write IOPs
> performance.
>
> Network traffic analysis revealed that SEQ-NUM being used by the host to
> ACK the frame that resized the TCP window had an older SEQ-NUM and not a
> value corresponding to the next SEQ-NUM expected on that connection.
>
> In such a case, the kernel was using the seq number calculated by
> tcp_wnd_end() as per the code segment below. Since, in this case
> tp->snd_wnd=0, tcp_wnd_end(tp) returns tp->snd_una, which is incorrect for
> the scenario.  The correct seq number that needs to be returned is
> tp->snd_nxt. This fix seems to have fixed the stale SEQ-NUM issue along
> with its performance impact.
>
>   static inline u32 tcp_wnd_end(const struct tcp_sock *tp)
>   {
>     return tp->snd_una + tp->snd_wnd;
>   }
>
> Signed-off-by: Kamaljit Singh <kamaljit.singh1@....com>
> ---
>  net/ipv4/tcp_output.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 11aa0ab10bba..322e061edb72 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -100,6 +100,9 @@ static inline __u32 tcp_acceptable_seq(const struct sock *sk)
>             (tp->rx_opt.wscale_ok &&
>              ((tp->snd_nxt - tcp_wnd_end(tp)) < (1 << tp->rx_opt.rcv_wscale))))
>                 return tp->snd_nxt;
> +       else if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
> +                !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)))
> +               return tp->snd_nxt;
>         else
>                 return tcp_wnd_end(tp);
>  }
> --
> 2.25.1
>
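
For readers following the discussion, the sequence-number selection under debate can be modelled in user space. The sketch below is only an illustration under stated assumptions: struct mini_tcp, seq_before(), wnd_end(), acceptable_seq() and acceptable_seq_patched() are hypothetical stand-ins, not kernel code. The first half of the condition (the wraparound-safe "before" test) does not appear in the quoted hunk and is assumed here, and the SOCK_DEAD and SYN-state checks from the patch are left out. It shows only which value each variant picks when the peer has shrunk its window to zero; it says nothing about which choice is right for interoperability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct tcp_sock fields used here. */
struct mini_tcp {
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;    /* next sequence number to be sent */
    uint32_t snd_wnd;    /* send window advertised by the peer */
    bool     wscale_ok;  /* window scaling negotiated */
    uint8_t  rcv_wscale; /* our receive window scale shift */
};

/* Wraparound-safe "a comes before b" on 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Mirrors the quoted tcp_wnd_end(): right edge of the peer's window. */
static uint32_t wnd_end(const struct mini_tcp *tp)
{
    return tp->snd_una + tp->snd_wnd;
}

/* Unpatched selection; the seq_before() test is an assumption. */
static uint32_t acceptable_seq(const struct mini_tcp *tp)
{
    if (!seq_before(wnd_end(tp), tp->snd_nxt) ||
        (tp->wscale_ok &&
         (tp->snd_nxt - wnd_end(tp)) < (1u << tp->rcv_wscale)))
        return tp->snd_nxt;
    return wnd_end(tp);
}

/* Patched selection: also prefer snd_nxt when the window is zero.
 * The socket-state checks from the patch are omitted in this sketch. */
static uint32_t acceptable_seq_patched(const struct mini_tcp *tp)
{
    if (!seq_before(wnd_end(tp), tp->snd_nxt) ||
        (tp->wscale_ok &&
         (tp->snd_nxt - wnd_end(tp)) < (1u << tp->rcv_wscale)))
        return tp->snd_nxt;
    if (tp->snd_wnd == 0)
        return tp->snd_nxt;
    return wnd_end(tp);
}

/* Example: 4000 bytes in flight when the peer shrinks its window to 0. */
static void shrunk_window_example(void)
{
    struct mini_tcp tp = {
        .snd_una = 1000, .snd_nxt = 5000, .snd_wnd = 0,
        .wscale_ok = true, .rcv_wscale = 7,
    };

    assert(acceptable_seq(&tp) == 1000);         /* snd_una + 0 */
    assert(acceptable_seq_patched(&tp) == 5000); /* snd_nxt */
}
```

With a window that still covers snd_nxt, both variants return snd_nxt, so the two differ only in the shrunk-window case the commit message describes.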