linux-kernel - Re: [PATCH net] tcp: fix keepalive when data remain undelivered

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAK6E8=err50_ob6r99eXw++0D84yJe66Lzsz6F7KWbU5=ii_ew@mail.gmail.com>
Date:   Fri, 19 Feb 2021 18:24:00 -0800
From:   Yuchung Cheng <ycheng@...gle.com>
To:     Enke Chen <enkechen2020@...il.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Neal Cardwell <ncardwell@...gle.com>, enkechen2020@...i.com
Subject: Re: [PATCH net] tcp: fix keepalive when data remain undelivered

On Fri, Feb 19, 2021 at 4:54 PM Enke Chen <enkechen2020@...il.com> wrote:
>
> From: Enke Chen <enchen@...oaltonetworks.com>
>
> TCP keepalive does not timeout under the condition that network connection
> is lost and data remain undelivered (incl. retransmit). A very simple
> scenarios of the failure is to write data to a tcp socket after the network
> connection is lost.

AFAIK current Linux TCP KA implementation does not violate the
RFC793bis (Section 3.7.4 Keep-Alives)
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-20#section-3.7.4

We have even raised that in IETF tcpm list to get more clarity
https://mailarchive.ietf.org/arch/msg/tcpm/KxcEsLtDuDNhcP8UjbyPkJqpzsE/

I believe you interpret the keep-alive differently -- so this is
arguably a subjective "bug-fix". As Neal and I have expressed in our
private communications, current Linux KA has been implemented for more
than a decade. Changing this behavior may break many existing
applications even if it may benefit certain.

>
> Under the specified condition the keepalive timeout is not evaluated in
> the keepalive timer. That is the primary cause of the failure. In addition,
> the keepalive probe is not sent out in the keepalive timer. Although packet
> retransmit or 0-window probe can serve a similar purpose, they have their
> own timers and backoffs that are generally not aligned with the keepalive
> parameters for probes and timeout.
>
> As the timing and conditions of the events involved are random, the tcp
> keepalive can fail randomly. Given the randomness of the failures, fixing
> the issue would not cause any backward compatibility issues. As was well
> said, "Determinism is a special case of randomness".
>
> The fix in this patch consists of the following:
>
>   a. Always evaluate the keepalive timeout in the keepalive timer.
>
>   b. Always send out the keepalive probe in the keepalive timer (post the
>      keepalive idle time). Given that the keepalive intervals are usually
>      in the range of 30 - 60 seconds, there is no need for an optimization
>      to further reduce the number of keepalive probes in the presence of
>      packet retransmit.
>
>   c. Use the elapsed time (instead of the 0-window probe counter) in
>      evaluating tcp keepalive timeout.
>
> Thanks to Eric Dumazet, Neal Cardwell, and Yuchung Cheng for helpful
> discussions about the issue and options for fixing it.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2 Initial git repository build")
> Signed-off-by: Enke Chen <enchen@...oaltonetworks.com>
> ---
>  net/ipv4/tcp_timer.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 4ef08079ccfa..16a044da20db 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t)
>             ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))
>                 goto out;
>
> -       elapsed = keepalive_time_when(tp);
> -
> -       /* It is alive without keepalive 8) */
> -       if (tp->packets_out || !tcp_write_queue_empty(sk))
> -               goto resched;
> -
>         elapsed = keepalive_time_elapsed(tp);
>
>         if (elapsed >= keepalive_time_when(tp)) {
>                 /* If the TCP_USER_TIMEOUT option is enabled, use that
>                  * to determine when to timeout instead.
>                  */
> -               if ((icsk->icsk_user_timeout != 0 &&
> -                   elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
> -                   icsk->icsk_probes_out > 0) ||
> -                   (icsk->icsk_user_timeout == 0 &&
> -                   icsk->icsk_probes_out >= keepalive_probes(tp))) {
> +               u32 timeout = icsk->icsk_user_timeout ?
> +                 msecs_to_jiffies(icsk->icsk_user_timeout) :
> +                 keepalive_intvl_when(tp) * keepalive_probes(tp) +
> +                 keepalive_time_when(tp);
> +
> +               if (elapsed >= timeout) {
>                         tcp_send_active_reset(sk, GFP_ATOMIC);
>                         tcp_write_err(sk);
>                         goto out;
>                 }
>                 if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
> -                       icsk->icsk_probes_out++;
>                         elapsed = keepalive_intvl_when(tp);
>                 } else {
>                         /* If keepalive was lost due to local congestion,
> @@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t)
>         }
>
>         sk_mem_reclaim(sk);
> -
> -resched:
>         inet_csk_reset_keepalive_timer (sk, elapsed);
>         goto out;
>
> --
> 2.29.2
>