netdev - Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn

Message-ID: <a669987a-66c7-b4a1-7c5f-0b2494c4f14a@gmail.com>
Date:   Thu, 26 Sep 2019 11:03:57 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Marek Majkowski <marek@...udflare.com>, netdev@...r.kernel.org
Subject: Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries



On 9/26/19 9:57 AM, Eric Dumazet wrote:
> 
> 
> On 9/26/19 9:46 AM, Eric Dumazet wrote:
>>
>>
>> On 9/26/19 8:05 AM, Eric Dumazet wrote:
>>>
>>>
>>> On 9/25/19 1:46 AM, Marek Majkowski wrote:
>>>> Hello my favorite mailing list!
>>>>
>>>> Recently I've been looking into TCP_USER_TIMEOUT and noticed some
>>>> strange behaviour on fresh sockets in SYN-SENT state. Full writeup:
>>>> https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>>>>
>>>> Here's a reproducer. It does a simple thing: sets TCP_USER_TIMEOUT and
>>>> does connect() to a blackholed IP:
>>>>
>>>> $ wget https://gist.githubusercontent.com/majek/b4ad53c5795b226d62fad1fa4a87151a/raw/cbb928cb99cd6c5aa9f73ba2d3bc0aef22fbc2bf/user-timeout-and-syn.py
>>>>
>>>> $ sudo python3 user-timeout-and-syn.py
>>>> 00:00.000000 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:01.007053 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:03.023051 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.007096 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.015037 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.023020 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.034983 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>>
>>>> The connect() times out with ETIMEDOUT after 5 seconds - as intended.
>>>> But Linux (5.3.0-rc3) does something weird on the network - it sends
>>>> remaining tcp_syn_retries packets aligned to the 5s mark.
>>>>
>>>> In other words: with TCP_USER_TIMEOUT we are sending spurious SYN
>>>> packets on a timeout.
>>>>
>>>> For the record, the man page doesn't define what TCP_USER_TIMEOUT does
>>>> on SYN-SENT state.
>>>>
>>>
>>> Exactly, so far this option has only been used on established flows.
>>>
>>> Feel free to send patches if you need to override the stack behavior
>>> for connection establishment (Same remark for passive side...)
>>
>> Also please take a look at TCP_SYNCNT,  which predates TCP_USER_TIMEOUT
>>
>>
> 
> I will test the following :
> 
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index dbd9d2d0ee63aa46ad2dda417da6ec9409442b77..1182e51a6b794d75beb8c130354d7804fc83a307 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -220,7 +220,6 @@ static int tcp_write_timeout(struct sock *sk)
>                         sk_rethink_txhash(sk);
>                 }
>                 retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
> -               expired = icsk->icsk_retransmits >= retry_until;
>         } else {
>                 if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
>                         /* Black hole detection */
> @@ -242,9 +241,9 @@ static int tcp_write_timeout(struct sock *sk)
>                         if (tcp_out_of_resources(sk, do_reset))
>                                 return 1;
>                 }
> -               expired = retransmits_timed_out(sk, retry_until,
> -                                               icsk->icsk_user_timeout);
>         }
> +       expired = retransmits_timed_out(sk, retry_until,
> +                                       icsk->icsk_user_timeout);
>         tcp_fastopen_active_detect_blackhole(sk, expired);
>  
>         if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
> 

The patch works well, but reading the man page again, I see that the existing behavior has
been clearly documented.

If we change the behavior, we might break applications that set TCP_USER_TIMEOUT
on the listener, expecting the value to be inherited by children at accept() time
but not expecting it to change SYNACK rtx behavior.

On the other hand, Jon Maxwell's patch (tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy)
has added this weird effect of sending the remaining SYNs every jiffy:


     remaining = icsk->icsk_user_timeout - elapsed;
     if (remaining <= 0)
         return 1; /* user timeout has passed; fire ASAP */ 

So we should probably just extend TCP_USER_TIMEOUT to the SYN_SENT/SYN_RECV states
and change the man page accordingly.
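
To illustrate why the remaining SYNs bunch up at the timeout mark, here is a rough
user-space model in Python. It is a sketch, not the kernel code: all times are in
milliseconds, one jiffy is assumed to be 4 ms (HZ=250), the initial RTO is assumed
to be 1 s and to double on every retransmit, and the names simulate_syn_sent and
honor_user_timeout are made up for this example. With honor_user_timeout=False the
expiry check only counts retransmits, as in the SYN-SENT branch before the diff
above; with honor_user_timeout=True it compares elapsed time against the user
timeout, which approximates the patched check.

```python
def clamp_rto_to_user_timeout(rto_ms, user_timeout_ms, elapsed_ms, jiffy_ms=4):
    """Model of the clamp snippet quoted above (milliseconds, not jiffies)."""
    if user_timeout_ms == 0:
        return rto_ms
    remaining = user_timeout_ms - elapsed_ms
    if remaining <= 0:
        return jiffy_ms  # user timeout has passed; fire ASAP
    return min(rto_ms, remaining)


def simulate_syn_sent(user_timeout_ms, syn_retries=6, initial_rto_ms=1000,
                      honor_user_timeout=False, jiffy_ms=4):
    """Return (list of SYN send times, time the connect attempt is aborted)."""
    now, rto, retransmits = 0, initial_rto_ms, 0
    sends = [0]  # the initial SYN
    while True:
        # Arm the retransmit timer with the clamped RTO and let it fire.
        now += clamp_rto_to_user_timeout(rto, user_timeout_ms, now, jiffy_ms)
        if honor_user_timeout and user_timeout_ms:
            # Approximation of the patched check: time-based expiry.
            expired = retransmits > 0 and now >= user_timeout_ms
        else:
            # Approximation of the old SYN-SENT check: count-based expiry only.
            expired = retransmits >= syn_retries
        if expired:
            return sends, now
        sends.append(now)  # retransmit the SYN
        retransmits += 1
        rto *= 2           # exponential backoff (the RTO cap is not modeled)


if __name__ == "__main__":
    print(simulate_syn_sent(5000))
    print(simulate_syn_sent(5000, honor_user_timeout=True))
```

Under these assumptions the first call sends SYNs at 0, 1000, 3000, 5000, 5004, 5008
and 5012 ms, the same shape as the tcpdump trace in the original report (the exact
spacing of the last packets depends on the real HZ and timer granularity). The second
call stops after the SYNs at 0, 1000 and 3000 ms and aborts at 5000 ms.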



       TCP_USER_TIMEOUT (since Linux 2.6.37)
              This  option takes an unsigned int as an argument.  When the value is
              greater than 0, it specifies the maximum amount of time in  millisec‐
              onds  that transmitted data may remain unacknowledged before TCP will
              forcibly close the corresponding connection and return  ETIMEDOUT  to
              the  application.  If the option value is specified as 0, TCP will
              use the system default.

              Increasing user timeouts allows a TCP connection to survive  extended
              periods  without  end-to-end  connectivity.  Decreasing user timeouts
              allows applications to "fail fast", if so desired.  Otherwise,  fail‐
              ure  may  take up to 20 minutes with the current system defaults in a
              normal WAN environment.

              This option can be set during any state of a TCP connection,  but  is
              effective only during the synchronized states of a connection (ESTAB‐
              LISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING,  and  LAST-ACK).
              Moreover,  when  used  with  the TCP keepalive (SO_KEEPALIVE) option,
              TCP_USER_TIMEOUT will override keepalive to determine when to close a
              connection due to keepalive failure.

              The option has no effect on when TCP retransmits a packet, nor when a
              keepalive probe is sent.

              This option, like many others, will be inherited by  the  socket  re‐
              turned by accept(2), if it was set on the listening socket.

              Further  details  on the user timeout feature can be found in RFC 793
              and RFC 5482 ("TCP User Timeout Option").
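
For reference, a minimal Python sketch of how an application might set both options
discussed in this thread on a client socket before connect(). It assumes Linux and a
Python build that exposes socket.TCP_USER_TIMEOUT and socket.TCP_SYNCNT; the helper
name make_client_socket is made up for this example.

```python
import socket


def make_client_socket(user_timeout_ms, syn_retries=None):
    """Create a TCP socket with TCP_USER_TIMEOUT (and optionally TCP_SYNCNT) set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Maximum time in milliseconds that transmitted data may stay unacknowledged.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
    if syn_retries is not None:
        # Per-socket override of the tcp_syn_retries sysctl.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, syn_retries)
    return s


if __name__ == "__main__":
    s = make_client_socket(5000, syn_retries=2)
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT))
    s.close()
```

Setting TCP_SYNCNT alongside TCP_USER_TIMEOUT bounds the number of SYN retransmits
independently of how the kernel treats the user timeout in the SYN-SENT state.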