Message-ID: <a669987a-66c7-b4a1-7c5f-0b2494c4f14a@gmail.com>
Date: Thu, 26 Sep 2019 11:03:57 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Marek Majkowski <marek@...udflare.com>, netdev@...r.kernel.org
Subject: Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries
On 9/26/19 9:57 AM, Eric Dumazet wrote:
>
>
> On 9/26/19 9:46 AM, Eric Dumazet wrote:
>>
>>
>> On 9/26/19 8:05 AM, Eric Dumazet wrote:
>>>
>>>
>>> On 9/25/19 1:46 AM, Marek Majkowski wrote:
>>>> Hello my favorite mailing list!
>>>>
>>>> Recently I've been looking into TCP_USER_TIMEOUT and noticed some
>>>> strange behaviour on fresh sockets in SYN-SENT state. Full writeup:
>>>> https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>>>>
>>>> Here's a reproducer. It does a simple thing: sets TCP_USER_TIMEOUT and
>>>> does connect() to a blackholed IP:
>>>>
>>>> $ wget https://gist.githubusercontent.com/majek/b4ad53c5795b226d62fad1fa4a87151a/raw/cbb928cb99cd6c5aa9f73ba2d3bc0aef22fbc2bf/user-timeout-and-syn.py
>>>>
>>>> $ sudo python3 user-timeout-and-syn.py
>>>> 00:00.000000 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:01.007053 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:03.023051 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.007096 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.015037 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.023020 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.034983 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>>
>>>> The connect() times out with ETIMEDOUT after 5 seconds - as intended.
>>>> But Linux (5.3.0-rc3) does something weird on the network: it sends the
>>>> remaining tcp_syn_retries SYN packets bunched together at the 5s mark.
>>>>
>>>> In other words: with TCP_USER_TIMEOUT we are sending spurious SYN
>>>> packets on a timeout.
>>>>
>>>> For the record, the man page doesn't define what TCP_USER_TIMEOUT does
>>>> in the SYN-SENT state.
>>>>
>>>
>>> Exactly, so far this option has only been used on established flows.
>>>
>>> Feel free to send patches if you need to override the stack behavior
>>> for connection establishment (same remark for the passive side...)
>>
>> Also please take a look at TCP_SYNCNT, which predates TCP_USER_TIMEOUT
>>
>>
>
> I will test the following:
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index dbd9d2d0ee63aa46ad2dda417da6ec9409442b77..1182e51a6b794d75beb8c130354d7804fc83a307 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -220,7 +220,6 @@ static int tcp_write_timeout(struct sock *sk)
>  			sk_rethink_txhash(sk);
>  		}
>  		retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
> -		expired = icsk->icsk_retransmits >= retry_until;
>  	} else {
>  		if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
>  			/* Black hole detection */
> @@ -242,9 +241,9 @@ static int tcp_write_timeout(struct sock *sk)
>  			if (tcp_out_of_resources(sk, do_reset))
>  				return 1;
>  		}
> -		expired = retransmits_timed_out(sk, retry_until,
> -						icsk->icsk_user_timeout);
>  	}
> +	expired = retransmits_timed_out(sk, retry_until,
> +					icsk->icsk_user_timeout);
>  	tcp_fastopen_active_detect_blackhole(sk, expired);
>  
>  	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
>
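A minimal Python sketch of the reproducer's setup, for readers who cannot fetch the gist. It is an assumption that the gist does roughly this (set TCP_USER_TIMEOUT, then connect() to an unroutable address); the helper names `make_syn_test_socket` and `try_connect` are hypothetical. The optional `syn_count` argument sets TCP_SYNCNT, the older per-socket knob mentioned above. Linux-only.

```python
import errno
import socket

def make_syn_test_socket(user_timeout_ms, syn_count=None):
    """TCP socket with TCP_USER_TIMEOUT (and optionally TCP_SYNCNT) set.

    Linux-only: these socket constants do not exist on other platforms.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Milliseconds; 0 would mean "use the system default".
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
    if syn_count is not None:
        # Per-socket override of net.ipv4.tcp_syn_retries.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, syn_count)
    return s

def try_connect(addr, user_timeout_ms=5000, syn_count=None):
    """Blocking connect(); returns "connected" or the errno name on failure."""
    with make_syn_test_socket(user_timeout_ms, syn_count) as s:
        try:
            s.connect(addr)
        except OSError as e:
            return errno.errorcode.get(e.errno, repr(e))
        return "connected"
```

On a host with a default route, `try_connect(("244.0.0.1", 1234))` would be expected to block for roughly the user timeout; the report above saw ETIMEDOUT after about 5 seconds, with the extra SYNs visible in tcpdump.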
The patch works well, but reading the man page again, I see the existing behavior has
been clearly documented.
If we change the behavior, we might break applications that set TCP_USER_TIMEOUT
on the listener, expecting the value to be inherited by children at accept() time
but not expecting it to change SYNACK retransmission behavior.
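The inheritance this compatibility concern hinges on can be checked on loopback. A Python sketch (Linux-only; the helper name `inherited_user_timeout` is hypothetical) that sets TCP_USER_TIMEOUT on a listener and reads it back from the accepted socket:

```python
import socket

def inherited_user_timeout(listener_timeout_ms):
    """Set TCP_USER_TIMEOUT on a loopback listener, accept one connection,
    and return the value reported by the accepted (child) socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT,
                            listener_timeout_ms)
        listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick one
        listener.listen(1)
        # The handshake completes in the kernel before accept() is called.
        with socket.create_connection(listener.getsockname()):
            child, _ = listener.accept()
            with child:
                return child.getsockopt(socket.IPPROTO_TCP,
                                        socket.TCP_USER_TIMEOUT)
```

The child reporting the listener's value is what the man page's accept(2) inheritance sentence describes; an application relying on it sets the option once on the listener.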
On the other hand, Jon Maxwell's patch (tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy)
has introduced this odd side effect of sending the remaining SYNs every jiffy:
	remaining = icsk->icsk_user_timeout - elapsed;
	if (remaining <= 0)
		return 1; /* user timeout has passed; fire ASAP */
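To see why those lines produce a burst, here is a rough Python model of the timer arithmetic; it is not kernel code. Assumptions: HZ=1000 (one jiffy is 1 ms), an initial SYN RTO of 1 s that doubles per retransmit and is capped at 120 s, tcp_syn_retries=6, timer slack ignored, and the count-based SYN-SENT expiry (`expired = icsk->icsk_retransmits >= retry_until`) that the patch above removes. Function names are hypothetical.

```python
JIFFY_MS = 1                 # assumption: HZ=1000, so one jiffy is 1 ms
TCP_TIMEOUT_INIT_MS = 1000   # assumed initial SYN retransmission timeout
TCP_RTO_MAX_MS = 120_000     # backoff cap

def clamp_rto_to_user_timeout(rto_ms, user_timeout_ms, elapsed_ms):
    """Model of the clamp: never arm the timer past the user timeout."""
    if not user_timeout_ms:
        return rto_ms
    remaining = user_timeout_ms - elapsed_ms
    if remaining <= 0:
        return JIFFY_MS      # "user timeout has passed; fire ASAP"
    return min(rto_ms, remaining)

def syn_sent_schedule(user_timeout_ms, syn_retries=6):
    """Return (SYN send times in ms, time connect() fails) for a blackholed
    peer, using the pre-patch count-based expiry in SYN-SENT."""
    sends = [0]                  # original SYN at t=0
    rto = TCP_TIMEOUT_INIT_MS
    retransmits = 0
    now = rto                    # first retransmission timer expiry
    while True:
        # Pre-patch rule: expired = icsk_retransmits >= retry_until
        if retransmits >= syn_retries:
            return sends, now
        sends.append(now)        # retransmit the SYN
        retransmits += 1
        rto = min(rto * 2, TCP_RTO_MAX_MS)   # exponential backoff
        now += clamp_rto_to_user_timeout(rto, user_timeout_ms, now)
```

With a 5000 ms user timeout the model yields SYNs at 0, 1000, 3000, 5000, 5001, 5002 and 5003 ms and fails at 5004 ms, the same shape as the tcpdump output in the report (where the real gaps are a few milliseconds, presumably due to timer granularity). With no user timeout it gives the usual 127 s connect() timeout.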
So we should probably just extend TCP_USER_TIMEOUT to the SYN_SENT/SYN_RECV states
and update the man page accordingly.
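Under the same assumptions as a pure timer model, here is a sketch of what a time-based SYN-SENT expiry could look like, mirroring the `retransmits_timed_out(sk, retry_until, icsk->icsk_user_timeout)` call in the patch quoted above: when a user timeout is set, the connection fails once that much time has elapsed since the first SYN, regardless of the retransmit count. Falling back to the count-based rule when no user timeout is set is a simplifying assumption of this sketch, not necessarily what the patch does. Names are hypothetical.

```python
JIFFY_MS = 1                 # assumption: HZ=1000
TCP_TIMEOUT_INIT_MS = 1000   # assumed initial SYN RTO
TCP_RTO_MAX_MS = 120_000

def clamp(rto_ms, user_timeout_ms, elapsed_ms):
    """Same clamp as tcp_clamp_rto_to_user_timeout(), in milliseconds."""
    if not user_timeout_ms:
        return rto_ms
    remaining = user_timeout_ms - elapsed_ms
    return JIFFY_MS if remaining <= 0 else min(rto_ms, remaining)

def syn_sent_schedule_time_based(user_timeout_ms, syn_retries=6):
    """Return (SYN send times in ms, failure time) with time-based expiry."""
    sends, rto, retransmits = [0], TCP_TIMEOUT_INIT_MS, 0
    now = TCP_TIMEOUT_INIT_MS
    while True:
        if user_timeout_ms:
            # Time-based: needs at least one retransmit, like the kernel check.
            expired = retransmits > 0 and now >= user_timeout_ms
        else:
            expired = retransmits >= syn_retries   # assumed fallback
        if expired:
            return sends, now
        sends.append(now)
        retransmits += 1
        rto = min(rto * 2, TCP_RTO_MAX_MS)
        now += clamp(rto, user_timeout_ms, now)
```

With 5000 ms the SYNs stop at 0, 1000 and 3000 ms and the failure lands at 5000 ms, with no burst. A 200 s user timeout, by contrast, allows more retransmits than tcp_syn_retries would.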
TCP_USER_TIMEOUT (since Linux 2.6.37)
This option takes an unsigned int as an argument. When the value is
greater than 0, it specifies the maximum amount of time in
milliseconds that transmitted data may remain unacknowledged before
TCP will forcibly close the corresponding connection and return
ETIMEDOUT to the application. If the option value is specified as 0,
TCP will use the system default.
Increasing user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing user timeouts
allows applications to "fail fast", if so desired. Otherwise, failure
may take up to 20 minutes with the current system defaults in a
normal WAN environment.
This option can be set during any state of a TCP connection, but is
effective only during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will override keepalive to determine when to close a
connection due to keepalive failure.
The option has no effect on when TCP retransmits a packet, nor when a
keepalive probe is sent.
This option, like many others, will be inherited by the socket
returned by accept(2), if it was set on the listening socket.
Further details on the user timeout feature can be found in RFC 793
and RFC 5482 ("TCP User Timeout Option").
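The man page's "up to 20 minutes" presumably refers to the default net.ipv4.tcp_retries2. As a sketch, modeled in Python on the timeout == 0 branch of retransmits_timed_out() in net/ipv4/tcp_timer.c of this era (assuming TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s; the function name is hypothetical), the deadline used when no TCP_USER_TIMEOUT is set is:

```python
TCP_RTO_MIN_MS = 200       # assumed TCP_RTO_MIN
TCP_RTO_MAX_MS = 120_000   # assumed TCP_RTO_MAX

def retransmit_deadline_ms(boundary, rto_base_ms=TCP_RTO_MIN_MS):
    """Time budget for `boundary` retransmissions, modeled on the
    timeout == 0 branch of retransmits_timed_out()."""
    # ilog2(TCP_RTO_MAX / rto_base): doublings before the RTO hits the cap
    linear_backoff_thresh = (TCP_RTO_MAX_MS // rto_base_ms).bit_length() - 1
    if boundary <= linear_backoff_thresh:
        # Sum of rto_base * 2**i for i in 0..boundary
        return ((2 << boundary) - 1) * rto_base_ms
    # Doubling phase up to the cap, then one TCP_RTO_MAX per extra retry
    return (((2 << linear_backoff_thresh) - 1) * rto_base_ms
            + (boundary - linear_backoff_thresh) * TCP_RTO_MAX_MS)
```

With the default tcp_retries2 = 15 this gives 924,600 ms (about 15.4 minutes), which the kernel's ip-sysctl documentation describes as a hypothetical timeout of 924.6 seconds and a lower bound for the effective timeout. A nonzero TCP_USER_TIMEOUT is passed as the `timeout` argument instead, so the user's value replaces this computed deadline.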