[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd51b146-52b8-c56b-8efe-0e0cb73ee6c4@akamai.com>
Date: Mon, 21 Oct 2019 14:27:24 -0400
From: Jason Baron <jbaron@...mai.com>
To: Yuchung Cheng <ycheng@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>
Cc: David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Netdev <netdev@...r.kernel.org>,
Christoph Paasch <cpaasch@...le.com>
Subject: Re: [net-next] tcp: add TCP_INFO status for failed client TFO
On 10/21/19 2:02 PM, Yuchung Cheng wrote:
> Thanks for the patch. Detailed comments below
>
> On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell <ncardwell@...gle.com> wrote:
>>
>> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jbaron@...mai.com> wrote:
>>>
>>> The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
>>> or not data-in-SYN was ack'd on both the client and server side. We'd like
>>> to gather more information on the client-side in the failure case in order
>>> to indicate the reason for the failure. This can be useful for not only
>>> debugging TFO, but also for creating TFO socket policies. For example, if
>>> a middle box removes the TFO option or drops a data-in-SYN, we can
>>> can detect this case, and turn off TFO for these connections saving the
>>> extra retransmits.
>>>
>>> The newly added tcpi_fastopen_client_fail status is 2 bits and has 4
>>> states:
>>>
>>> 1) TFO_STATUS_UNSPEC
>>>
>>> catch-all.
>>>
>>> 2) TFO_NO_COOKIE_SENT
>>>
>>> If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
>>> was sent because we don't have one yet, its not in cache or black-holing
>>> may be enabled (already indicated by the global
>>> LINUX_MIB_TCPFASTOPENBLACKHOLE).
>
> It'd be useful to separate the two that cookie is available but is
> prohibited to use due to BH checking. We've seen users internally get
> confused due to lack of this info (after seeing cookies from ip
> metrics).
>
ok, yeah i had been thinking about splitting these out but thought that
the LINUX_MIB_TCPFASTOPENBLACKHOLE counter could help differentiate
these cases - but I'm ok making it explicit.
>>>
>>> 3) TFO_NO_SYN_DATA
>>>
>>> Data was sent with SYN, we received a SYN/ACK but it did not cover the data
>>> portion. Cookie is not accepted by server because the cookie may be invalid
>>> or the server may be overloaded.
>>>
>>>
>>> 4) TFO_NO_SYN_DATA_TIMEOUT
>>>
>>> Data was sent with SYN, we received a SYN/ACK which did not cover the data
>>> after at least 1 additional SYN was sent (without data). It may be the case
>>> that a middle-box is dropping data-in-SYN packets. Thus, it would be more
>>> efficient to not use TFO on this connection to avoid extra retransmits
>>> during connection establishment.
>>>
>>> These new fields certainly not cover all the cases where TFO may fail, but
>>> other failures, such as SYN/ACK + data being dropped, will result in the
>>> connection not becoming established. And a connection blackhole after
>>> session establishment shows up as a stalled connection.
>>>
>>> Signed-off-by: Jason Baron <jbaron@...mai.com>
>>> Cc: Eric Dumazet <edumazet@...gle.com>
>>> Cc: Neal Cardwell <ncardwell@...gle.com>
>>> Cc: Christoph Paasch <cpaasch@...le.com>
>>> ---
>>
>> Thanks for adding this!
>>
>> It would be good to reset tp->fastopen_client_fail to 0 in tcp_disconnect().
>>
>>> +/* why fastopen failed from client perspective */
>>> +enum tcp_fastopen_client_fail {
>>> + TFO_STATUS_UNSPEC, /* catch-all */
>>> + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */
>>> + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */
>>
>> I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to
>> me like this means the client didn't send a SYN+DATA. What about
>> "TFO_DATA_NOT_ACKED", or something like that?
>>
>> If you don't mind, it would be great to cc: Yuchung on the next rev.
> TFO_DATA_NOT_ACKED is already available from the inverse of TCPI_OPT_SYN_DATA
> #define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */
>
> It occurs (3)(4) are already available indirectly from
> TCPI_OPT_SYN_DATA and tcpi_total_retrans together, but the socket must
> query tcpi_total_retrans right after connect/sendto returns which may
> not be preferred.
>
> How about an alternative proposal to the types to catch more TFO issues:
>
> TFO_STATUS_UNSPEC
> TFO_DISABLED_BLACKHOLE_DETECTED
> TFO_COOKIE_UNAVAILABLE
> TFO_SYN_RETRANSMITTED // use in conjunction w/ TCPI_OPT_SYN_DATA for (3)(4)
Ok, that set works for me. I will re-spin with these states for v2.
Thanks for the suggestion!
-Jason
Powered by blists - more mailing lists