[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <20191021195150.GA7514@MacBook-Pro-64.local>
Date: Mon, 21 Oct 2019 12:51:50 -0700
From: Christoph Paasch <cpaasch@...le.com>
To: Jason Baron <jbaron@...mai.com>
Cc: Yuchung Cheng <ycheng@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Netdev <netdev@...r.kernel.org>
Subject: Re: [net-next] tcp: add TCP_INFO status for failed client TFO
On 21/10/19 - 14:27:24, Jason Baron wrote:
>
>
> On 10/21/19 2:02 PM, Yuchung Cheng wrote:
> > Thanks for the patch. Detailed comments below
> >
> > On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> >>
> >> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jbaron@...mai.com> wrote:
> >>>
> >>> The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
> >>> or not data-in-SYN was ack'd on both the client and server side. We'd like
> >>> to gather more information on the client-side in the failure case in order
> >>> to indicate the reason for the failure. This can be useful for not only
> >>> debugging TFO, but also for creating TFO socket policies. For example, if
> >>> a middle box removes the TFO option or drops a data-in-SYN, we can
> >>> can detect this case, and turn off TFO for these connections saving the
> >>> extra retransmits.
> >>>
> >>> The newly added tcpi_fastopen_client_fail status is 2 bits and has 4
> >>> states:
> >>>
> >>> 1) TFO_STATUS_UNSPEC
> >>>
> >>> catch-all.
> >>>
> >>> 2) TFO_NO_COOKIE_SENT
> >>>
> >>> If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
> >>> was sent because we don't have one yet, its not in cache or black-holing
> >>> may be enabled (already indicated by the global
> >>> LINUX_MIB_TCPFASTOPENBLACKHOLE).
> >
> > It'd be useful to separate the two that cookie is available but is
> > prohibited to use due to BH checking. We've seen users internally get
> > confused due to lack of this info (after seeing cookies from ip
> > metrics).
> >
>
> ok, yeah i had been thinking about splitting these out but thought that
> the LINUX_MIB_TCPFASTOPENBLACKHOLE counter could help differentiate
> these cases - but I'm ok making it explicit.
>
> >>>
> >>> 3) TFO_NO_SYN_DATA
> >>>
> >>> Data was sent with SYN, we received a SYN/ACK but it did not cover the data
> >>> portion. Cookie is not accepted by server because the cookie may be invalid
> >>> or the server may be overloaded.
> >>>
> >>>
> >>> 4) TFO_NO_SYN_DATA_TIMEOUT
> >>>
> >>> Data was sent with SYN, we received a SYN/ACK which did not cover the data
> >>> after at least 1 additional SYN was sent (without data). It may be the case
> >>> that a middle-box is dropping data-in-SYN packets. Thus, it would be more
> >>> efficient to not use TFO on this connection to avoid extra retransmits
> >>> during connection establishment.
> >>>
> >>> These new fields certainly not cover all the cases where TFO may fail, but
> >>> other failures, such as SYN/ACK + data being dropped, will result in the
> >>> connection not becoming established. And a connection blackhole after
> >>> session establishment shows up as a stalled connection.
> >>>
> >>> Signed-off-by: Jason Baron <jbaron@...mai.com>
> >>> Cc: Eric Dumazet <edumazet@...gle.com>
> >>> Cc: Neal Cardwell <ncardwell@...gle.com>
> >>> Cc: Christoph Paasch <cpaasch@...le.com>
> >>> ---
> >>
> >> Thanks for adding this!
> >>
> >> It would be good to reset tp->fastopen_client_fail to 0 in tcp_disconnect().
> >>
> >>> +/* why fastopen failed from client perspective */
> >>> +enum tcp_fastopen_client_fail {
> >>> + TFO_STATUS_UNSPEC, /* catch-all */
> >>> + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */
> >>> + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */
> >>
> >> I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to
> >> me like this means the client didn't send a SYN+DATA. What about
> >> "TFO_DATA_NOT_ACKED", or something like that?
> >>
> >> If you don't mind, it would be great to cc: Yuchung on the next rev.
> > TFO_DATA_NOT_ACKED is already available from the inverse of TCPI_OPT_SYN_DATA
> > #define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */
> >
> > It occurs (3)(4) are already available indirectly from
> > TCPI_OPT_SYN_DATA and tcpi_total_retrans together, but the socket must
> > query tcpi_total_retrans right after connect/sendto returns which may
> > not be preferred.
> >
> > How about an alternative proposal to the types to catch more TFO issues:
> >
> > TFO_STATUS_UNSPEC
> > TFO_DISABLED_BLACKHOLE_DETECTED
> > TFO_COOKIE_UNAVAILABLE
> > TFO_SYN_RETRANSMITTED // use in conjunction w/ TCPI_OPT_SYN_DATA for (3)(4)
>
> Ok, that set works for me. I will re-spin with these states for v2.
> Thanks for the suggestion!
Actually, longterm I hope we would be able to get rid of the
blackhole-detection and fallback heuristics. In a far distant future where
these middleboxes have been weeded out ;-)
So, do we really want to eternalize this as part of the API in tcp_info ?
Christoph
Powered by blists - more mailing lists