netdev - Re: [PATCH net-next 1/3] net/tcp_fastopen: Disable active side TFO in certain scenarios

Message-ID: <CADVnQyk6MEFpKg_qzLySYJcXT_zsb-bGYaBWUiOX=aEJkCafJQ@mail.gmail.com>
Date:   Thu, 20 Apr 2017 22:40:32 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Wei Wang <weiwan@...gle.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next 1/3] net/tcp_fastopen: Disable active side TFO in
 certain scenarios

On Thu, Apr 20, 2017 at 5:45 PM, Wei Wang <weiwan@...gle.com> wrote:
> From: Wei Wang <weiwan@...gle.com>
>
> Middlebox firewall issues can potentially cause the server's data to be
> blackholed after a successful 3WHS using TFO. The following are related
> reports from Apple:
> https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
> Slide 31 identifies an issue where the client ACK to the server's data
> sent during a TFO'd handshake is dropped.
> C ---> syn-data ---> S
> C <--- syn/ack ----- S
> C (accept & write)
> C <---- data ------- S
> C ----- ACK -> X     S
>                 [retry and timeout]
>
> https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
> Slide 5 shows a similar situation in which the server's data gets dropped
> after the 3WHS.
> C ---- syn-data ---> S
> C <--- syn/ack ----- S
> C ---- ack --------> S
> S (accept & write)
> C?  X <- data ------ S
>                 [retry and timeout]
>
> This is the worst failure b/c the client cannot detect such behavior and
> mitigate the situation (such as by disabling TFO). Unable to make progress,
> the application (e.g., an SSL library) may simply time out and retry with
> TFO again, and the process repeats indefinitely.
>
> The proposed solution is to disable active TFO globally under the
> following circumstances:
> 1. client side TFO socket detects out of order FIN
> 2. client side TFO socket receives out of order RST
>
> We disable active side TFO globally for 1hr at first. Then if it
> happens again, we disable it for 2h, then 4h, 8h, ...
> We reset the timeout to 1hr if a client side TFO socket not opened
> on loopback has successfully received data segs from the server.
> We examine this condition during close().
>
> The rationale behind it is that when such a firewall issue happens, the
> application running on the client should eventually close the socket as
> it is not able to get the data it is expecting. Or the application running
> on the server should close the socket as it is not able to receive any
> response from the client.
> In both cases, an out of order FIN or RST will be received by the client,
> given that the firewall will not block them as those frames carry no data.
> We want to disable active TFO globally because it helps if the middlebox
> is very close to the client and most of the connections are likely to
> fail.
>
> Also, add a debug sysctl:
>   tcp_fastopen_blackhole_detect_timeout_sec:
>     the initial timeout to use when firewall blackhole issue happens.
>     This can be set and read.
>     Setting it to 0 disables the active disable logic.
>
> Signed-off-by: Wei Wang <weiwan@...gle.com>

Acked-by: Neal Cardwell <ncardwell@...gle.com>

neal
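The two triggers in the patch description (an out of order FIN or RST on a client side TFO socket) can be modeled roughly as below. This is a minimal Python sketch, not the kernel implementation: the names SegmentEvent, ClientSocket, seq_after and should_disable_active_tfo are hypothetical, and it assumes "out of order" means the segment's sequence number lies beyond rcv_nxt, i.e. there is a gap where the server's blackholed data would have been. Sequence numbers are compared with 32-bit wraparound arithmetic.

```python
from dataclasses import dataclass

SEQ_MASK = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around


def seq_after(a: int, b: int) -> bool:
    """True if sequence number a comes strictly after b, modulo 2**32."""
    diff = (a - b) & SEQ_MASK
    return 0 < diff < 0x80000000


@dataclass
class SegmentEvent:
    """Hypothetical summary of one incoming segment."""
    fin: bool
    rst: bool
    seq: int      # sequence number carried by the segment
    rcv_nxt: int  # next in-order sequence number the socket expects


@dataclass
class ClientSocket:
    """Hypothetical view of the client socket state the check needs."""
    active_tfo: bool  # True if this connection sent data in its SYN via TFO


def should_disable_active_tfo(sock: ClientSocket, ev: SegmentEvent) -> bool:
    """Return True if ev is an out of order FIN or RST on a client side TFO socket."""
    if not sock.active_tfo:
        return False
    # A FIN/RST beyond rcv_nxt means earlier bytes from the server never arrived.
    out_of_order = seq_after(ev.seq, ev.rcv_nxt)
    return out_of_order and (ev.fin or ev.rst)
```

An in-order FIN (seq equal to rcv_nxt) is the normal end of a connection and does not fire the check in this sketch; only a FIN or RST that skips over missing bytes does.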
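The global backoff described in the patch (1hr, then 2h, 4h, 8h, ..., with the timeout reset once a client side TFO socket not opened on loopback has received data, checked at close()) can be sketched as a small state object. Again this is a hypothetical Python model, not the kernel code: ActiveTfoBlackholeState and its methods are illustrative names, and base_timeout stands in for the initial timeout set through the debug sysctl, where 0 turns the logic off as the patch describes.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass
class ActiveTfoBlackholeState:
    """Hypothetical host-wide state shared by all client side TFO sockets."""
    base_timeout: int = HOUR     # initial disable period in seconds; 0 turns detection off
    disable_times: int = 0       # detections since the last reset
    disabled_until: float = 0.0  # active TFO stays off until this time

    def on_blackhole_detected(self, now: float) -> None:
        """Called when a client TFO socket sees an out of order FIN or RST."""
        if self.base_timeout == 0:
            return  # detection logic disabled
        self.disable_times += 1
        # 1x, 2x, 4x, 8x, ... the base timeout (no upper bound in this sketch)
        period = self.base_timeout * 2 ** (self.disable_times - 1)
        self.disabled_until = now + period

    def active_tfo_allowed(self, now: float) -> bool:
        """True if a new connection may send data in its SYN."""
        return self.base_timeout == 0 or now >= self.disabled_until

    def on_close(self, is_active_tfo: bool, is_loopback: bool, data_segs_in: int) -> None:
        """Reset the backoff if a non-loopback client TFO socket received data."""
        if is_active_tfo and not is_loopback and data_segs_in > 0:
            self.disable_times = 0  # the next detection starts again at base_timeout
```

Keeping the counter separate from the deadline means a reset at close() only shortens the next disable period; in this model it does not lift a disable that is already in effect. Loopback sockets are ignored because a local connection says nothing about middleboxes on the path.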