netdev - Re: TCP fast retransmit issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 26 Jul 2017 13:06:40 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Willy Tarreau <w@....eu>
Cc:     Eric Dumazet <eric.dumazet@...il.com>, Klavs Klavsen <kl@...n.dk>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>,
        Nandita Dukkipati <nanditad@...gle.com>
Subject: Re: TCP fast retransmit issues

On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
> (1) Because the connection negotiated SACK, the Linux TCP sender does
> not get to its tcp_add_reno_sack() code to count dupacks and enter
> fast recovery on the 3rd dupack. The sender keeps waiting for specific
> packets to be SACKed that would signal that something has probably
> been lost. We could probably mitigate this by having the sender turn
> off SACK once it sees SACKed ranges that are completely invalid (way
> out of window). Then it should use the old non-SACK "Recovery on 3rd
> dupack" path.
>
> (2) It looks like there is a bug in the sender code where it seems to
> be repeatedly using a TLP timer firing 211ms after every ACK is
> received to transmit another TLP probe (a new packet in this case).
> Somehow these weird invalid SACKs seem to be triggering a code path
> that makes us think we can send another TLP, when we probably should
> be firing an RTO. That's my interpretation, anyway. I will try to
> reproduce this with packetdrill.

Hmm. It looks like this might be a general issue, where any time we
get an ACK that doesn't ACK/SACK anything new (whether because it's
incoming data in a bi-directional flow, or a middlebox breaking the
SACKs), then we schedule a TLP timer further out in time. Probably we
should only push the TLP timer out if something is cumulatively ACKed.

But that's not a trivial thing to do, because by the time we are
deciding whether to schedule another TLP, we have already canceled the
previous TLP and reinstalled an RTO. Hmm.

neal