netdev - Re: TCP fast retransmit issues

Date:   Wed, 26 Jul 2017 14:38:18 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Willy Tarreau <w@....eu>
Cc:     Eric Dumazet <eric.dumazet@...il.com>, Klavs Klavsen <kl@...n.dk>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>,
        Nandita Dukkipati <nanditad@...gle.com>
Subject: Re: TCP fast retransmit issues

On Wed, Jul 26, 2017 at 1:06 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
> On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
>> (2) It looks like there is a bug in the sender code where it seems to
>> be repeatedly using a TLP timer firing 211ms after every ACK is
>> received to transmit another TLP probe (a new packet in this case).
>> Somehow these weird invalid SACKs seem to be triggering a code path
>> that makes us think we can send another TLP, when we probably should
>> be firing an RTO. That's my interpretation, anyway. I will try to
>> reproduce this with packetdrill.
>
> Hmm. It looks like this might be a general issue, where any time we
> get an ACK that doesn't ACK/SACK anything new (whether because it's
> incoming data in a bi-directional flow, or a middlebox breaking the
> SACKs), then we schedule a TLP timer further out in time. Probably we
> should only push the TLP timer out if something is cumulatively ACKed.
>
> But that's not a trivial thing to do, because by the time we are
> deciding whether to schedule another TLP, we have already canceled the
> previous TLP and reinstalled an RTO. Hmm.

Yeah, it looks like I can reproduce this issue with (1) bad SACKs
causing repeated TLPs, and (2) TLP timers being pushed out to later
times due to incoming data. Scripts are attached.
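
To illustrate the timer behavior described above, here is a minimal
sketch in Python. It is a toy model, not the kernel code: the function
name, its parameters, and the 100 ms ACK spacing are all hypothetical,
and only the 211 ms probe timeout is taken from the trace discussed in
this thread. It compares re-arming the probe timer on every ACK against
re-arming only when an ACK makes forward progress.

```python
def probe_fire_time_ms(ack_events, pto_ms=211, rearm_only_on_progress=False):
    """Return the time (ms) at which the probe timer finally fires.

    ack_events: list of (arrival_ms, newly_acked_bytes), sorted by time.
    The timer is assumed to be armed at t=0, so it first expires at pto_ms.
    This is a simplified model with hypothetical names, not kernel logic.
    """
    deadline = pto_ms
    for arrival_ms, newly_acked in ack_events:
        if arrival_ms >= deadline:
            break  # the timer already fired before this ACK arrived
        if newly_acked > 0 or not rearm_only_on_progress:
            # Push the timer out relative to this ACK's arrival time.
            deadline = arrival_ms + pto_ms
    return deadline


# Ten ACKs, 100 ms apart, none of which acknowledges anything new
# (for example, pure incoming data in a bidirectional flow).
acks = [(100 * i, 0) for i in range(1, 11)]

every_ack = probe_fire_time_ms(acks)
progress_only = probe_fire_time_ms(acks, rearm_only_on_progress=True)
print(every_ack, progress_only)
```

In this model, re-arming on every ACK keeps moving the deadline for as
long as ACKs keep arriving, while the progress-only policy lets the
original deadline stand. As noted in the quoted text, doing this in the
real sender is less simple, since the previous TLP has already been
canceled and an RTO reinstalled by the time the decision is made.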

neal

Download attachment "tlp-bad-sacks.pkt" of type "application/octet-stream" (1665 bytes)

Download attachment "tlp-bidirectional.pkt" of type "application/octet-stream" (1272 bytes)