lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADVnQynR8dUuUBDfeFR2xpL-wKzd+7TgbMv-CXe3TT-T0_PXyQ@mail.gmail.com>
Date:   Wed, 26 Jul 2017 14:38:18 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Willy Tarreau <w@....eu>
Cc:     Eric Dumazet <eric.dumazet@...il.com>, Klavs Klavsen <kl@...n.dk>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>,
        Nandita Dukkipati <nanditad@...gle.com>
Subject: Re: TCP fast retransmit issues

On Wed, Jul 26, 2017 at 1:06 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
> On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
>> (2) It looks like there is a bug in the sender code where it seems to
>> be repeatedly using a TLP timer firing 211ms after every ACK is
>> received to transmit another TLP probe (a new packet in this case).
>> Somehow these weird invalid SACKs seem to be triggering a code path
>> that makes us think we can send another TLP, when we probably should
>> be firing an RTO. That's my interpretation, anyway. I will try to
>> reproduce this with packetdrill.
>
> Hmm. It looks like this might be a general issue, where any time we
> get an ACK that doesn't ACK/SACK anything new (whether because it's
> incoming data in a bi-directional flow, or a middlebox breaking the
> SACKs), then we schedule a TLP timer further out in time. Probably we
> should only push the TLP timer out if something is cumulatively ACKed.
>
> But that's not a trivial thing to do, because by the time we are
> deciding whether to schedule another TLP, we have already canceled the
> previous TLP and reinstalled an RTO. Hmm.

Yeah, it looks like I can reproduce this issue with (1) bad sacks
causing repeated TLPs, and (2) TLPs timers being pushed out to later
times due to incoming data. Scripts are attached.

neal

Download attachment "tlp-bad-sacks.pkt" of type "application/octet-stream" (1665 bytes)

Download attachment "tlp-bidirectional.pkt" of type "application/octet-stream" (1272 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ