Message-ID: <CADVnQy=jVU2fZ6XiLdNw_4Y=x9mEPdZbDxFYET=tRuNzCsHVOQ@mail.gmail.com>
Date:   Wed, 21 Feb 2018 11:57:35 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Teodor Milkov <tm@....bg>
Cc:     Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] tcp: restrict F-RTO to work-around broken middle-boxes

On Wed, Feb 21, 2018 at 7:38 AM, Teodor Milkov <tm@....bg> wrote:
> Here they are:
>
>
>  http://vps3.avodz.org/tmp/frto-4.14.11-linux.pcap.xz - 3.3 MB/s
>
>  http://vps3.avodz.org/tmp/frto-4.14.11-windows.pcap.xz - connection
> completely froze and eventually timed out
>
>  http://vps3.avodz.org/tmp/frto-4.14.20+revert-windows.pcap.xz - 5+ MB/s,
> which almost saturated the link
>

Thanks for the detailed traces! This is hugely helpful, and nails down
what is happening here.

As the first screenshot shows (an excerpt from your
frto-4.14.11-windows.pcap.xz trace, where the Windows receiver suffers
a stall), there is a painful interaction between:

(a) A very broken middlebox in the path of this traffic that is
stripping *all* SACK options, so that receivers advertise SACK
capability but are thereafter unable to communicate to the sender all
the packets they have received.

(b) The F-RTO change you mention above: 89fe18e44 ("tcp: extend F-RTO
to catch more spurious timeouts"), which causes more undo operations,
which are implicitly optimized for the case where more packets will be
SACKed, which does not happen because of (a), so that there are
repeated RTO timeouts.

(c) A Windows receiver that does not implement TCP timestamps. This
means that, per the TCP standard, the sender is supposed to keep
exponentially backing off for each of these RTOs.

The combination of these 3 factors causes very long stalls.
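
To give a rough sense of how quickly factor (c) compounds, here is a
small sketch of the cumulative stall from back-to-back RTOs with
exponential backoff. The 200 ms starting RTO and the 120 s cap are
assumptions picked for illustration (they correspond to the usual
Linux minimum and maximum RTO), not values read from the traces, and
rto_backoff_schedule is a hypothetical helper name:

```python
def rto_backoff_schedule(initial_rto=0.2, max_rto=120.0, timeouts=8):
    """Return (rto, cumulative_stall) pairs for consecutive RTOs.

    Each timeout doubles the RTO (exponential backoff), capped at
    max_rto. Defaults are illustrative assumptions, not measurements.
    """
    schedule = []
    rto = initial_rto
    elapsed = 0.0
    for _ in range(timeouts):
        elapsed += rto                 # sender sits idle for one full RTO
        schedule.append((rto, elapsed))
        rto = min(rto * 2, max_rto)    # back off for the next timeout
    return schedule


if __name__ == "__main__":
    for n, (rto, total) in enumerate(rto_backoff_schedule(), start=1):
        print(f"RTO #{n}: waited {rto:6.1f}s, stalled {total:6.1f}s so far")
```

Under these assumed values, eight consecutive RTOs already add up to
roughly 51 seconds of stall, which is in line with a connection that
appears frozen and eventually times out.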

But please note that even with this F-RTO patch fully reverted, the
middlebox that drops SACKs is causing horrendous and unnecessarily
slow recoveries (see the second screenshot, from your
frto-4.14.20+revert-windows.pcap.xz trace). It would be nice to report
this SACK-stripping issue to the middlebox vendor, if possible. Or
maybe there is a config option that can disable this feature.
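
If it helps when reporting the issue, the symptom can be checked
mechanically: SACK-permitted (TCP option kind 4) appears in the
handshake, yet duplicate ACKs never carry a SACK option (kind 5). The
sketch below works on raw TCP option bytes that are assumed to have
been extracted from the capture by some other tool; the helper names
are hypothetical, and it makes no assumption about whether the
middlebox removes the option outright or overwrites it with NOPs:

```python
import struct

# TCP option kinds, per the IANA TCP parameters registry.
OPT_EOL, OPT_NOP = 0, 1
OPT_SACK_PERMITTED, OPT_SACK = 4, 5


def parse_tcp_options(raw):
    """Parse raw TCP option bytes into a list of (kind, payload) tuples."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == OPT_EOL:
            break                      # end of option list
        if kind == OPT_NOP:
            i += 1                     # single-byte padding
            continue
        if i + 1 >= len(raw):
            break                      # truncated: no length byte
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break                      # malformed length, stop parsing
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


def sack_blocks(raw):
    """Return the (left_edge, right_edge) SACK blocks found in raw options."""
    blocks = []
    for kind, payload in parse_tcp_options(raw):
        if kind == OPT_SACK:
            # Each block is two 32-bit sequence numbers in network order.
            for off in range(0, len(payload) - 7, 8):
                blocks.append(struct.unpack("!II", payload[off:off + 8]))
    return blocks


def looks_sack_stripped(syn_options, dup_ack_options):
    """True if SACK was negotiated but no duplicate ACK carries a SACK block.

    syn_options: raw option bytes from the receiver's SYN or SYN-ACK.
    dup_ack_options: list of raw option bytes, one per duplicate ACK
    (identifying duplicate ACKs is left to the caller).
    """
    advertised = any(kind == OPT_SACK_PERMITTED
                     for kind, _ in parse_tcp_options(syn_options))
    saw_sack = any(sack_blocks(opts) for opts in dup_ack_options)
    return advertised and bool(dup_ack_options) and not saw_sack
```

A result of True over a long run of duplicate ACKs is only a hint, not
proof, but it is a concise thing to show a vendor alongside the pcap.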

It seems we will indeed need to revert 89fe18e44. We have a Google TCP
team member out of the office on vacation. When he's back we'll
consult and follow up with our consensus.

Thanks again for the report and the traces! This was hugely helpful.

neal

Download attachment "windows-zoom0.png" of type "image/png" (44194 bytes)

Download attachment "windows-full.png" of type "image/png" (53564 bytes)
